lias-laboratory/cidoccrm-llm-extractor


Automated extraction of RDF triples from archaeological text into the CIDOC CRM (Conceptual Reference Model) ontology using LLMs.


Defensibility

3.0/10


Platform Domination: low

Market Consolidation: high

Displacement Horizon: 1-2 years

REASONING

The cidoccrm-llm-extractor addresses a highly specialized niche: automating the population of CIDOC CRM knowledge graphs. CIDOC CRM (ISO 21127) is a large, deeply layered ontology used in cultural heritage and archaeology, and mapping free text onto it is hard. Even so, the project's defensibility is low: adoption is limited (9 stars, 0 forks) and it has gained no community momentum in its year of existence.

Technically, it is a specialized wrapper around LLM prompting and RDF serialization. Frontier labs (OpenAI, Google) are unlikely to build a dedicated CIDOC CRM extractor, but their general-purpose structured-output features (such as GPT-4o's structured outputs or Gemini's function calling) increasingly commoditize the project's core logic. The potential moat is the domain-specific prompt engineering and validation logic, but without a significant dataset or user base the project remains a reproducible prototype.

For museums and research labs it is a useful reference implementation, but it lacks the infrastructure-grade hardening required for production-scale archival work. Competitors include general LLM extraction frameworks (LangChain, LlamaIndex), which can be configured for CIDOC CRM with relatively low effort, and specialized cultural heritage platforms such as Arches, which might eventually build in comparable LLM features.
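To make the "wrapper around LLM prompting and RDF serialization" pattern concrete, here is a minimal sketch of its validation and serialization half. It is not the project's actual code. The JSON shape, the `EX` base IRI, the function names, and the hard-coded `SAMPLE_RESPONSE` (standing in for a real LLM call) are all assumptions made for illustration. The allowlist is a tiny hand-picked subset of CIDOC CRM terms, and the sketch uses only the standard library rather than RDFLib.

```python
import json

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/site/"  # hypothetical base IRI for extracted entities

# Tiny hand-picked subset of CIDOC CRM; a real validator would load the ontology.
ALLOWED_CLASSES = {"E22_Human-Made_Object", "E12_Production", "E53_Place", "E21_Person"}
ALLOWED_PROPERTIES = {"P108i_was_produced_by", "P7_took_place_at", "P14_carried_out_by"}

# Stand-in for an LLM's structured output (assumed schema, not the project's).
SAMPLE_RESPONSE = """
{
  "entities": [
    {"id": "amphora_1", "type": "E22_Human-Made_Object"},
    {"id": "production_1", "type": "E12_Production"},
    {"id": "workshop_1", "type": "E53_Place"},
    {"id": "spirit_1", "type": "E99_Ghost"}
  ],
  "relations": [
    {"subject": "amphora_1", "property": "P108i_was_produced_by", "object": "production_1"},
    {"subject": "production_1", "property": "P7_took_place_at", "object": "workshop_1"},
    {"subject": "amphora_1", "property": "P999_invented_by", "object": "workshop_1"}
  ]
}
"""


def validate(payload):
    """Keep entities and relations that use known CRM terms; collect the rest as errors."""
    entities, relations, errors = {}, [], []
    for ent in payload.get("entities", []):
        if ent.get("type") in ALLOWED_CLASSES:
            entities[ent["id"]] = ent["type"]
        else:
            errors.append(f"unknown class: {ent.get('type')}")
    for rel in payload.get("relations", []):
        if rel.get("property") not in ALLOWED_PROPERTIES:
            errors.append(f"unknown property: {rel.get('property')}")
        elif rel.get("subject") not in entities or rel.get("object") not in entities:
            errors.append(f"dangling reference in relation: {rel}")
        else:
            relations.append((rel["subject"], rel["property"], rel["object"]))
    return entities, relations, errors


def to_ntriples(entities, relations):
    """Serialize validated statements as N-Triples (ids are assumed to be IRI-safe)."""
    lines = [f"<{EX}{eid}> <{RDF_TYPE}> <{CRM}{cls}> ." for eid, cls in entities.items()]
    lines += [f"<{EX}{s}> <{CRM}{p}> <{EX}{o}> ." for s, p, o in relations]
    return "\n".join(lines)


entities, relations, errors = validate(json.loads(SAMPLE_RESPONSE))
ntriples = to_ntriples(entities, relations)
print(ntriples)
print("rejected:", errors)
```

The sketch rejects the invented class and property instead of writing them into the graph, which is the kind of domain-specific validation the review identifies as the project's only potential moat. A production version would also need to check each property's domain and range against the ontology, which this example does not attempt.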

COMPOSABILITY

TECH STACK

Python, OpenAI API, RDFLib, CIDOC CRM Ontology, JSON-LD

INTEGRATION

library_import

ontology_mapping, knowledge_graph_population, entity_extraction, structured_data_generation

READINESS

Composability: component

Depth: prototype

Novelty: incremental