jiangnanboy/llm-knowledge-graph


Extracts subject-predicate-object (SPO) triplets from Chinese unstructured data (text, images, PDFs) using LLMs and visualizes them as a knowledge graph.


Defensibility

2.0/10

Stars: 26

Forks: 3

Platform Domination: high

Market Consolidation: high

Displacement Horizon: 6 months

REASONING

The project is a straightforward implementation of a common RAG-adjacent workflow: using LLMs for named entity recognition and relationship extraction to build a knowledge graph. With only 26 stars and 3 forks after a year, it lacks the momentum and community of major competitors such as LangChain (GraphIndex) or LlamaIndex. The technical approach is a standard pipeline of OCR or PDF parsing followed by prompting, a capability that frontier models (e.g., GPT-4o, Gemini 1.5 Pro, and Chinese-specific models such as Qwen-VL or DeepSeek) increasingly provide natively. Because these models handle multimodal inputs directly, the manual OCR-to-prompting pipeline in this repo becomes redundant. The project serves more as a personal tutorial or experiment than as a defensible piece of infrastructure: it has no unique dataset, fine-tuned model, or specialized algorithm that would provide a moat against either frontier labs or more mature open-source frameworks.
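To make the "parse, then prompt" pattern concrete, here is a minimal sketch of the prompting and graph-building steps. It is not the repository's code: `PROMPT_TEMPLATE`, `extract_triplets`, `build_graph`, and `fake_llm` are hypothetical names, the JSON reply format is an assumption, and the model call is stubbed so the example runs without any API.

```python
import json
from collections import defaultdict
from typing import Callable

# Hypothetical prompt; the repository's actual prompt is not shown in this review.
PROMPT_TEMPLATE = (
    "Extract subject-predicate-object triplets from the text below. "
    'Reply with a JSON list of objects with keys "s", "p", "o".\n\n{text}'
)

Triplet = tuple[str, str, str]


def extract_triplets(text: str, llm: Callable[[str], str]) -> list[Triplet]:
    """Prompt the model, then keep only well-formed (s, p, o) entries."""
    raw = llm(PROMPT_TEMPLATE.format(text=text))
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # the reply was not valid JSON
    if not isinstance(items, list):
        return []
    triplets = []
    for item in items:
        if not isinstance(item, dict):
            continue
        values = [item.get(key) for key in ("s", "p", "o")]
        # Drop entries with a missing, non-string, or empty field.
        if all(isinstance(v, str) and v.strip() for v in values):
            triplets.append(tuple(v.strip() for v in values))
    return triplets


def build_graph(triplets: list[Triplet]) -> dict[str, list[tuple[str, str]]]:
    """Group triplets into an adjacency map: subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for s, p, o in triplets:
        if (p, o) not in graph[s]:  # skip duplicate edges
            graph[s].append((p, o))
    return dict(graph)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned reply (assumption)."""
    return json.dumps(
        [
            {"s": "鲁迅", "p": "出生于", "o": "绍兴"},
            {"s": "鲁迅", "p": "著有", "o": "呐喊"},
            {"s": "鲁迅", "p": ""},  # malformed entry, filtered out
        ],
        ensure_ascii=False,
    )


triplets = extract_triplets("鲁迅出生于绍兴，著有《呐喊》。", fake_llm)
graph = build_graph(triplets)
```

Swapping `fake_llm` for a real client only requires a function that takes a prompt string and returns the model's text reply. How little code this takes is consistent with the assessment above that the pipeline offers no moat.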

COMPOSABILITY

TECH STACK

Python, Large Language Models, OCR, PDF Parsing, Knowledge Graph Visualization

INTEGRATION

reference_implementation

knowledge_graph_construction, spo_extraction, chinese_nlp, multimodal_extraction

READINESS

Composability: application

Depth: prototype

Novelty: reimplementation