Extracts subject-predicate-object (SPO) triplets from Chinese unstructured data (text, images, PDFs) using LLMs and visualizes them as a knowledge graph.
Defensibility
stars: 26
forks: 3
The project is a straightforward implementation of a common RAG-adjacent workflow: using LLMs for named entity recognition and relation extraction to build a knowledge graph. With only 26 stars and 3 forks over a year, it lacks the momentum and community of major competitors such as LangChain (GraphIndex) or LlamaIndex. The technical approach is a standard pipeline of OCR or PDF parsing followed by prompting, which is becoming a native capability of frontier models (e.g., GPT-4o, Gemini 1.5 Pro, and Chinese-focused models such as Qwen-VL or DeepSeek). Because these models handle multimodal inputs directly, the manual OCR-to-prompting pipeline in this repo is redundant. The project serves more as a personal tutorial or experiment than as a defensible piece of infrastructure: it has no unique dataset, fine-tuned model, or specialized algorithm that would provide a moat against either frontier labs or more mature open-source frameworks.
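To make the "parse, then prompt" pipeline described above concrete, here is a minimal sketch of its last two stages: prompting for SPO triplets and assembling them into a graph. It is not taken from the repository. The prompt wording, the JSON reply format, and every name here (`build_prompt`, `parse_triplets`, `build_graph`, `fake_llm`, `extract_graph`) are assumptions made for illustration, and `fake_llm` is a stub that stands in for a real model call so the example runs offline.

```python
import json
from collections import defaultdict

# Hypothetical prompt; the repository's actual prompt is not shown in the source.
PROMPT_TEMPLATE = (
    "Extract subject-predicate-object triplets from the text below. "
    'Reply with a JSON list of objects with keys "subject", "predicate", "object".\n\n'
    "Text: {text}"
)


def build_prompt(text):
    """Wrap already-extracted text (e.g. OCR or PDF output) in the extraction prompt."""
    return PROMPT_TEMPLATE.format(text=text)


def parse_triplets(raw_reply):
    """Parse a model reply into (subject, predicate, object) tuples.

    Malformed JSON yields an empty list; incomplete entries are skipped.
    """
    try:
        items = json.loads(raw_reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    triplets = []
    for item in items:
        if not isinstance(item, dict):
            continue
        parts = [item.get(key) for key in ("subject", "predicate", "object")]
        if all(isinstance(p, str) and p.strip() for p in parts):
            triplets.append(tuple(p.strip() for p in parts))
    return triplets


def build_graph(triplets):
    """Build an adjacency map: subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triplets:
        if (predicate, obj) not in graph[subject]:  # drop duplicate edges
            graph[subject].append((predicate, obj))
    return dict(graph)


def fake_llm(prompt):
    """Stub standing in for a real LLM call (assumption, for offline use)."""
    return json.dumps(
        [
            {"subject": "鲁迅", "predicate": "作者", "object": "狂人日记"},
            {"subject": "鲁迅", "predicate": "出生地", "object": "绍兴"},
            {"subject": "", "predicate": "broken", "object": "entry"},
        ],
        ensure_ascii=False,
    )


def extract_graph(text, llm=fake_llm):
    """Run the prompt -> reply -> triplets -> graph steps end to end."""
    return build_graph(parse_triplets(llm(build_prompt(text))))


if __name__ == "__main__":
    print(extract_graph("鲁迅出生于绍兴,著有《狂人日记》。"))
```

Swapping `fake_llm` for a function that calls an actual model is the only change needed to try this against real text. The small amount of code involved is consistent with the assessment above that the pipeline itself offers little moat.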
TECH STACK
INTEGRATION
reference_implementation
READINESS