BartAmin/Clustered-Dynamic-RAG

A benchmarking framework and implementation of Clustered Dynamic Retrieval-Augmented Generation (CDRAG), which uses hierarchical document clustering and LLM-based routing to optimize retrieval context.

Defensibility

2.0/10

Platform Domination: high

Market Consolidation: medium

Displacement Horizon: 6 months

REASONING

The project is a one-day-old personal experiment with zero stars or forks: a prototype implementation of hierarchical RAG. The 'CDRAG' acronym may be local to this project, but the underlying technique (clustering documents, then using an LLM to navigate a tree of summaries or clusters) is a well-established pattern in advanced RAG, e.g., the RAPTOR paper or LlamaIndex's 'Recursive Retriever'. Defensibility is near zero: the project has no community, no proprietary dataset, and no technical moat beyond a standard evaluation on Legal RAG Bench. Frontier labs such as OpenAI (SearchGPT) and Anthropic are rapidly integrating advanced retrieval logic directly into their inference pipelines. Orchestration frameworks such as LangChain and LlamaIndex also already provide 'auto-merging' or 'hierarchical' retrieval primitives that achieve similar or superior results with better abstraction. Displacement risk is high because this specific implementation is likely to be superseded by standardized library features within months.
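The general pattern named in the reasoning, clustering documents and then routing a query to one cluster before retrieving inside it, can be sketched roughly as follows. This is not the repository's code. It is a minimal illustration under stated assumptions: TF-IDF vectors stand in for embeddings, nearest-centroid similarity stands in for the LLM-based router, and the function names `build_index` and `route_and_retrieve` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_index(docs, n_clusters=2, seed=0):
    """Vectorize the documents and group them into clusters (hypothetical helper)."""
    vectorizer = TfidfVectorizer()
    doc_vecs = vectorizer.fit_transform(docs)  # sparse matrix, one row per document
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(doc_vecs)
    return vectorizer, doc_vecs, km


def route_and_retrieve(query, docs, vectorizer, doc_vecs, km, top_k=2):
    """Pick one cluster for the query, then rank only that cluster's documents."""
    q = vectorizer.transform([query])

    # Routing step. ASSUMPTION: centroid similarity is a cheap stand-in for the
    # LLM router; a real system might show cluster summaries to an LLM instead.
    cluster_scores = cosine_similarity(q, km.cluster_centers_)[0]
    chosen = int(np.argmax(cluster_scores))

    # Retrieval step, restricted to members of the chosen cluster.
    member_ids = np.where(km.labels_ == chosen)[0]
    sims = cosine_similarity(q, doc_vecs[member_ids])[0]
    order = np.argsort(-sims)[:top_k]
    return chosen, [docs[member_ids[i]] for i in order]


if __name__ == "__main__":
    corpus = [
        "tenant lease rent deposit",
        "landlord lease eviction notice tenant",
        "rent increase lease renewal tenant",
        "patent infringement claim invention",
        "patent filing invention examiner",
        "trademark patent licensing royalty",
    ]
    vec, mat, model = build_index(corpus, n_clusters=2)
    cluster_id, hits = route_and_retrieve("tenant lease dispute", corpus, vec, mat, model)
    print(cluster_id, hits)
```

Restricting the second step to one cluster shrinks the candidate set that reaches the generator, which is the context-size benefit hierarchical approaches aim for. The trade-off is that a wrong routing decision hides every relevant document in the other clusters.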

COMPOSABILITY

TECH STACK

Python, LLM (likely OpenAI/Anthropic), Scikit-learn, FAISS or similar vector store, Legal RAG Bench dataset

INTEGRATION

reference_implementation

hierarchical_retrieval, document_clustering, rag_optimization, legal_domain_adaptation

READINESS

Composability: algorithm

Depth: prototype

Novelty