A comprehensive meta-repository and curated collection of tools, datasets, and models for Chinese and English Natural Language Processing (NLP), covering more than 100 niche sub-tasks.
Utility
Stars: 79,929
Forks: 15,156
funNLP is the 'Swiss Army Knife' of the Chinese NLP ecosystem. With nearly 80k stars and 15k forks, it represents a massive community effort to aggregate niche linguistic resources (sensitive words, medical dictionaries, name-gender mappings, etc.) that are otherwise difficult to find in one place. Its defensibility stems from 'data gravity' and the sheer breadth of its collection: replicating the code is easy, but replicating the curated lists of millions of specific Chinese entities, slang, and domain-specific terms is a significant undertaking.
However, it faces a severe 'frontier risk' from Large Language Models (LLMs). Frontier labs (OpenAI, Anthropic, and Chinese leaders such as Baidu and Zhipu) have built models that natively handle an estimated 60-70% of the tasks in this repo (summarization, NER, sentiment analysis, gender inference) via zero-shot prompting. The project's most resilient components are the high-quality labeled datasets and domain-specific knowledge graphs (medical, legal, financial), which remain valuable for fine-tuning or RAG pipelines.
From a competitive standpoint, it is a non-commercial community pillar that acts as a 'discovery layer' rather than a unified product. While the individual scripts face high displacement risk from LLM APIs, the repo remains a primary reference for developers building localized Chinese applications.
TECH STACK
INTEGRATION: reference_implementation
READINESS
The reusable building blocks distilled from this project, each a mechanism you could lift into your own project.
Prompt -> Map<ModelName, Response>
Dispatch a single prompt to multiple downstream LLM APIs concurrently to aggregate and compare their outputs.
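The fan-out pattern above can be sketched roughly as follows. This is a minimal illustration using asyncio, not code taken from the project: the names `fan_out`, `ModelClient`, `make_stub`, and `failing_client` are hypothetical, and the stub clients stand in for real LLM API calls, which would need their own SDKs and credentials.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

# Hypothetical client type: an async callable that takes a prompt and
# returns the model's text response. Real API clients would be wrapped
# to match this shape.
ModelClient = Callable[[str], Awaitable[str]]


async def fan_out(
    prompt: str,
    clients: Dict[str, ModelClient],
    timeout: float = 30.0,
) -> Dict[str, str]:
    """Send one prompt to every client concurrently; return name -> response."""

    async def call(name: str, client: ModelClient) -> Tuple[str, str]:
        try:
            # Bound each call so one slow model cannot stall the batch.
            return name, await asyncio.wait_for(client(prompt), timeout)
        except Exception as exc:
            # Record the failure instead of aborting the whole comparison.
            return name, f"<error: {type(exc).__name__}>"

    pairs = await asyncio.gather(*(call(n, c) for n, c in clients.items()))
    return dict(pairs)


def make_stub(tag: str, delay: float) -> ModelClient:
    """Build a fake client that echoes the prompt after a short delay."""

    async def client(prompt: str) -> str:
        await asyncio.sleep(delay)
        return f"{tag}: {prompt}"

    return client


async def failing_client(prompt: str) -> str:
    """Fake client that always fails, to exercise the error path."""
    raise RuntimeError("simulated outage")


def demo() -> Dict[str, str]:
    clients = {
        "model_a": make_stub("A", 0.01),
        "model_b": make_stub("B", 0.02),
        "model_c": failing_client,
    }
    return asyncio.run(fan_out("hello", clients))
```

Catching errors per client is a design choice in this sketch: a side-by-side comparison is usually still useful when one provider is down, so a failure becomes a placeholder string rather than an exception that cancels the other calls.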
List<ComparisonResult> -> Map<ModelID, EloScore>
Calculate relative Elo rating changes for LLMs based on win/loss outcomes from blind A/B comparisons.
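As a rough sketch of the Elo step, the following uses the standard Elo expected-score formula with sequential updates. The source does not state which K-factor, starting rating, or tie handling is used, so `k=32.0`, `initial=1000.0`, the `Comparison` tuple layout, and the function names are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

# Assumed layout for one blind A/B result: (model_a, model_b, score_a),
# where score_a is 1.0 if A won, 0.0 if B won, and 0.5 for a tie.
Comparison = Tuple[str, str, float]


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    comparisons: Iterable[Comparison],
    k: float = 32.0,          # assumed K-factor
    initial: float = 1000.0,  # assumed starting rating for unseen models
) -> Dict[str, float]:
    """Apply comparisons in order and return model ID -> Elo rating."""
    ratings: Dict[str, float] = {}
    for model_a, model_b, score_a in comparisons:
        r_a = ratings.setdefault(model_a, initial)
        r_b = ratings.setdefault(model_b, initial)
        # A gains what B loses, so the sum of ratings is conserved.
        delta = k * (score_a - expected_score(r_a, r_b))
        ratings[model_a] = r_a + delta
        ratings[model_b] = r_b - delta
    return ratings
```

With these assumed defaults, a single win between two unrated models moves the winner to 1016 and the loser to 984. Because updates are applied sequentially, the final ratings depend on the order of the comparisons; shuffling the input or averaging over several orderings is one way to reduce that sensitivity.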