Benchmark evaluation suite for Large Language Models using the 2024 Chinese Gaokao Mathematics examination questions with a focus on mitigating data contamination.
Defensibility
Stars: 19 | Forks: 1
The project is a niche evaluation tool focused on a specific point-in-time dataset (Gaokao 2024 Math). With only 19 stars and 1 fork, it lacks significant community adoption or network effects. Its primary value, 'zero contamination', is highly perishable: frontier models (especially Chinese models like Qwen, DeepSeek, and Baichuan) rapidly ingest new public examination data into their post-training and fine-tuning pipelines, so the dataset's freshness decays with every model release. The project faces extreme competition from institutional benchmark platforms like OpenCompass (Shanghai AI Lab), which provide much broader coverage across multiple subjects and years. For frontier labs, evaluating on the Gaokao is a standard internal procedure used for marketing and capability reports, making an external, low-visibility tool like this redundant. The repository's 671-day age relative to its 2024-dated content suggests it was likely repurposed or renamed, indicating a lack of long-term strategic focus on this specific benchmark.
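To make the contamination claim concrete: benchmarks of this kind typically verify freshness by checking n-gram overlap between each question and a candidate training corpus. The sketch below is a minimal, hypothetical illustration of that technique; the function names, the 13-character window, and the 0.8 threshold are illustrative assumptions, not details taken from this repository.

```python
def ngrams(text: str, n: int = 13) -> set[str]:
    """Character n-grams of the text; a window around 13 characters
    is a commonly used choice for contamination screening."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def overlap_ratio(question: str, corpus: str, n: int = 13) -> float:
    """Fraction of the question's n-grams that appear verbatim in the corpus."""
    q = ngrams(question, n)
    if not q:
        return 0.0
    c = ngrams(corpus, n)
    return len(q & c) / len(q)

def is_contaminated(question: str, corpus: str, threshold: float = 0.8) -> bool:
    """Flag a question whose text is substantially present in the corpus."""
    return overlap_ratio(question, corpus) >= threshold
```

Once a model's training cutoff passes the exam date and the questions circulate publicly, checks like this start flagging overlap, which is exactly why a single-year benchmark's 'zero contamination' value is perishable.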
TECH STACK
INTEGRATION: reference_implementation
READINESS