Curated directory and discovery tool for LLM agent benchmark datasets
Stars: 3
Forks: 0
This is a static curated list in the 'awesome' format with no code, no active maintenance (no development activity in 499 days), zero forks, and minimal engagement (3 stars). It serves as a directory and survey of existing benchmarks rather than implementing novel methodology or providing tooling. The README promises 'discover and evaluate,' but the repository is purely informational: a taxonomy of links to external benchmark datasets (WebArena, ARC, etc.) rather than an interactive evaluation framework or a novel benchmark in its own right.

Defensibility is extremely low: anyone can fork and maintain a similar list, there are no switching costs, and the value lies entirely in curation effort rather than any technical moat. Frontier risk is likewise low: this is a reading list, not a tool or model, and labs such as OpenAI and Anthropic maintain their own internal benchmark suites and have no need for a crowdsourced markdown directory. Categorically, this is a personal knowledge-sharing project with no users, no code dependencies, and no ecosystem lock-in. Scores as tutorial/demo tier.
TECH STACK:
INTEGRATION: reference_implementation
READINESS: