A full-stack web application providing a GUI for benchmarking local and OpenAI-compatible LLMs against standard datasets like MMLU and GSM8K.
stars: 1
forks: 0
This project is an early-stage wrapper around standard LLM benchmarks. The Vue/FastAPI dashboard is convenient for local users, but the project has no technical moat and competes directly with established, industry-standard tools such as EleutherAI's LM Evaluation Harness and OpenCompass, as well as the native evaluation suites from frontier labs.
TECH STACK
INTEGRATION: docker_container
READINESS