CORE FUNCTION

A full-stack web application providing a GUI for benchmarking local and OpenAI-compatible LLMs against standard datasets like MMLU and GSM8K.

TRACTION

stars

↑0.1 velocity

forks

0.0 velocity

REASONING

This project is a nascent wrapper around standard LLM benchmarks. While the addition of a Vue/FastAPI dashboard is convenient for local users, it lacks a technical moat and faces heavy competition from established, industry-standard tools like EleutherAI's LM Evaluation Harness and OpenCompass, as well as native evaluation suites from frontier labs.

COMPOSABILITY

TECH STACK

PythonFastAPIVue.jsOpenAI APIPydantic

INTEGRATION

docker_container

llm_evaluationbenchmark_automationmodel_comparisonlocal_inference_testing

READINESS

Composabilityapplication

Depthprototype

Noveltyreimplementation