An MCP server that scores user-provided AI prompts against a “prompt quality score” rubric, returning a letter grade (A–F), a numeric score out of 40, a percentile, and a breakdown across 8 quality dimensions before the prompt is sent to an LLM.
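The material above does not document the server's response format; the following is a minimal Python sketch of how such a rubric result could be structured, assuming eight dimensions scored 0–5 (for the advertised total of 40) with hypothetical dimension names and grade cut-offs. The percentile would additionally require a reference distribution of previously scored prompts and is omitted here.

    from dataclasses import dataclass, field

    # Hypothetical dimension names and grade cut-offs; the project's actual
    # rubric is not documented in the material above.
    DIMENSIONS = (
        "clarity", "specificity", "context", "constraints",
        "output_format", "examples", "tone", "safety",
    )
    GRADE_CUTOFFS = ((36, "A"), (32, "B"), (28, "C"), (24, "D"))  # below 24 -> "F"

    @dataclass
    class PromptQualityResult:
        dimension_scores: dict            # each of the 8 dimensions scored 0-5
        total: int = field(init=False)    # numeric score out of 40
        grade: str = field(init=False)    # A-F letter grade

        def __post_init__(self):
            assert set(self.dimension_scores) == set(DIMENSIONS)
            assert all(0 <= v <= 5 for v in self.dimension_scores.values())
            self.total = sum(self.dimension_scores.values())
            self.grade = next((g for cutoff, g in GRADE_CUTOFFS if self.total >= cutoff), "F")

    # Example: a prompt that is clear and safe but gives few examples.
    result = PromptQualityResult({
        "clarity": 5, "specificity": 4, "context": 4, "constraints": 3,
        "output_format": 4, "examples": 2, "tone": 4, "safety": 5,
    })
    print(result.grade, result.total)  # -> C 31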
Defensibility
stars: 0
Quantitative signals indicate extremely low adoption and momentum: 0 stars, 0 forks, and 0.0/hr velocity at only ~12 days old. This is consistent with a new/early prototype and provides no evidence of a stable user base, ecosystem, or data flywheel.

Defensibility (2/10): The concept (grading prompts across dimensions and returning a score and grade) is largely a known pattern in prompt evaluation and rubric-based assessment, e.g. prompt review checklists, LLM-as-a-judge, and automated evaluation frameworks. The main claimed differentiator (a named “prompt quality score” with an A–F grade, a total out of 40, and 8 dimensions) is presentation and packaging of an evaluation rubric rather than a clearly new scoring methodology. With no measurable traction, and no demonstrated proprietary dataset, model weights, or long-running benchmark that would create switching costs, the project is easily cloned or replaced.

Frontier risk (high): Frontier labs can readily integrate prompt-quality scoring into existing developer tools (eval frameworks, tracing, prompt management, or safety/policy linting). Even if they don’t replicate the exact rubric labels, they can add an adjacent feature, “pre-send prompt evaluation”, as part of the platform developer experience. Since the tool is positioned as scoring before model inference, it overlaps with capabilities that platform products naturally add (prompt lint/evals, automated judge scoring, quality feedback). With near-zero adoption and early-stage maturity, it is unlikely to establish a moat before platforms subsume it.

Key threats / displacing actors:
- Platform absorption: OpenAI/Anthropic/Google developer ecosystems can add prompt scoring or rubric evaluation directly (or via their eval APIs and tracing tooling), and can use their own judge models to replicate the scoring outputs.
- Ecosystem displacement: LangChain/LangGraph, LlamaIndex, and existing eval toolchains (e.g. common “LLM-as-judge” and rubric scoring utilities) can implement the same workflow as a few components, reducing differentiation.

Why platform_domination_risk is high: The functionality is not tied to exclusive infrastructure or proprietary data. It is a generic “prompt evaluation service” that large platforms can offer quickly through their existing models and evaluation endpoints; any large provider could standardize rubric evaluation as a feature.

Why market_consolidation_risk is high: Developer tooling in this space tends to consolidate into a few dominant ecosystems (model-provider tooling, mainstream orchestration/eval frameworks). Without differentiation via unique datasets, integration lock-in, or a standard that others are forced to follow, incumbents can converge quickly.

Why displacement_horizon is 6 months: Given the ~12-day age and absence of traction signals, there is little time to build defensible assets (benchmarks, dataset rights, integrations, documented methodology, or community lock-in). A platform feature or a common library implementation could make this redundant once the idea becomes part of standard evaluation/prompt-lint workflows.

Opportunities (what could improve defensibility if the project evolves):
- Publish and maintain a benchmark dataset and scoring methodology with transparency, consistency checks, and longitudinal validation.
- Provide an open standard (schemas, rubric definitions, test suites) that others adopt, creating de facto lock-in.
- Build integrations with popular MCP clients and orchestration frameworks, plus an ecosystem of prompt libraries and real-world scored examples.
- Demonstrate accuracy/utility via measured correlation with downstream task success, not just self-reported quality (a minimal sketch of this kind of check follows at the end of this section); that would shift it from a packaged prototype to an evidence-backed evaluation product.

Given current information (no stars, forks, or velocity; very early age; generic rubric-style scoring positioning), defensibility remains very low and frontier obsolescence risk is very high.
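As an illustration of the last opportunity above, here is a minimal sketch of the kind of score-versus-outcome check that could validate the rubric. All numbers are invented purely for demonstration and do not come from the project.

    from statistics import correlation  # Pearson's r, Python 3.10+

    # Invented example data: rubric totals (out of 40) for a set of prompts,
    # paired with measured downstream task success rates for the same prompts.
    rubric_scores = [12, 18, 23, 27, 31, 36, 39]
    task_success  = [0.20, 0.35, 0.30, 0.55, 0.60, 0.80, 0.85]

    r = correlation(rubric_scores, task_success)
    print(f"Pearson r between rubric score and task success: {r:.2f}")
    # A consistently strong correlation on held-out prompts would be evidence
    # that the score predicts real outcomes rather than just packaging a checklist.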
TECH STACK
INTEGRATION
api_endpoint
READINESS