A benchmarking framework and dataset designed to evaluate Multimodal Large Language Models (MLLMs) on their ability to diagnose and reason about rare diseases using multi-image clinical evidence.
Defensibility
citations: 0
co_authors: 12
MMRareBench addresses a critical gap in medical AI: the evaluation of models in 'data-scarce' scenarios where common knowledge fails. While general medical benchmarks (like MedQA or PathVQA) focus on common conditions, rare disease diagnosis requires high-fidelity reasoning across multiple images (e.g., MRI slices or longitudinal data), which is a known weakness of current MLLMs. The project scores a 5 for defensibility because, while the code is a standard evaluation harness, the curated rare-disease dataset and the specific focus on multi-image evidence provide a niche moat. Rare disease data is notoriously difficult to aggregate and annotate, creating a barrier to entry. The 12 forks against 0 stars in just 5 days suggest immediate interest from the research community (likely peer researchers or labs replicating results from the associated paper). The primary risk is that frontier labs like Google (Med-PaLM) or OpenAI may eventually ingest enough rare-disease literature to bypass the need for specialized reasoning, but for now, this benchmark serves as a necessary 'hard' test for medical AI. Platform domination risk is low because big tech prefers general-purpose clinical tools over the highly fragmented and low-volume rare disease market.
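To make the evaluation-harness framing concrete, below is a minimal sketch of what a multi-image rare-disease benchmark item and a top-1 accuracy loop could look like. The record fields, the `RareDiseaseCase` name, and the `model.predict` interface are illustrative assumptions, not the actual MMRareBench schema or API.

```python
from dataclasses import dataclass, field

# Hypothetical record for one multi-image rare-disease case.
# Field names are assumptions for illustration, not the MMRareBench format.
@dataclass
class RareDiseaseCase:
    case_id: str
    image_paths: list[str]          # e.g. a stack of MRI slices or longitudinal scans
    clinical_history: str           # free-text context shown to the model
    answer: str                     # gold diagnosis label
    distractors: list[str] = field(default_factory=list)  # plausible common-disease alternatives


def evaluate(model, cases: list[RareDiseaseCase]) -> float:
    """Top-1 diagnostic accuracy over multi-image cases.

    `model.predict` is a placeholder for whatever MLLM interface is wrapped;
    it is assumed to accept several images plus one text prompt per call.
    """
    correct = 0
    for case in cases:
        prompt = (
            f"Clinical history: {case.clinical_history}\n"
            "Based on all provided images, what is the most likely diagnosis?"
        )
        prediction = model.predict(images=case.image_paths, prompt=prompt)
        correct += int(prediction.strip().lower() == case.answer.strip().lower())
    return correct / len(cases)
```

The key design point this sketch highlights is that each item bundles *all* images for a case into a single model call, which is exactly the multi-image reasoning setting the benchmark targets, rather than scoring images one at a time.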
TECH STACK
INTEGRATION: reference_implementation
READINESS