A clinical-grade benchmark dataset and evaluation framework that uses radiologist eye-tracking data to assess the authenticity and diagnostic realism of AI-generated chest X-rays.
Defensibility
citations: 0
co_authors: 25
GazeVaLM addresses a critical bottleneck in medical AI: the gap between pixel-level accuracy and clinical utility. Its defensibility (score: 6) derives from 'expert data gravity': recruiting 16 radiologists to produce 960 high-fidelity gaze recordings is logistically intensive and expensive, creating a high barrier to replication. The project's 25 forks within one day of release, despite 0 stars, suggest either a tightly coordinated academic launch or significant latent interest in the research community.

While frontier labs like Google and OpenAI are developing medical foundation models (e.g., Med-PaLM), they typically lack the niche, multi-observer gaze data required to validate specific clinical perception errors in synthetic imagery. The risk of platform domination is low because GazeVaLM serves as an auditor or benchmark rather than a primary inference service. The main threat would be a larger entity (such as RSNA or ACR) standardizing a different gaze-tracking protocol, but GazeVaLM's first-mover advantage in the synthetic-realism space gives it a strong head start. Its 'Visual Turing Test' framework is more likely to be integrated into the evaluation pipelines of generative medical AI startups (e.g., Artisan, Rad AI) than displaced by them.
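The text describes GazeVaLM only at the level of its value proposition, so the sketch below is a hypothetical illustration of what a gaze-based 'Visual Turing Test' metric could look like, assuming per-image radiologist fixation coordinates are available. The function names and the correlation-based score are assumptions for illustration, not GazeVaLM's actual method: it compares a smoothed fixation heatmap recorded on a synthetic chest X-ray against one recorded on real images.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def fixation_heatmap(fixations, shape, sigma=30.0):
    """Bin (x, y) fixation points into an image-sized grid, then smooth
    with a Gaussian and normalize into a density over the image."""
    heat = np.zeros(shape, dtype=np.float64)
    for x, y in fixations:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < shape[0] and 0 <= col < shape[1]:
            heat[row, col] += 1.0
    heat = gaussian_filter(heat, sigma=sigma)
    total = heat.sum()
    return heat / total if total > 0 else heat


def gaze_realism_score(real_fixations, synth_fixations, shape):
    """Pearson correlation between gaze densities recorded on real vs.
    synthetic images: values near 1.0 mean readers search the synthetic
    image much as they search real ones; low values flag output that
    draws perceptually implausible attention."""
    h_real = fixation_heatmap(real_fixations, shape)
    h_synth = fixation_heatmap(synth_fixations, shape)
    return float(np.corrcoef(h_real.ravel(), h_synth.ravel())[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (512, 512)
    # Toy fixations clustered near the lung fields (purely illustrative).
    real = rng.normal(loc=(256.0, 200.0), scale=40.0, size=(60, 2))
    synth = rng.normal(loc=(256.0, 230.0), scale=55.0, size=(60, 2))
    print(f"gaze realism score: {gaze_realism_score(real, synth, shape):.3f}")
```

Heatmap correlation is one common saliency-comparison metric; a benchmark of this kind would plausibly also compare scanpath order and dwell times across its multiple readers, but those details are not given in the text.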
TECH STACK
INTEGRATION: reference_implementation
READINESS