A benchmark dataset and evaluation framework that uses radiologist eye-tracking data to measure the clinical realism and authenticity of AI-generated chest X-rays.
Defensibility
- citations: 0
- co_authors: 25
GazeVaLM occupies a high-value niche at the intersection of medical AI safety and human-computer interaction. The project's defensibility (Score: 7) is driven by 'data gravity' and a high cost of acquisition: recruiting 16 expert radiologists for nearly 1,000 gaze recordings is a non-trivial logistical and financial undertaking that cannot easily be replicated by software engineers. While the image set itself is small (60 images), the multi-modal gaze data (fixations, scanpaths, saliency) provides a unique ground truth for how experts distinguish real from synthetic medical images. Frontier labs (OpenAI, Google) are unlikely to compete directly here, as they focus on general-purpose generative models rather than the specialized clinical validation of those models. The 25 forks at age 0 with 0 stars suggest a coordinated release from a research lab, likely intended as a foundational benchmark for the 'Visual Turing Test' in medicine. Its primary competition comes from larger medical eye-tracking datasets such as REFLACX and MIMIC-EYE, but GazeVaLM's specific focus on the realism of synthetic, generative-AI images makes it a timely and unique contribution to the validation of AI-generated medical evidence.
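The multi-modal gaze data described above (fixations ordered into a scanpath, with dwell time as a simple saliency proxy) can be sketched as a small record type. All class and field names below are illustrative assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical layout for one gaze recording; names are assumptions
# for illustration, not the real GazeVaLM schema.
@dataclass
class Fixation:
    x: float            # normalized image coordinate in [0, 1]
    y: float
    duration_ms: float  # how long the gaze rested here

@dataclass
class GazeRecording:
    radiologist_id: str
    image_id: str
    is_synthetic: bool                      # label of the viewed X-ray
    fixations: list = field(default_factory=list)

    def scanpath(self):
        """Ordered (x, y) sequence of fixation centers."""
        return [(f.x, f.y) for f in self.fixations]

    def dwell_time_ms(self):
        """Total fixation time, a crude saliency/effort proxy."""
        return sum(f.duration_ms for f in self.fixations)

# Example: one short recording on a synthetic image
rec = GazeRecording(
    radiologist_id="R03",
    image_id="cxr_041",
    is_synthetic=True,
    fixations=[Fixation(0.42, 0.55, 310.0), Fixation(0.61, 0.48, 275.0)],
)
print(rec.scanpath())       # [(0.42, 0.55), (0.61, 0.48)]
print(rec.dwell_time_ms())  # 585.0
```

A benchmark built on such records can then compare expert scanpaths over real versus synthetic images, which is the kind of ground truth the paragraph argues is hard to replicate.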