DORI (Discriminative Orientation Reasoning Intelligence) is a cognitively grounded benchmark designed to evaluate how well Multimodal Large Language Models (MLLMs) perceive and reason about object orientation, a skill critical for robotics and AR that current benchmarks overlook.
Defensibility
citations: 0
co_authors: 7
DORI addresses a specific, high-value blind spot in current MLLM evaluation: spatial orientation reasoning. While models like GPT-4o and Gemini excel at scene description, they frequently fail at the fine-grained orientation tasks essential for robotic manipulation. The project is currently in the 'paper release' stage (0 stars, 7 forks, 7 days old), which explains its low defensibility score; benchmarks gain a moat only through community adoption and by becoming a 'standard' for leaderboard prestige. Frontier labs are unlikely to build this themselves—they prefer to compete on benchmarks created by academia—but they will likely optimize their models to perform better on DORI if it gains traction. The primary competition comes from broader spatial benchmarks such as RealWorldQA (xAI) or MMMU. Seven forks so soon after release suggest initial interest from the research community, likely peer reviewers or collaborators. Defensibility is low because the methodology can be replicated, but first-mover advantage in a specific niche like 'orientation perception' provides a temporary intellectual moat.
TECH STACK
INTEGRATION: reference_implementation
READINESS