SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition

📰 ArXiv cs.AI

arXiv:2511.21471v4

Abstract: Spatial cognition is fundamental to real-world multimodal intelligence, allowing models to effectively interact with the physical environment. While multimodal large language models (MLLMs) have made significant strides, existing benchmarks often oversimplify spatial cognition, reducing it to a single-dimensional metric, which fails to capture the hierarchical structure and interdependence of spatial abilities. To address this gap, we propose a

Published 9 May 2026
