TriViewBench: Controlled Complexity Scaling for Multi-View Structural Reasoning in MLLMs

📰 ArXiv cs.AI

arXiv:2606.26029v1 Announce Type: cross

Abstract: Multimodal Large Language Models (MLLMs) demonstrate strong performance on standard visual question answering benchmarks, yet their scalability under controlled structural complexity remains poorly understood. We introduce TriViewBench, a controlled three-view visual reasoning benchmark constructed from synthetic 3D scenes with explicitly parameterized object count and occlusion. The benchmark contains 1,923 scenes and over 14K Question-Answer (Q

Published 25 Jun 2026