DRAGON: A Benchmark for Evidence-Grounded Visual Reasoning over Diagrams

📰 ArXiv cs.AI

arXiv:2604.25231v1 (announce type: cross)

Abstract: Diagram question answering (DQA) requires models to interpret structured visual representations such as charts, maps, infographics, circuit schematics, and scientific diagrams. Recent vision-language models (VLMs) often achieve high answer accuracy on these tasks, yet correct answers do not guarantee that models ground their reasoning in the diagram regions that support the prediction. Models may instead rely on textual correlations or dataset ar

Published 29 Apr 2026
