Reflect to Inform: Boosting Multimodal Reasoning via Information-Gain-Driven Verification

📰 ArXiv cs.AI

arXiv:2603.26348v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) achieve strong multimodal reasoning performance, yet we identify a recurring failure mode in long-form generation: as outputs grow longer, models progressively drift away from image evidence and fall back on textual priors, resulting in ungrounded reasoning and hallucinations. Interestingly, based on attention analysis, we find that MLLMs have a latent capability for late-stage visual verification that is

Published 30 Mar 2026