Better Eyes, Better Thoughts: Why Vision Chain-of-Thought Fails in Medicine

📰 ArXiv cs.AI

arXiv:2603.06665v2. Abstract: Large vision-language models (VLMs) often benefit from chain-of-thought (CoT) prompting in general domains, yet its efficacy in medical vision-language tasks remains underexplored. We report a counter-intuitive trend: on medical visual question answering, CoT frequently underperforms direct answering (DirA) across general-purpose and medical-specific models. We attribute this to a "medical perception bottleneck": subtle, domain-speci

Published 13 Apr 2026
