Does RLVR Extend Reasoning Boundaries? Investigating Capability Expansion in Vision-Language Models

📰 ArXiv cs.AI

arXiv:2511.00710v4 (announce type: replace)

Abstract: Recent studies posit that Reinforcement Learning with Verifiable Rewards (RLVR) primarily amplifies behaviors inherent to the pre-training distribution rather than inducing new capabilities, but these insights are predominantly limited to language-only domains, leaving the dynamics of visual-centric spatial reasoning under-explored. To examine the impact of RLVR on the capability boundaries of Vision-Language Models (VLMs), we introduce …

Published 15 Apr 2026
