Does RLVR Extend Reasoning Boundaries? Investigating Capability Expansion in Vision-Language Models
📰 ArXiv cs.AI
arXiv:2511.00710v4 Announce Type: replace
Abstract: Recent studies posit that Reinforcement Learning with Verifiable Rewards (RLVR) primarily amplifies behaviors inherent to the pre-training distribution rather than inducing new capabilities, but these insights are predominantly limited to language-only domains, leaving the dynamics of visual-centric spatial reasoning under-explored. To examine the impact of RLVR on the capability boundaries of Vision-Language Models (VLMs), we introduce …