Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective
📰 ArXiv cs.AI
arXiv:2510.10150v4 Announce Type: replace-cross
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) serves as a cornerstone technique for enhancing the reasoning capabilities of Large Language Models (LLMs). However, its training is often plagued by entropy collapse, a rapid decline in policy entropy that limits exploration and undermines training effectiveness. While recent works attempt to mitigate this issue via several heuristic entropy interventions, the underlying mechan