Implicit Compression Regularization: Concise Reasoning via Internal Shorter Distributions in RL Post-Training

📰 ArXiv cs.AI

arXiv:2605.07316v1 Announce Type: new

Abstract: Reinforcement learning with verifiable rewards improves LLM reasoning but often induces overthinking, where models generate unnecessarily long reasoning traces. Existing methods mainly rely on length penalties or early-exit strategies; however, the former may degrade accuracy and induce underthinking, whereas the latter assumes that substantial portions of reasoning traces can be safely truncated. To obtain a compression signal without these limita

Published 11 May 2026
