Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization

📰 ArXiv cs.AI

arXiv:2508.10164v2 (announce type: replace)

Abstract: Recent advances in Large Reasoning Models (LRMs) have demonstrated strong performance on complex tasks through long Chain-of-Thought (CoT) reasoning. However, their lengthy outputs increase computational costs and may lead to overthinking, raising challenges in balancing reasoning effectiveness and efficiency. Current solutions often compromise reasoning quality or require extensive resources. In this paper, we investigate how to reduce the gen…

Published 16 Apr 2026