When More Thinking Hurts: Overthinking in LLM Test-Time Compute Scaling
📰 ArXiv cs.AI
arXiv:2604.10739v1 Announce Type: new Abstract: Scaling test-time compute through extended chains of thought has become a dominant paradigm for improving large language model reasoning. However, existing research implicitly assumes that longer thinking always yields better results. This assumption remains largely unexamined. We systematically investigate how the marginal utility of additional reasoning tokens changes as compute budgets increase. We find that marginal returns diminish substantially […]