StyleBench: Evaluating thinking styles in Large Language Models
📰 ArXiv cs.AI
arXiv:2509.20868v2 Announce Type: replace-cross Abstract: Structured reasoning can improve the inference performance of large language models (LLMs), but it also introduces computational cost and control constraints. It remains poorly understood when additional reasoning structure helps and when it instead reduces efficiency or robustness. We propose StyleBench, in which we study reasoning structure as a capacity-constrained design choice rather than a fixed inference recipe. We evaluate five repres