Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
📰 ArXiv cs.AI
arXiv:2603.03332v3 Announce Type: replace-cross
Abstract: Chain-of-Thought (CoT) prompting has emerged as a foundational technique for eliciting reasoning from Large Language Models (LLMs), yet the robustness of this approach to corruptions in intermediate reasoning steps remains poorly understood. This paper presents a comprehensive empirical evaluation of LLM robustness to a structured taxonomy of five CoT perturbation types: *MathError*, *UnitConversion*, *Sycophancy*, *SkippedSteps*, and …
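To make the perturbation taxonomy concrete, here is a minimal, hypothetical sketch of what a *MathError* perturbation might look like: locate an arithmetic result in one CoT step and offset it by a small nonzero amount. The function name and corruption rule are illustrative assumptions, not the paper's actual implementation.

```python
import re
import random

def perturb_math_error(steps, rng=None):
    """Return a copy of the CoT steps with one arithmetic result corrupted.

    Hypothetical illustration of a "MathError"-style perturbation:
    find a step containing "= <number>" and shift that number by a
    small nonzero offset, leaving all other steps untouched.
    """
    rng = rng or random.Random(0)
    perturbed = list(steps)
    candidates = [i for i, s in enumerate(perturbed) if re.search(r"= \d+", s)]
    if not candidates:
        return perturbed  # nothing to corrupt

    i = rng.choice(candidates)

    def corrupt(match):
        value = int(match.group(1))
        offset = rng.choice([-1, 1]) * rng.randint(1, 9)  # guaranteed nonzero
        return f"= {value + offset}"

    perturbed[i] = re.sub(r"= (\d+)", corrupt, perturbed[i], count=1)
    return perturbed

cot = [
    "Step 1: There are 3 boxes with 4 apples each, so 3 * 4 = 12 apples.",
    "Step 2: Eating 2 apples leaves 12 - 2 = 10 apples.",
]
print(perturb_math_error(cot))
```

A robustness evaluation in this style would then compare the model's final answer on the clean chain against its answer when the corrupted chain is fed back as context.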