Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective

📰 ArXiv cs.AI

arXiv:2604.08880v1 Announce Type: cross

Abstract: Chain-of-thought (CoT) distillation transfers reasoning behaviors from a strong teacher to a smaller student, but prior work reports a capacity gap: distillation may fail when the teacher-student capability mismatch is large. We revisit the capacity gap from a practical perspective by re-examining commonly used experimental settings. Notably, we find that CoT distillation often degrades performance compared to the student's pre-distillation baseline…

Published 13 Apr 2026