Reinforcement-aware Knowledge Distillation for LLM Reasoning
📰 ArXiv cs.AI
arXiv:2602.22495v2 Announce Type: replace-cross Abstract: Reinforcement learning (RL) post-training has recently driven major gains in long chain-of-thought reasoning for large language models (LLMs), but the high inference cost of such models motivates distillation into smaller students. Most existing knowledge distillation (KD) methods are designed for supervised fine-tuning (SFT), relying on fixed teacher traces or teacher-student Kullback-Leibler (KL) divergence-based regularization. When combin…
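The KL-divergence regularizer that the abstract attributes to SFT-style KD can be sketched in a few lines. This is a generic illustration, not the paper's method: the function names, the forward KL(teacher ‖ student) direction, and the temperature scaling are common conventions assumed here, not details taken from the abstract.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis, numerically stabilized."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_kl_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean forward KL(teacher || student) over a batch of token positions.

    A common SFT-style KD regularizer: the student is pushed to match the
    teacher's (temperature-softened) next-token distribution at each position.
    """
    p_t = softmax(teacher_logits, temperature)
    log_p_t = np.log(p_t)
    log_p_s = np.log(softmax(student_logits, temperature))
    # Sum KL over the vocabulary axis, then average over positions.
    return float((p_t * (log_p_t - log_p_s)).sum(axis=-1).mean())
```

When student and teacher logits agree the loss is zero, and it grows as the student's distribution drifts from the teacher's; in practice this term is added to the SFT cross-entropy loss with a weighting coefficient.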