SkillFactory: Self-Distillation For Learning Cognitive Behaviors

📰 ArXiv cs.AI

arXiv:2512.04072v2 Announce Type: replace-cross Abstract: Reasoning models that leverage long chains of thought employ various cognitive skills, such as verifying their answers, backtracking, retrying with an alternate method, and more. Previous work has shown that when a base language model exhibits these skills, further training with reinforcement learning (RL) can teach the model to leverage them. How can we get models to leverage skills that aren't exhibited by base models? Our work, Skill

Published 13 Apr 2026