Exploring “Self-Distillation for Reinforcement Learning and Continual Learning” with Jonas and Idan
Today we’re exploring an interesting paradigm that is gaining steam in the reinforcement learning and continual learning space: self-distillation.
We’re going to interview the authors of “Reinforcement Learning via Self-Distillation” and “Self-Distillation Enables Continual Learning”: Jonas Hübotter and Idan Shenfeld!
The basic idea is to use the student itself as the teacher, but with feedback from the environment about what went wrong. The trick is to have the teacher “comment” on the student’s output tokens using its logits, creating a sort of dense reward at the token level instead of a single reward per rollout.
It’s pretty cool since the teacher never has to generate a rollout of its own, which would add a lot of complexity.
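To make the “dense reward at the token level” idea concrete, here is a minimal sketch (my own illustration, not the authors’ exact objective). The assumption: the teacher is the same model re-reading the student’s rollout with environment feedback in its context, and its logits score every token of the rollout via a per-token KL divergence — one training signal per token rather than one scalar per rollout.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dense_distillation_signal(student_logits, teacher_logits):
    """Per-token KL(teacher || student) over the student's own rollout.

    Returns one value per token -- a dense, token-level signal --
    instead of a single scalar reward for the whole rollout.
    """
    p_t = softmax(teacher_logits)                            # (T, V)
    log_p_t = np.log(p_t)
    log_p_s = np.log(softmax(student_logits))
    kl_per_token = (p_t * (log_p_t - log_p_s)).sum(axis=-1)  # (T,)
    return kl_per_token

# Toy example: a rollout of 4 tokens over a vocabulary of 5.
rng = np.random.default_rng(0)
student = rng.normal(size=(4, 5))
# Hypothetical stand-in: teacher logits would come from the same model
# conditioned on environment feedback about the rollout.
teacher = student + rng.normal(scale=0.5, size=(4, 5))

per_token = dense_distillation_signal(student, teacher)
print(per_token.shape)  # one signal per token, not per rollout
```

Minimizing this per-token KL pulls the student toward the feedback-informed teacher everywhere the teacher disagrees, which is what gives the method credit assignment at the token level.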
What I like about this paradigm is that it:
is relatively simple and just works
bootstraps learning using the model’s own in-context learning (like reasoning)
is flexible across multiple types of learning methodologies
scales with model size
This family of methods is already being implemented in agentic systems like OpenClaw RL, and frontier open-source models like GLM-5 use a similar methodology in their post-training pipelines!
Come hang out and ask questions! 👏
Watch on YouTube ↗