Exploring “Self-Distillation for Reinforcement Learning and Continual Learning” with Jonas and Idan
Today we’re exploring an interesting paradigm that is gaining steam in the reinforcement learning and continual learning space: self-distillation.
We’re going to interview the authors of “Reinforcement Learning via Self-Distillation” and “Self-Distillation Enables Continual Learning,” Jonas Hübotter and Idan Shenfeld!
The basic idea is to use the student itself as the teacher, but with feedback from the environment about what went wrong. The trick is to have the teacher “comment” on the student’s output tokens using its logits, creating a sort of dense reward at the token level instead of one re…
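To make the token-level idea concrete, here’s a minimal sketch of one way such a dense signal could look: scoring each token the student emitted by the teacher’s log-probability for it. The function names and the toy numbers are illustrative assumptions, not the papers’ exact objective.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one logit vector.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def token_level_rewards(teacher_logits, student_tokens):
    """Dense per-token signal: the teacher's log-probability of each
    token the student actually emitted (a sketch, not the authors'
    exact formulation)."""
    return [log_softmax(step)[tok]
            for step, tok in zip(teacher_logits, student_tokens)]

# Toy example: vocabulary of 3 tokens, a 2-token student output.
teacher_logits = [
    [2.0, 0.5, -1.0],  # teacher favors token 0 at step 0
    [0.0, 3.0, 0.0],   # teacher favors token 1 at step 1
]
rewards = token_level_rewards(teacher_logits, [0, 2])
# One reward per token, rather than a single scalar for the whole episode.
```

Here the student’s first token (favored by the teacher) scores much higher than its second (disfavored), which is exactly the kind of per-token credit assignment a single episode-level reward can’t provide.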
Watch on YouTube ↗
DeepCamp AI