Understanding R1-Zero Training From First Principles
R1-Zero sparked a wave of replication efforts across the AI research community. Zichen Liu explains what his team found when they dug deeper, from GRPO instabilities to the precise conditions that give rise to the "aha moment", and what that means for anyone trying to study R1-Zero-like training.
DeepCamp AI