Policy Gradient in One Minute
This is a (very) quick, one-minute summary of the development of Policy Gradient algorithms over the past 30 years.
For a more detailed explanation, see this video: https://youtu.be/mg-iU-WxiNs
References:
- REINFORCE: https://link.springer.com/content/pdf/10.1007/BF00992696.pdf
- Actor-critic: https://arxiv.org/abs/1602.01783
- GAE: https://arxiv.org/abs/1506.02438
- TRPO: https://arxiv.org/abs/1502.05477
- PPO: https://arxiv.org/abs/1707.06347
- GRPO: https://arxiv.org/pdf/2402.03300
- DeepSeek-R1: https://arxiv.org/abs/2501.12948
- Dr. GRPO: https://arxiv.org/abs/2503.20783
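The references above trace a chain of return, advantage, and objective estimators. As a rough illustration (not taken from the video), here is a minimal plain-Python sketch of the core quantity each step introduces. It covers REINFORCE discounted returns, GAE advantages, the PPO clipped surrogate, GRPO's group-normalized outcome advantage, and the Dr. GRPO variant that drops the standard-deviation division. All function names are hypothetical. The use of population standard deviation in the GRPO step is an assumption. Dr. GRPO also removes GRPO's per-response length normalization, which is not shown because this sketch works per sample rather than per token.

```python
import statistics


def discounted_returns(rewards, gamma=0.99):
    """REINFORCE-style returns: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation: A_t = delta_t + gamma * lam * A_{t+1}.

    Assumption: `values` has one more entry than `rewards`, i.e.
    V(s_0), ..., V(s_T), where V(s_T) is the bootstrap value (0 if terminal).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s); the min makes the update pessimistic.
    """
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped * advantage)


def grpo_advantages(group_rewards, eps=1e-8):
    """GRPO outcome advantage: standardize rewards within one prompt's group.

    Assumption: population std; eps guards against an all-equal group.
    """
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]


def dr_grpo_advantages(group_rewards):
    """Dr. GRPO variant: keep the group-mean baseline, drop the std division."""
    mean = statistics.fmean(group_rewards)
    return [r - mean for r in group_rewards]
```

For example, `discounted_returns([1.0, 1.0, 1.0], gamma=0.5)` gives `[1.75, 1.5, 1.0]`. With all-zero values and `lam=1.0`, `gae` returns the same numbers, because GAE with λ = 1 reduces to the Monte Carlo return minus the value baseline.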
Video made with Manim: https://www.manim.community/