Policy Gradient in One Minute
This is a (very) quick, one-minute summary of the development of Policy Gradient algorithms over the past 30 years.
Check out this video for a more detailed explanation. https://youtu.be/mg-iU-WxiNs
References:
- REINFORCE: https://link.springer.com/content/pdf/10.1007/BF00992696.pdf
- Actor-critic (A3C): https://arxiv.org/abs/1602.01783
- GAE: https://arxiv.org/abs/1506.02438
- TRPO: https://arxiv.org/abs/1502.05477
- PPO: https://arxiv.org/abs/1707.06347
- GRPO (DeepSeekMath): https://arxiv.org/pdf/2402.03300
- DeepSeek-R1: https://arxiv.org/abs/2501.12948
- Dr. GRPO: https://arxiv.org/abs/2503.20783
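These references trace a progression in how the policy-gradient signal is weighted. GAE blends temporal-difference residuals into an advantage estimate. PPO clips the probability ratio in its surrogate objective. GRPO replaces the learned critic with rewards normalized within a group of sampled responses. Dr. GRPO drops GRPO's standard-deviation normalization; it also removes the per-response length normalization, which this sketch does not model.

The Python below is a minimal, stdlib-only sketch of those per-sample quantities. It rests on several simplifying assumptions:
- It handles a single episode with scalar rewards.
- The function names (gae, ppo_clip_term, grpo_advantages, dr_grpo_advantages) are hypothetical. They do not come from the video or from any paper's code.
- The GRPO advantage uses the population standard deviation plus a small eps. This choice is also an assumption, and implementations differ.

```python
import statistics
from typing import List


def gae(rewards: List[float], values: List[float],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized Advantage Estimation for one episode.

    `values` has len(rewards) + 1 entries; the last one is the bootstrap
    value of the final state (use 0.0 if that state is terminal).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Recursive form: A_t = delta_t + gamma * lam * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clip_term(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate (a quantity to maximize).

    ratio = pi_new(a|s) / pi_old(a|s); the clip keeps it in [1 - eps, 1 + eps].
    """
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # Taking the min makes the objective a pessimistic bound.
    return min(ratio * advantage, clipped * advantage)


def grpo_advantages(group_rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style group-relative advantage: (r_i - mean) / std.

    Assumption: population std (statistics.pstdev) plus eps to avoid
    division by zero when every reward in the group is identical.
    """
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]


def dr_grpo_advantages(group_rewards: List[float]) -> List[float]:
    """Dr. GRPO-style advantage: the std normalization is dropped, r_i - mean."""
    mean = statistics.fmean(group_rewards)
    return [r - mean for r in group_rewards]


if __name__ == "__main__":
    # gamma = lam = 1 reduces GAE to (Monte Carlo return - V).
    print(gae([1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.0], gamma=1.0, lam=1.0))  # [1.5, 0.5, 0.5]
    print(ppo_clip_term(1.5, 2.0))
    rewards = [1.0, 0.0, 0.0, 1.0]  # e.g. pass/fail scores for 4 sampled answers
    print(grpo_advantages(rewards))
    print(dr_grpo_advantages(rewards))  # [0.5, -0.5, -0.5, 0.5]
```

Setting gamma = lam = 1 makes the GAE output equal to the Monte Carlo return minus the value baseline. This gives an easy way to sanity-check the recursion.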
Video mad…