Policy Gradient in One Minute

Jia-Bin Huang · Intermediate · Research Papers Explained · 11mo ago

Skills: RL Foundations 90% · Policy Gradient Methods 80%

This is a (very) quick, one-minute summary of the development of Policy Gradient algorithms over the past 30 years. Check out this video for a more detailed explanation: https://youtu.be/mg-iU-WxiNs

References:
- REINFORCE: https://link.springer.com/content/pdf/10.1007/BF00992696.pdf
- Actor-critic: https://arxiv.org/abs/1602.01783
- GAE: https://arxiv.org/abs/1506.02438
- TRPO: https://arxiv.org/abs/1502.05477
- PPO: https://arxiv.org/abs/1707.06347
- GRPO: https://arxiv.org/pdf/2402.03300
- DeepSeek-R1: https://arxiv.org/abs/2501.12948
- Dr. GRPO: https://arxiv.org/abs/2503.20783

Video made with Manim: https://www.manim.community/

