Policy Gradient Methods

Implement policy gradient algorithms — REINFORCE, PPO, and Actor-Critic.

After this skill you can…

  • Implement REINFORCE from scratch
  • Train a PPO agent with Stable-Baselines3
  • Explain the advantage function in Actor-Critic
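As a small illustration of the first and third outcomes, the sketch below computes the discounted returns that REINFORCE uses to weight log-probabilities, then subtracts a baseline to get a simple advantage estimate. The function names, the example rewards, and the constant baseline are assumptions made for illustration; in an actual Actor-Critic setup the baseline would typically come from a learned value function.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def advantages(returns, baselines):
    """Advantage estimate A_t = G_t - b_t, with b_t a baseline such as V(s_t)."""
    return [g - b for g, b in zip(returns, baselines)]


# Hypothetical three-step episode; gamma = 0.5 keeps the arithmetic easy to follow.
rewards = [1.0, 0.0, 2.0]
returns = discounted_returns(rewards, gamma=0.5)

# Constant baseline of 1.0 stands in for a critic's value predictions (assumption).
adv = advantages(returns, baselines=[1.0, 1.0, 1.0])

print(returns)  # [1.5, 1.0, 2.0]
print(adv)      # [0.5, 0.0, 1.0]
```

A positive advantage means the action led to a better outcome than the baseline predicted, so its probability is pushed up; a negative one pushes it down.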

Watch (10 videos)

Proximal Policy Optimization Implementation: 9 Atari-specific Details (2/3)
Weights & Biases · beginner hands-on
→ Use policy gradient methods for PPO
Implementing DeepMind's DQN from scratch! | Project Update
Aleksa Gordić - The AI Epiphany · beginner hands-on
→ Develop policy gradient methods
→ Improve reinforcement learning models
Reinforcement Learning Course: Intro to Advanced Actor Critic Methods
freeCodeCamp.org · beginner hands-on
→ Apply policy gradient methods
→ Optimize policies in reinforcement learning
An introduction to Policy Gradient methods - Deep Reinforcement Learning
arXiv Insights · beginner hands-on
→ Implement PPO in PyTorch or TensorFlow
→ Analyze the trade-offs between sample efficiency and code complexity
Build a board game app with policy gradient (Reinforcement learning with TensorFlow Agents)
TensorFlow · beginner hands-on
→ Implement policy gradient reinforcement learning
→ Use TensorFlow Agents for policy-based algorithms
Proximal Policy Optimization | ChatGPT uses this
CodeEmporium · advanced hands-on
→ Apply policy gradient methods in a Reinforcement Learning algorithm
Policy Gradient in One Minute
Jia-Bin Huang · intermediate hands-on
→ Apply policy gradient methods to real-world problems
→ Analyze GAE and TRPO algorithms
Cutting-Edge Topics in Deep Reinforcement Learning
Coursera · advanced hands-on
→ Apply advanced exploration strategies in RL
→ Optimize RL policies using black-box optimization
Lightning Talk: TorchRL - RLHF Support - Vincent Moens, Meta
PyTorch · intermediate hands-on
→ Apply policy gradient methods
→ Use TorchRL for RL tasks
Research talk: Safe reinforcement learning using advantage-based intervention
Microsoft Research · advanced
→ Develop policy gradient methods
→ Optimize policies for safe RL