An introduction to Policy Gradient methods - Deep Reinforcement Learning

arXiv Insights · Beginner · 📄 Research Papers Explained · 7y ago
In this episode, I introduce Policy Gradient methods for Deep Reinforcement Learning. After a general overview, I dive into Proximal Policy Optimization (PPO), an algorithm designed at OpenAI that tries to strike a balance between sample efficiency and code complexity. PPO is the algorithm used to train the OpenAI Five system and is also used in a wide range of other challenges, such as Atari and robotic control tasks. If you want to support this channel, here is my Patreon link: https://patreon.com/ArxivInsights --- You are amazing!! ;) If you have questions you would like to discuss with me personally…