How to solve Reinforcement Learning when there are ZERO rewards (Curiosity & RND)

Neural Breakdown with AVB · Beginner · 📄 Research Papers Explained · 1y ago
In this video, we will learn about two great RL methods for self-supervised exploration: Curiosity and Random Network Distillation (RND). We will use a popular on-policy RL algorithm, A2C (Advantage Actor-Critic), to explore this fascinating space. Along the way, we will walk through code examples in Python and PyTorch, and study why exactly these methods work and the various challenges they solve. Curiosity teaches agents to explore the states they find the least predictable, while RND teaches agents to explore the states that are the most "novel". The neural network architectures behind the two are remarkably simple yet ingenious, and they sidestep noisy reward hacking in favor of a more streamlined intrinsic-reward approach.

#reinforcementlearning #ai #deeplearning

Follow on Twitter: https://x.com/neural_avb
Buy me a coffee at https://ko-fi.com/neuralavb
To join our Patreon, visit: https://www.patreon.com/NeuralBreakdownwithAVB
Members get access to EVERYTHING behind the scenes that goes into producing my videos. Plus, it supports the channel in a big way and helps pay my bills.

Papers:
Curiosity: https://arxiv.org/abs/1705.05363
RND: https://arxiv.org/abs/1810.12894
A2C: https://arxiv.org/abs/1602.01783
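To make the RND idea described above concrete, here is a minimal sketch (not the video's or the paper's actual implementation, which uses deep networks): a frozen, randomly initialized "target" network, a trainable "predictor" network, and an intrinsic reward equal to the prediction error. States visited often become easy to predict, so their novelty bonus collapses; unseen states keep a high bonus. The linear networks and all dimensions here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, OUT_DIM = 8, 4

# Frozen, randomly initialized "target" network -- never trained.
W_target = rng.normal(size=(STATE_DIM, OUT_DIM))

# Trainable "predictor" network, trained to match the target's outputs.
W_pred = np.zeros((STATE_DIM, OUT_DIM))

def intrinsic_reward(state):
    """Novelty bonus: mean squared error between predictor and frozen target."""
    err = W_pred.T @ state - W_target.T @ state
    return float(np.mean(err ** 2))

def train_predictor(state, lr=0.05):
    """One gradient step shrinking the prediction error on this state."""
    global W_pred
    err = W_pred.T @ state - W_target.T @ state   # shape (OUT_DIM,)
    W_pred -= lr * np.outer(state, err)           # gradient of 0.5 * sum(err**2)

# A state that the agent keeps visiting stops looking novel:
s_familiar = np.ones(STATE_DIM) / np.sqrt(STATE_DIM)
before = intrinsic_reward(s_familiar)
for _ in range(200):
    train_predictor(s_familiar)
after = intrinsic_reward(s_familiar)
print(before, after)  # the reward for the familiar state collapses toward zero
```

In the actual RND paper, both networks are randomly initialized CNNs over observations, and this novelty bonus is added to (or used in place of) the extrinsic environment reward; this toy version only demonstrates why repeated states lose their bonus.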
