Foundations

Reinforcement Learning

RL algorithms, reward modelling, RLHF, policy gradients, Q-learning and multi-agent RL

37 lessons

Skills in this topic

3 skills

Formalise a problem as an MDP

Policy Gradient Methods

Implement REINFORCE from scratch

RLHF & Alignment

Describe the RLHF pipeline end-to-end

Videos (20), Reads (17)

All Reads (17): Articles (11), Tutorials (3), Research Papers (3)

ArXiv cs.AI 🎮 Reinforcement Learning 📄 Paper ⚡ AI Lesson 1w ago

Uncertainty-Aware and Temporally Regulated Expert Advice in Reinforcement Learning for Autonomous Driving

arXiv:2605.30576v1. Abstract: Exploration in reinforcement learning for autonomous driving is inherently unsafe: agents must experience novel…

ArXiv cs.AI 🎮 Reinforcement Learning 📄 Paper ⚡ AI Lesson 2w ago

Exploiting Local Dynamics Regularity for Reusable Skills in Offline Hierarchical RL

arXiv:2605.26371v1. Abstract: Hierarchical Reinforcement Learning (HRL) promises to solve long-horizon Reinforcement Learning (RL) tasks more…

Medium · AI 🎮 Reinforcement Learning ⚡ AI Lesson 2w ago

The More I Tuned My Reward Function, The Worse My RL Agent Got

A practical lesson from building a drone navigation agent and why simpler rewards often win in reinforcement learning. Continue reading on Towards AI.

Medium · Machine Learning 🎮 Reinforcement Learning ⚡ AI Lesson 2w ago

The More I Tuned My Reward Function, The Worse My RL Agent Got

A practical lesson from building a drone navigation agent and why simpler rewards often win in reinforcement learning. Continue reading on Towards AI.

Medium · Deep Learning 🎮 Reinforcement Learning ⚡ AI Lesson 2w ago

Building Adaptive Game AI with Reinforcement Learning in Unity

Most enemies in video games are predictable.

Medium · AI 🎮 Reinforcement Learning ⚡ AI Lesson 2w ago

Reinforcement Learning in Chip Design

Continue reading on AI Simplified in Plain English.

Medium · Machine Learning 🎮 Reinforcement Learning ⚡ AI Lesson 2w ago

Reinforcement Learning in Chip Design

Continue reading on AI Simplified in Plain English.

Medium · Machine Learning 🎮 Reinforcement Learning ⚡ AI Lesson 4w ago

Intelligent Routing with Reinforcement Learning (RL)

Reinforcement Learning (RL) is transforming network optimization by enabling systems to learn from real-time interactions. Instead of…

Dev.to AI 🎮 Reinforcement Learning ⚡ AI Lesson 1mo ago

Understanding Reinforcement Learning with Neural Networks Part 2: Why Backpropagation Is Not Enough

In the previous article, we explored an example where reinforcement learning is required and standard methods do not work. In this article, we will understand…

AWS Machine Learning 🎮 Reinforcement Learning ⚡ AI Lesson 1mo ago

Overcoming reward signal challenges: Verifiable rewards-based reinforcement learning with GRPO on SageMaker AI

In this post, you will learn how to implement reinforcement learning with verifiable rewards (RLVR) to introduce verification and transparency into reward…

Medium · Machine Learning 🎮 Reinforcement Learning ⚡ AI Lesson 1mo ago

The Four Conditions: A Framework for Making Correctness the Path of Least Resistance in RLVR

You can read every RLVR paper from the last two years: DeepSeek-R1, DAPO, SCOPE, the Tsinghua mode-collapse analysis, the reward hacking…

Medium · Deep Learning 🎮 Reinforcement Learning ⚡ AI Lesson 1mo ago

The Death of RLHF: A Practitioner’s Guide to the New Post-Training Stack

GRPO, DAPO, and RLVR didn’t just improve on RLHF; they replaced it. Here’s why the old recipe broke, and what’s actually shipping now.

ArXiv cs.AI 🎮 Reinforcement Learning 📄 Paper ⚡ AI Lesson 1mo ago

Hierarchical Reinforcement Learning with Runtime Safety Shielding for Power Grid Operation

arXiv:2604.14032v1. Abstract: Reinforcement learning has shown promise for automating power-grid operation tasks such as topology control and…

Medium · Machine Learning 🎮 Reinforcement Learning ⚡ AI Lesson 1mo ago

RLHF Explained: The Secret Sauce That Makes Models Smarter

In 2022, OpenAI released InstructGPT, a model 100× smaller than GPT-3 that humans consistently preferred. The secret wasn’t architecture…

Medium · LLM 🎮 Reinforcement Learning ⚡ AI Lesson 1mo ago

Proximal Policy Optimization (PPO) from Background to Full Implementation

Medium-ready version with equations prepared as clean renderable blocks. 25 min read · Beginner to Advanced · Python Implementation

BAIR Blog 🎮 Reinforcement Learning 📄 Paper ⚡ AI Lesson 1y ago

Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

Author: Nathan Li…

Lilian Weng's Blog 🎮 Reinforcement Learning ⚡ AI Lesson 6y ago

Curriculum for Reinforcement Learning

[Updated on 2020-02-03: mentioning PCG in the “Task-Specific Curriculum” section.]