
Articles

11 articles · Updated every 3 hours

AWS Machine Learning 🎮 Reinforcement Learning ⚡ AI Lesson 1mo ago
Overcoming reward signal challenges: Verifiable rewards-based reinforcement learning with GRPO on SageMaker AI
In this post, you will learn how to implement reinforcement learning with verifiable rewards (RLVR) to introduce verification and transparency into reward signa
Medium · Machine Learning 🎮 Reinforcement Learning ⚡ AI Lesson 1mo ago
The Four Conditions: A Framework for Making Correctness the Path of Least Resistance in RLVR
You can read every RLVR paper from the last two years: DeepSeek-R1, DAPO, SCOPE, the Tsinghua mode-collapse analysis, the reward hacking…
Medium · Deep Learning 🎮 Reinforcement Learning ⚡ AI Lesson 1mo ago
The Death of RLHF: A Practitioner’s Guide to the New Post-Training Stack
GRPO, DAPO, and RLVR didn’t just improve on RLHF; they replaced it. Here’s why the old recipe broke, and what’s actually shipping now.
Medium · Machine Learning 🎮 Reinforcement Learning ⚡ AI Lesson 1mo ago
RLHF Explained: The Secret Sauce That Makes Models Smarter
In 2022, OpenAI released InstructGPT, a model 100× smaller than GPT-3 that humans consistently preferred. The secret wasn’t architecture…
Medium · LLM 🎮 Reinforcement Learning ⚡ AI Lesson 1mo ago
Proximal Policy Optimization (PPO) from Background to Full Implementation
Medium-ready version with equations prepared as clean renderable blocks. 25 min read · Beginner to Advanced · Python Implementation
BAIR Blog 🎮 Reinforcement Learning 📄 Paper ⚡ AI Lesson 1y ago
Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment
Lilian Weng's Blog 🎮 Reinforcement Learning ⚡ AI Lesson 6y ago
Curriculum for Reinforcement Learning
[Updated on 2020-02-03: mentioning PCG in the “Task-Specific Curriculum” section.