MaxRL Theory Overview feat. Fahim Tajwar and Guanning Zeng
In my opinion, one of the most important recent contributions to the RLVR space is Maximum Likelihood Reinforcement Learning (MaxRL), from Fahim Tajwar and Guanning Zeng.
In this video I cover the method and where it sits relative to the rest of the literature, and I chat with the first authors and the head of the lab, Andrea Zanette!
# Important Links:
👉 MaxRL Paper: https://arxiv.org/pdf/2602.02710
👉 Fahim's Twitter: https://x.com/FahimTajwar10
👉 Guanning's Twitter: https://x.com/guanningzeng
👉 Andrea's Twitter: https://x.com/Zanette_ai
📌 Also, if you are an early beginner, learn to code from full-stack to AI with Scrimba: https://scrimba.com/?via=yacineMahdid (extra 20% off Pro with my link; great resource, and I love the team)
# Table of Contents
- intro: 0:00
- paper walkthrough overview: 1:50
- result to keep in mind: 9:44
- definitions: 16:15
- MaxRL Objective: 24:27
- results and takeaways: 42:55
- first authors interview: 47:10
- intuition behind MaxRL: 51:39
- links between the literature: 54:55
- are there other principled weighting functions beyond max likelihood: 58:26
- why do other algorithms degrade pass@K: 1:02:29
- how much of the discovery was empirically found: 1:06:33
- GRPO easy upweighting finding: 1:13:06
- negative samples / exploration in MaxRL: 1:14:50
- does variance reduction help with exploration: 1:15:32
- are there still massive gains from increasing T to 1028: 1:19:44
- how much gain comes from increasing model size: 1:20:55
- curiosity driven curriculum and MaxRL: 1:25:44
- including negative gradients in the algorithm: 1:27:39
- are binary and continuous reward settings that different: 1:34:30
- what's next in this research direction: 1:36:03
- conclusion: 1:49:24
# Additional Relevant Literature
👉 A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce: https://arxiv.org/pdf/2504.11343
👉 What is the objective of reasoning with reinforcement learning?: https://arxiv.org/pdf/2510.13651
Enjoy!