MaxRL Theory Overview feat. Fahim Tajwar and Guanning Zeng

Deep Learning with Yacine · Beginner · 📄 Research Papers Explained · 1mo ago
One of the most important contributions to the RLVR space in recent months is, in my opinion, Maximum Likelihood Reinforcement Learning (MaxRL), from Fahim Tajwar and Guanning Zeng. In this video I cover the method and where it sits in the rest of the literature, and I chat with the first authors and the head of the lab, Andrea Zanette!

# Important Links:
👉 MaxRL Paper: https://arxiv.org/pdf/2602.02710
👉 fahim twitter: https://x.com/FahimTajwar10
👉 guanning twitter: https://x.com/guanningzeng
👉 andrea twitter: https://x.com/Zanette_ai

📌 Also, if you are an early beginner: learn to code from full-stack to AI with Scrimba https://scrimba.com/?via=yacineMahdid (extra 20% off pro with my link, great resource, I love the team)

# Table of Contents
- intro: 0:00
- paper walkthrough overview: 1:50
- result to keep in mind: 9:44
- definitions: 16:15
- MaxRL Objective: 24:27
- Results and Takeaway: 42:55
- first authors interview: 47:10
- intuition behind MaxRL: 51:39
- links between the literature: 54:55
- are there other principled weighting functions beyond max likelihood: 58:26
- why do other algos degrade pass@K: 1:02:29
- how much of the discovery was empirically found: 1:06:33
- grpo easy upweighting finding: 1:13:06
- negative samples / exploration in MaxRL: 1:14:50
- does variance reduction help with exploration: 1:15:32
- are there still massive gains in increasing T to 1028: 1:19:44
- how much gain with increase in model size: 1:20:55
- curiosity driven curriculum and MaxRL: 1:25:44
- negative gradients inclusion into the algo: 1:27:39
- are binary reward and continuous reward settings that different: 1:34:30
- what's next in that research direction: 1:36:03
- conclusion: 1:49:24

Additional Relevant Literature:
👉 A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce: https://arxiv.org/pdf/2504.11343
👉 What is the objective of reasoning with reinforcement learning?: https://arxiv.org/pdf/2510.13651

Enjoy!

---- Jo
