MaxRL Theory Overview feat. Fahim Tajwar and Guanning Zeng

Uploaded: 2026-03-19T13:11:37+00:00
Channel: Deep Learning with Yacine

One of the most important contributions to the RLVR space in recent months is, in my opinion, Maximum Likelihood Reinforcement Learning (MaxRL), from Fahim Tajwar and Guanning Zeng. In this video I cover the method and where it sits in the rest of the literature, and I chat with the first authors and the head of the lab, Andrea Zanette!

# Important Links:
👉 MaxRL Paper: https://arxiv.org/pdf/2602.02710
👉 fahim twitter: https://x.com/FahimTajwar10
👉 guanning twitter: https://x.com/guanningzeng
👉 andrea twitter: https://x.com/Zanette_ai

📌 Also, if you are an early beginner: learn to code from full-stack to AI with Scrimba https://scrimba.com/?via=yacineMahdid (extra 20% off pro with my link, great resource, I love the team)

# Table of Contents
- intro: 0:00
- paper walkthrough overview: 1:50
- result to keep in mind: 9:44
- definitions: 16:15
- MaxRL Objective: 24:27
- Results and Takeaway: 42:55
- first authors interview: 47:10
- intuition behind MaxRL: 51:39
- links between the literature: 54:55
- are there other principled weighting functions beyond max likelihood: 58:26
- why do other algos degrade pass(at)K: 1:02:29
- how much of the discovery was empirically found: 1:06:33
- grpo easy upweighting finding: 1:13:06
- negative samples / exploration in MaxRL: 1:14:50
- does variance reduction help with exploration: 1:15:32
- are there still massive gains in increasing T to 1028: 1:19:44
- how much gain with increase in model size: 1:20:55
- curiosity driven curriculum and MaxRL: 1:25:44
- negative gradients inclusion into the algo: 1:27:39
- are binary reward and continuous reward settings that different: 1:34:30
- what's next in that research direction: 1:36:03
- conclusion: 1:49:24

Additional Relevant Literature:
👉 A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce: https://arxiv.org/pdf/2504.11343
👉 What is the objective of reasoning with reinforcement learning?: https://arxiv.org/pdf/2510.13651

Enjoy! ---- Jo