GRPO 2.0? DAPO LLM Reinforcement Learning Explained

AI Papers Academy · Beginner · 📄 Research Papers Explained · 1y ago

Skills: Reading ML Papers 90% · LLM Engineering 80%

In this video, we break down "DAPO: An Open-Source LLM Reinforcement Learning System at Scale", a research paper from ByteDance that introduces DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization), a reinforcement learning (RL) algorithm built on GRPO (Group Relative Policy Optimization). DAPO tackles key challenges in training large language models (LLMs) with RL, especially issues encountered when trying to reproduce DeepSeek-R1's results. The researchers trained Qwen2.5-32B with DAPO and reached 50 points on the challenging AIME 2024 benchmark, outperforming the 47 points of DeepSeek-R1-Zero-Qwen-32B (DeepSeek's RL recipe applied to the same base model) while using only 50% of the training steps.

Written Review - https://aipapersacademy.com/dapo/
Paper - https://arxiv.org/abs/2503.14476
Code & Dataset - https://github.com/BytedTsinghua-SIA/DAPO

#ai #reinforcementlearning #llm #deepseek #grpo #dapo #rl #airesearch
___________________
🔔 Subscribe for more AI paper reviews!
📩 Join the newsletter → https://aipapersacademy.com/newsletter/
Patreon - https://www.patreon.com/aipapersacademy
The video was edited using VideoScribe - https://tidd.ly/44TZEiX
___________________
Chapters:
0:00 Introduction
2:30 Introducing DAPO
5:05 Clip-Higher
7:45 Dynamic Sampling
9:35 Token-Level Loss
11:13 Overlong Responses
12:23 Ablation Study
12:57 KL Divergence Removal
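For readers who want a concrete picture before watching, here is a minimal Python sketch of three ideas named in the title and chapters: Clip-Higher (decoupled clipping), Dynamic Sampling, and Token-Level Loss. It is an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical, the clip values 0.2 and 0.28 are the ones the paper is understood to report, and the population standard deviation is an arbitrary choice made for this sketch.

```python
from statistics import mean, pstdev

# Assumed clip bounds (believed to match the paper; treat as illustrative).
EPS_LOW = 0.2
EPS_HIGH = 0.28


def group_advantages(rewards):
    """GRPO-style advantage: normalize each reward against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std, small epsilon for safety
    return [(r - mu) / (sigma + 1e-8) for r in rewards]


def keep_group(rewards):
    """Dynamic Sampling: drop groups whose rewards are all identical,
    because every advantage in such a group is zero (no learning signal)."""
    return len(set(rewards)) > 1


def clipped_token_objective(ratio, advantage, eps_low=EPS_LOW, eps_high=EPS_HIGH):
    """Clip-Higher: PPO-style clipped surrogate with separate lower and
    upper bounds, so the upper bound can be looser than the lower one."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


def token_level_objective(groups):
    """Token-Level Loss: average over all tokens of all kept responses,
    rather than averaging per response first.

    groups: list of (rewards, ratios), where ratios[i] is the list of
    per-token probability ratios for response i in that group.
    """
    terms = []
    for rewards, ratios in groups:
        if not keep_group(rewards):
            continue
        for adv, token_ratios in zip(group_advantages(rewards), ratios):
            terms.extend(clipped_token_objective(r, adv) for r in token_ratios)
    return sum(terms) / len(terms) if terms else 0.0
```

With a positive advantage, a token whose ratio is 1.25 is clipped to 1.2 under a symmetric bound of 0.2 but passes through unclipped when the upper bound is 0.28, which is the extra room for raising low-probability tokens that Clip-Higher is meant to provide.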


