Reinforcement Pre-Training (RPT) By Microsoft Explained

AI Papers Academy · Beginner · 🧠 Large Language Models · 9mo ago
In this video we dive into a recent Microsoft paper titled Reinforcement Pre-Training (RPT). The paper introduces a mechanism to scale reinforcement learning (RL) training data up to the size of the data used in the pre-training stage of large language models (LLMs). While standard LLM pre-training uses a next-token prediction (NTP) objective, RPT introduces next-token reasoning, transforming each token prediction into a reasoning task. Next-token reasoning is then used for RL training with GRPO, which we covered in depth here: GRPO full review - https://aipapersacademy.com/deepseekmath-grpo/
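To make the idea concrete, here is a minimal sketch of how a verifiable next-token reward could feed GRPO-style group-normalized advantages. The function names and the exact-match reward are simplifying assumptions for illustration (RPT uses a prefix-matching reward over sampled reasoning rollouts), not the paper's implementation:

```python
# Sketch: verifiable reward for next-token reasoning + GRPO group advantages.
# Assumed simplification: exact-match reward instead of RPT's prefix matching.

def rpt_reward(predicted: str, ground_truth: str) -> float:
    """1.0 if the rollout's predicted next token matches the corpus token."""
    return 1.0 if predicted == ground_truth else 0.0

def grpo_advantages(rewards: list[float]) -> list[float]:
    """GRPO: normalize each rollout's reward by the group mean and std."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    if std == 0:
        return [0.0] * n  # all rollouts scored the same -> no signal
    return [(r - mean) / std for r in rewards]

# Example: 4 reasoning rollouts for the same prefix; only one is correct.
rewards = [rpt_reward(p, "the") for p in ["the", "a", "an", "this"]]
advantages = grpo_advantages(rewards)  # correct rollout gets a positive advantage
```

Because the reward is checked against the actual next token in the corpus, no human labeling or learned reward model is needed, which is what lets RL scale to pre-training-sized data.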
Watch on YouTube ↗

Chapters (6)

Introduction
0:50 LLM Training & RPT
3:52 Next-Token Reasoning
4:28 RPT Training
5:27 Scale Up RL Data
5:55 RPT Results