Reinforcement Pre-Training (RPT) By Microsoft Explained
In this video we dive into a recent Microsoft paper titled Reinforcement Pre-Training (RPT).
The paper introduces a mechanism to scale reinforcement learning (RL) training data up to the scale of the data used in the pre-training stage of large language models (LLMs).
While in standard LLM pre-training, we use a next-token prediction (NTP) objective, RPT introduces next-token reasoning, transforming each token prediction into a reasoning task.
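To make the contrast concrete, standard NTP trains the model to minimize the cross-entropy of the ground-truth next token under the model's predicted distribution. Here is a minimal sketch of that objective; the toy logits and four-token vocabulary are made up for illustration:

```python
import math

def ntp_loss(logits, target_id):
    """Standard next-token prediction objective: negative log-likelihood
    of the ground-truth token under the softmax of the model's logits."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[target_id])

# Toy example: a 4-token vocabulary where the model favors token 2.
logits = [0.1, 0.2, 2.0, -1.0]
loss = ntp_loss(logits, 2)  # low loss: the model assigns token 2 high probability
```

RPT keeps the same ground-truth next token as the supervision signal, but instead of directly maximizing its likelihood, the model first generates a chain-of-thought and then commits to a prediction.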
Next-token reasoning is then used for RL training with GRPO.
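The reward in this setup is verifiable: a sampled reasoning trace earns reward only if its final answer matches the ground-truth next token, and GRPO turns a group of such binary rewards into relative advantages without a learned value network. The following is a minimal sketch under those assumptions, not the paper's actual implementation:

```python
import statistics

def reward(predicted_token, ground_truth_token):
    """Verifiable reward: 1 if the trace's final prediction matches the
    ground-truth next token, else 0 (a simplification for illustration)."""
    return 1.0 if predicted_token == ground_truth_token else 0.0

def grpo_advantages(rewards):
    """GRPO-style group-relative advantage: each sampled completion's
    reward, normalized by the group's mean and standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Four sampled reasoning traces for the same context; ground truth is "the".
samples = ["the", "a", "the", "an"]
rewards = [reward(p, "the") for p in samples]       # [1.0, 0.0, 1.0, 0.0]
advantages = grpo_advantages(rewards)               # correct traces get +1, wrong get -1
```

These advantages then weight the policy-gradient update, reinforcing reasoning traces that led to the correct next token.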
We covered GRPO in depth here:
GRPO full review - https://aipapersacademy.com/deepseekmath-grpo/
Chapters (6)
- Introduction (0:50)
- LLM Training & RPT (3:52)
- Next-Token Reasoning (4:28)
- RPT Training (5:27)
- Scale Up RL Data (5:55)
- RPT Results