Reasoning Models and DeepSeek R1 from scratch
How do reasoning models like DeepSeek R1 work? This short cartoon explains them.
0:05 - large language models
0:25 - math problems
0:55 - superhuman performance
1:10 - AlphaZero
2:52 - Math as a game
3:16 - DeepSeek R1-Zero
3:38 - GRPO
4:34 - Chain-of-Thought Prompting (CoT)
4:51 - think-answer template
7:21 - DeepSeek R1
7:54 - GPQA
8:23 - towards superhuman performance
DeepCamp AI