How to finetune LLMs to THINK with Reinforcement Learning (GRPO from scratch!)
In this hands-on tutorial video, I explain Reasoning LLMs and SLMs and write the Group Relative Policy Optimization (GRPO) algorithm from scratch in PyTorch. This tutorial is aimed especially at Small Language Models (SLMs), but the same principles apply to Large Language Models (LLMs) too. Along the way, we go through the policy gradient equation, explain RLVR (Reinforcement Learning with Verifiable Rewards), and visualize exactly how reasoning models work!
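To give a taste of what the video builds up to: GRPO's core trick is that advantages are computed *relative to a group* of sampled responses for the same prompt (normalized rewards), and the policy is then updated with a PPO-style clipped objective. Below is a minimal, stdlib-only sketch of those two pieces; the function names and the `eps`/`clip_eps` defaults are illustrative assumptions, not the video's exact code.

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO advantage: normalize each reward against its group's
    mean and standard deviation (illustrative sketch)."""
    mu = sum(rewards) / len(rewards)
    var = sum((r - mu) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mu) / (std + eps) for r in rewards]

def clipped_objective(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate for one token/response:
    take the pessimistic (minimum) of the unclipped and clipped terms."""
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * advantage
    return min(unclipped, clipped)

# Example: two correct (reward 1) and two incorrect (reward 0) responses
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

In a verifiable-rewards (RLVR) setup the rewards above would come from a checker (e.g. "did the answer match?"), which is why no learned reward model is needed.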
All materials for this video (as well as for all other videos on the channel) have been shared on my Patreon page.
https…
Watch on YouTube
Chapters (10)
Thinking LLMs are taking over! (3:47)
Setting up Reinforcement Learning Environment (4:50)
Reasoning Gym library - Rewards (8:00)
GRPO Visually explained (10:41)
Policy Optimization and PPO loss Explained (15:45)
Coding response generation (20:55)
Coding Reward Generation & Advantages (26:25)
Calculating log probabilities (30:58)
RL Training loop (33:49)
Visualizing
DeepCamp AI