Become a Model Whisperer: The "On-Policy" Secret to Better LLM Results
Ever wonder why a perfectly crafted prompt or a carefully curated fine-tuning dataset falls flat? The problem isn't always your instructions; it's that you may be fighting against the model's fundamental nature.
This video dives deep into a critical lesson from reinforcement learning (RL) for Large Language Models: the principle of 'On-Policy' interaction. We break down why forcing an LLM to follow a script it wasn't trained on ('Off-Policy') can lead to poor performance, brittleness, and even hallucinations.
You'll learn a new mental model for working with LLMs, understanding them not as simp…
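The on-policy vs. off-policy distinction the video describes can be sketched with a toy next-token model. Everything below is illustrative (the bigram table, token names, and script are made up for this sketch, not taken from the video):

```python
import random

# Toy "model": a bigram table giving next-token probabilities.
# Purely illustrative -- a stand-in for an LLM's learned distribution.
model = {
    "the": {"cat": 0.9, "dog": 0.1},
    "cat": {"sat": 0.8, "ran": 0.2},
}

def sample_next(token, rng):
    """Sample the model's own next token (on-policy behaviour)."""
    choices, weights = zip(*model[token].items())
    return rng.choices(choices, weights=weights, k=1)[0]

# Off-policy: training targets come from a fixed external script the
# model would rarely produce itself (teacher forcing on someone
# else's text -- note "dog ran" is a low-probability path here).
script = ["the", "dog", "ran"]
off_policy_pairs = list(zip(script, script[1:]))

# On-policy: the model generates its own continuation, and any
# feedback (e.g. an RL reward) attaches to what it actually said.
rng = random.Random(0)
token, trace = "the", ["the"]
while token in model:
    token = sample_next(token, rng)
    trace.append(token)

print(off_policy_pairs)  # pairs taken from the external script
print(trace)             # the model's own rollout
```

The gap between the two is the point: off-policy data can sit in regions the model almost never visits on its own, while on-policy feedback lands exactly on its real behaviour.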
Watch on YouTube ↗
Chapters (5)
Introduction: Lessons from Reinforcement Learning (1:06)
How LLMs are Trained (And Why It's a Problem) (4:05)
The Inference Paradox: Untrained for Their Own Output (6:14)
Reinforcement Learning: Teaching Models Consequences (8:10)
Three Key Lessons from AI Research
DeepCamp AI