Become a Model Whisperer: The "On-Policy" Secret to Better LLM Results
Ever wonder why a perfectly crafted prompt or a carefully curated fine-tuning dataset falls flat? The problem isn't always your instructions; it's that you may be fighting against the model's fundamental nature.
This video dives deep into a critical lesson from reinforcement learning (RL) for Large Language Models: the principle of 'On-Policy' interaction. We break down why forcing an LLM to follow a script it wasn't trained on ('Off-Policy') can lead to poor performance, brittleness, and even hallucinations.
You'll learn a new mental model for working with LLMs, understanding them not as simp…
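The on-policy vs. off-policy distinction the video describes can be sketched with a toy next-token model. Everything below is illustrative (the bigram table, token names, and script are made up for this sketch, not taken from the video):

```python
import random

# Toy "model": a bigram table giving next-token probabilities.
# Purely illustrative -- a stand-in for an LLM's learned distribution.
model = {
    "the": {"cat": 0.9, "dog": 0.1},
    "cat": {"sat": 0.8, "ran": 0.2},
}

def sample_next(token, rng):
    """Sample the model's own next token (on-policy behaviour)."""
    choices, weights = zip(*model[token].items())
    return rng.choices(choices, weights=weights, k=1)[0]

# Off-policy: training targets come from a fixed external script the
# model would rarely produce itself (teacher forcing on someone
# else's text -- note "dog ran" is a low-probability path here).
script = ["the", "dog", "ran"]
off_policy_pairs = list(zip(script, script[1:]))

# On-policy: the model generates its own continuation, and any
# feedback (e.g. an RL reward) attaches to what it actually said.
rng = random.Random(0)
token, trace = "the", ["the"]
while token in model:
    token = sample_next(token, rng)
    trace.append(token)

print(off_policy_pairs)  # pairs taken from the external script
print(trace)             # the model's own rollout
```

The gap between the two is the point: off-policy data can sit in regions the model almost never visits on its own, while on-policy feedback lands exactly on its real behaviour.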
Watch on YouTube ↗
Chapters (5)
Introduction: Lessons from Reinforcement Learning (1:06)
How LLMs are Trained (And Why It's a Problem) (4:05)
The Inference Paradox: Untrained for Their Own Output (6:14)
Reinforcement Learning: Teaching Models Consequences (8:10)
Three Key Lessons from AI Research
DeepCamp AI