Become a Model Whisperer: The "On-Policy" Secret to Better LLM Results

Martin Andrews · Beginner · 🧠 Large Language Models · 3mo ago
Ever wonder why a perfectly crafted prompt or a carefully curated fine-tuning dataset falls flat? The problem isn't always your instructions; you might be fighting against the model's fundamental nature. This video dives deep into a critical lesson from reinforcement learning (RL) for large language models: the principle of 'On-Policy' interaction. We break down why forcing an LLM to follow a script it wasn't trained to produce ('Off-Policy') can lead to poor performance, brittleness, and even hallucinations. You'll learn a new mental model for working with LLMs, understanding them not as simp…
Watch on YouTube

Chapters (5)

0:00 Introduction: Lessons from Reinforcement Learning
1:06 How LLMs are Trained (And Why It's a Problem)
4:05 The Inference Paradox: Untrained for Their Own Output
6:14 Reinforcement Learning: Teaching Models Consequences
8:10 Three Key Lessons from AI Research