Why LLMs Shouldn’t Follow Instructions (But Do)

ML Guy · Advanced · 🧠 Large Language Models · 2mo ago
A pretrained language model can predict text, but it doesn’t know how to help you. In this video, we break down how raw LLMs are transformed into instruction-following assistants like ChatGPT. You’ll learn how fine-tuning, human preference data, and reinforcement learning from human feedback (RLHF) reshape a model’s behavior without changing its architecture.

We cover:
- Why next-token prediction alone is not enough
- Supervised fine-tuning with instruction–response pairs
- How human rankings become a reward model
- What RLHF actually optimizes (and what it doesn’t)
- How safety, refusals, and “hel…
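The “human rankings become a reward model” step listed above is typically trained with a pairwise (Bradley–Terry) loss: the model should score the human-preferred response higher than the rejected one. A minimal numeric sketch, with illustrative scalar rewards standing in for a real reward model’s outputs:

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    Small when the reward model scores the human-preferred response
    higher than the rejected one; large when the ranking is inverted.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Correct ranking (preferred response scored higher) -> low loss
print(preference_loss(2.0, 0.5))

# Inverted ranking -> high loss, pushing the model to fix its scores
print(preference_loss(0.5, 2.0))
```

Minimizing this loss over many ranked pairs is what turns raw human comparisons into a scalar reward signal that RLHF can then optimize against.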
Watch on YouTube ↗