Transformers & Diffusion LLMs: What's the connection?

Julia Turc · Advanced · 🧠 Large Language Models · 4mo ago
Diffusion-based LLMs are a new paradigm for text generation: they progressively refine gibberish into a coherent response. But what's their connection to Transformers? In this video, I unpack how Transformers evolved from a simple machine translation tool into the universal backbone of modern AI, powering everything from auto-regressive models like GPT to diffusion-based models like LLaDA. We'll go step-by-step through:
• How the Transformer architecture actually works (encoder, decoder, attention)
• Why attention replaced recurrence in natural language processing
• How GPT training differs…
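The contrast the video draws between auto-regressive decoding (GPT) and diffusion-style refinement (LLaDA) can be sketched with a toy simulation. Everything here is illustrative: a real model predicts each token with a Transformer, whereas this sketch simply reveals tokens from a fixed target to show the *order* in which the two paradigms produce text.

```python
import random

TARGET = ["the", "cat", "sat", "on", "the", "mat"]  # toy "ideal" output
MASK = "[MASK]"

def autoregressive_generate(target):
    """GPT-style: emit tokens strictly left to right, one per step."""
    seq = []
    for tok in target:  # each step would condition on the prefix so far
        seq.append(tok)
    return seq

def diffusion_generate(target, steps=3, seed=0):
    """LLaDA-style sketch: start fully masked, unmask a chunk each step."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    hidden = list(range(len(target)))
    for _ in range(steps):  # each step refines the whole sequence at once
        k = max(1, len(hidden) // 2)
        for i in rng.sample(hidden, k):
            seq[i] = target[i]  # a real model would predict, not copy
            hidden.remove(i)
    for i in hidden:  # reveal anything still masked on the final step
        seq[i] = target[i]
    return seq

print(autoregressive_generate(TARGET))  # grows one token at a time
print(diffusion_generate(TARGET))       # whole sequence, refined in passes
```

The key difference is visible in the loops: the auto-regressive version only ever appends to the right, while the diffusion version holds a full (initially masked) sequence and fills in positions in parallel over a few refinement steps.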
Watch on YouTube ↗

Chapters (8)

0:00 Intro
1:25 The Transformer origin story
3:52 The alignment problem & attention
6:26 The architecture: encoder vs decoder
11:25 Auto-regressive LLMs & GPT
16:09 Text classification & BERT
18:51 Diffusion LLMs & LLaDA
24:17 Outro