Transformers & Diffusion LLMs: What's the connection?
Diffusion-based LLMs are a new paradigm for text generation: instead of writing left to right, they progressively refine gibberish into a coherent response. But what's their connection to Transformers?
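To make that refinement idea concrete before the walkthrough, here is a minimal Python toy of the two generation styles the video contrasts. It is a sketch only: `toy_denoiser` is a hypothetical stand-in that picks random words where a real diffusion LLM like LLaDA would run a Transformer to predict all masked positions at once, and the re-masking schedule is simplified, not LLaDA's actual algorithm.

```python
import random

random.seed(0)  # reproducible toy output

VOCAB = ["the", "cat", "sat", "on", "the", "mat"]
MASK = "<mask>"

def toy_denoiser(tokens):
    # Hypothetical stand-in for a trained mask predictor: a real
    # diffusion LLM such as LLaDA runs a Transformer over the whole
    # sequence and fills every masked position in parallel.
    return [random.choice(VOCAB) if t == MASK else t for t in tokens]

def diffusion_generate(length=6, steps=3):
    # Start from pure "gibberish" (all masks) and refine coarse-to-fine:
    # predict everything, then re-mask a shrinking fraction and repeat.
    tokens = [MASK] * length
    for remaining in range(steps, 0, -1):
        tokens = toy_denoiser(tokens)
        n_remask = length * (remaining - 1) // steps
        for i in random.sample(range(length), n_remask):
            tokens[i] = MASK
        print(f"after pass {steps - remaining + 1}: {' '.join(tokens)}")
    return tokens

def autoregressive_generate(length=6):
    # GPT-style contrast: one token at a time, strictly left to right,
    # each draw standing in for sampling from p(next token | prefix).
    tokens = []
    for _ in range(length):
        tokens.append(random.choice(VOCAB))
    return tokens

diffusion_generate()
print("autoregressive:", " ".join(autoregressive_generate()))
```

The point is the shape of the loop: diffusion-style decoding revises the whole sequence over a few parallel passes, while autoregressive decoding commits to one token at a time.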
In this video, I unpack how Transformers evolved from a simple machine translation tool into the universal backbone of modern AI — powering everything from auto-regressive models like GPT to diffusion-based models like LLaDA.
We’ll go step-by-step through:
• How the Transformer architecture actually works (encoder, decoder, attention)
• Why attention replaced recurrence in natural language processing
• How GPT training differs…
Watch on YouTube ↗
Chapters (8)
• Intro (1:25)
• The Transformer origin story (3:52)
• The alignment problem & attention (6:26)
• The architecture: encoder vs decoder (11:25)
• Auto-regressive LLMs & GPT (16:09)
• Text classification & BERT (18:51)
• Diffusion LLMs & LLaDA (24:17)
• Outro
DeepCamp AI