Transformers & Diffusion LLMs: What's the connection?
Diffusion-based LLMs are a new paradigm for text generation: instead of writing left to right, they progressively refine gibberish into a coherent response. But what's their connection to Transformers?
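To make that refinement idea concrete before the walkthrough, here is a minimal Python toy of the two generation styles the video contrasts. It is a sketch only: `toy_denoiser` is a hypothetical stand-in that picks random words where a real diffusion LLM like LLaDA would run a Transformer to predict all masked positions at once, and the re-masking schedule is simplified, not LLaDA's actual algorithm.

```python
import random

random.seed(0)  # reproducible toy output

VOCAB = ["the", "cat", "sat", "on", "the", "mat"]
MASK = "<mask>"

def toy_denoiser(tokens):
    # Hypothetical stand-in for a trained mask predictor: a real
    # diffusion LLM such as LLaDA runs a Transformer over the whole
    # sequence and fills every masked position in parallel.
    return [random.choice(VOCAB) if t == MASK else t for t in tokens]

def diffusion_generate(length=6, steps=3):
    # Start from pure "gibberish" (all masks) and refine coarse-to-fine:
    # predict everything, then re-mask a shrinking fraction and repeat.
    tokens = [MASK] * length
    for remaining in range(steps, 0, -1):
        tokens = toy_denoiser(tokens)
        n_remask = length * (remaining - 1) // steps
        for i in random.sample(range(length), n_remask):
            tokens[i] = MASK
        print(f"after pass {steps - remaining + 1}: {' '.join(tokens)}")
    return tokens

def autoregressive_generate(length=6):
    # GPT-style contrast: one token at a time, strictly left to right,
    # each draw standing in for sampling from p(next token | prefix).
    tokens = []
    for _ in range(length):
        tokens.append(random.choice(VOCAB))
    return tokens

diffusion_generate()
print("autoregressive:", " ".join(autoregressive_generate()))
```

The point is the shape of the loop: diffusion-style decoding revises the whole sequence over a few parallel passes, while autoregressive decoding commits to one token at a time.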
In this video, I unpack how Transformers evolved from a simple machine translation tool into the universal backbone of modern AI — powering everything from auto-regressive models like GPT to diffusion-based models like LLaDA.
We’ll go step-by-step through:
• How the Transformer architecture actually works (encoder, decoder, attention)
• Why attention replaced recurrence in natural language processing
• How GPT training differs…
Watch on YouTube ↗
Chapters (8)
• Intro (1:25)
• The Transformer origin story (3:52)
• The alignment problem & attention (6:26)
• The architecture: encoder vs decoder (11:25)
• Auto-regressive LLMs & GPT (16:09)
• Text classification & BERT (18:51)
• Diffusion LLMs & LLaDA (24:17)
• Outro
DeepCamp AI