How did diffusion LLMs get so fast?

Julia Turc · Beginner · 🧠 Large Language Models · 1mo ago
This video discusses techniques for making diffusion LLMs faster, including:

• Self-Distillation Through Time
• Curriculum learning
• Confidence scores for unmasking (see the sketch below)
• Guided diffusion (FlashDLM)
• Approximate KV caching (dLLM-Cache, dKV-Cache)
• Block diffusion

🔗 Inception:
Home: https://www.inceptionlabs.ai/
API: https://docs.inceptionlabs.ai/
X: https://x.com/_inception_ai
Stefano Ermon, cofounder & CEO: https://cs.stanford.edu/~ermon/

📚 Papers
Self-Distillation Through Time: https://arxiv.org/abs/2410.21035
FlashDLM (Guided Diffusion): https://arxiv.org/abs/2505.21467
dLLM-Cache: https…
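To make one of these ideas concrete, here is a minimal sketch of confidence-based unmasking: rather than refining every masked position for a fixed number of steps, the sampler commits the tokens the model is most confident about at each step and keeps iterating only on the rest. The `model` callable, `mask_id` convention, and the fixed per-step schedule are illustrative assumptions, not the implementation from the video or any of the papers above.

```python
import torch

def sample_with_confidence(model, prompt_ids, gen_len=64, steps=8, mask_id=0):
    """Hypothetical confidence-based sampler for a masked diffusion LM.

    Assumes `model` maps a (1, seq_len) LongTensor of token ids to
    (1, seq_len, vocab) logits, and that `mask_id` marks masked slots.
    """
    # Start with the prompt followed by a fully masked completion.
    x = torch.cat([prompt_ids, torch.full((gen_len,), mask_id, dtype=torch.long)])
    per_step = gen_len // steps  # tokens committed per refinement step

    for _ in range(steps):
        logits = model(x.unsqueeze(0)).squeeze(0)   # (seq_len, vocab)
        probs = logits.softmax(-1)
        conf, pred = probs.max(-1)                  # per-position confidence

        still_masked = x == mask_id
        if not still_masked.any():
            break
        # Only consider positions that are still masked.
        conf = conf.masked_fill(~still_masked, -1.0)

        # Commit the positions the model is most confident about.
        k = min(per_step, int(still_masked.sum()))
        top = conf.topk(k).indices
        x[top] = pred[top]

    return x[len(prompt_ids):]
```

With `gen_len=64` and `steps=8`, each forward pass commits eight tokens, so the whole completion takes 8 model calls instead of the 64 an autoregressive decoder would need, which is the speed argument the video walks through.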
Watch on YouTube ↗

Chapters (11)

0:00 Intro
2:00 Auto-regressive vs diffusion LLMs
3:06 Reducing refinement steps
5:54 Self-Distillation Through Time
7:15 Curriculum learning
8:17 Speeding up sampling
9:40 Confidence scores
11:35 Guided diffusion (FlashDLM)
13:24 Approximate KV caching (dLLM-Cache, dKV-Cache)
19:03 Block diffusion
21:19 Where to find diffusion models