How did diffusion LLMs get so fast?
This video discusses techniques for making diffusion LLMs faster, including:
• Self-Distillation Through Time
• Curriculum Learning
• Confidence scores for unmasking
• Guided diffusion (FlashDLM)
• Approximate KV caching (dLLM-Cache, dKV-Cache)
• Block diffusion
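Of the techniques above, confidence-based unmasking is easy to sketch: at each refinement step the model predicts every masked position, and only the most confident predictions are committed while the rest stay masked for the next step. The snippet below is a minimal illustration with a hypothetical `toy_model` stub standing in for a real diffusion LM (which would output a softmax distribution per position, with confidence usually taken as the max token probability):

```python
import random

MASK = "<mask>"

def toy_model(seq):
    # Hypothetical stand-in for a diffusion LM: for each masked position,
    # return a (token, confidence) guess. A real model would score every
    # position from a learned distribution.
    return {i: (f"tok{i}", random.random()) for i, t in enumerate(seq) if t == MASK}

def confidence_unmask(seq, tokens_per_step=2):
    """Iteratively commit only the most confident predictions per step."""
    seq = list(seq)
    while MASK in seq:
        preds = toy_model(seq)
        # Sort masked positions by confidence; keep the top-k this step.
        # Everything else stays masked and is re-predicted next iteration.
        best = sorted(preds.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _conf) in best[:tokens_per_step]:
            seq[pos] = tok
    return seq

print(confidence_unmask([MASK] * 6))
```

With 6 masked positions and 2 tokens committed per step, the sequence resolves in 3 refinement steps instead of 6, which is the basic speed lever the video covers.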
🔗 Inception:
Home: https://www.inceptionlabs.ai/
API: https://docs.inceptionlabs.ai/
X: https://x.com/_inception_ai
Stefano Ermon, cofounder & CEO: https://cs.stanford.edu/~ermon/
📚 Papers
Self-Distillation Through Time: https://arxiv.org/abs/2410.21035
FlashDLM (Guided Diffusion): https://arxiv.org/abs/2505.21467
dLLM-Cache: https…
Chapters (11)
0:00 Intro
2:00 Auto-regressive vs diffusion LLMs
3:06 Reducing refinement steps
5:54 Self-Distillation Through Time
7:15 Curriculum learning
8:17 Speeding up sampling
9:40 Confidence scores
11:35 Guided diffusion (FlashDLM)
13:24 Approximate KV caching (dLLM-Cache, dKV-Cache)
19:03 Block diffusion
21:19 Where to find diffusion models
DeepCamp AI