The Race to Production-Grade Diffusion LLMs [Stefano Ermon] - 764
Today, we're joined by Stefano Ermon, associate professor at Stanford University and CEO of Inception Labs to discuss diffusion language models. We dig into how diffusion approaches—traditionally used for images—are being adapted for text and code generation, the technical challenges of applying continuous methods to discrete token spaces, and how diffusion models compare to traditional autoregressive LLMs. Stefano introduces Mercury 2, a commercial-scale diffusion LLM that can generate multiple tokens simultaneously and achieve inference speeds 5-10x faster than small frontier models, paving …
Watch on YouTube
Chapters (6)
Introduction (4:11)
Origins of diffusion models (7:24)
From image diffusion to text diffusion (8:07)
Discrete data challenges (9:54)
Limitations of embeddings (11:10)
Diffusion versus autoregressive models