Text diffusion: A new paradigm for LLMs
Text diffusion is a new paradigm for LLMs. Unlike mainstream auto-regressive models such as GPT, Claude, or Gemini, which predict one token at a time, diffusion-based LLMs draft an entire response and refine it iteratively. Because many tokens are updated in parallel, inference can be up to roughly 10x faster.
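The contrast can be sketched in a few lines of toy Python. Here `fake_model` is a hypothetical stand-in for a trained network (it just picks random tokens); the point is the control flow: the auto-regressive loop needs one model call per token, while the diffusion-style loop starts from a fully masked draft and unmasks several positions per refinement step.

```python
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "[MASK]"

def fake_model(context):
    """Placeholder for a language model: returns a random vocab token.
    A real model would return a distribution conditioned on context."""
    return random.choice(VOCAB)

# Auto-regressive: one token per step, left to right -> N steps for N tokens.
def autoregressive_generate(n_tokens):
    out = []
    for _ in range(n_tokens):
        out.append(fake_model(out))
    return out

# Diffusion-style: start fully masked, unmask a batch of positions per step.
# Finishing in a handful of parallel steps is where the speedup comes from.
def diffusion_generate(n_tokens, n_steps=2):
    seq = [MASK] * n_tokens
    per_step = -(-n_tokens // n_steps)  # ceil(n_tokens / n_steps)
    for _ in range(n_steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        for i in masked[:per_step]:
            seq[i] = fake_model(seq)  # every position sees the whole draft
    return seq
```

Note the asymmetry: for a 100-token reply the first loop makes 100 sequential model calls, while the second makes only `n_steps` passes, each of which could run as a single batched forward pass on real hardware.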
Models like Gemini Diffusion, Mercury Coder from Inception Labs and Seed Diffusion from ByteDance are already competitive on coding benchmarks.
Inspired by physical diffusion, these models use Markov chains to model data generation as a particle hopping between discrete states. We'll walk through the D3PM and LLaDA papers a…
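A minimal sketch of the Markov-chain view, under one simplifying assumption: we use the absorbing-state ("mask") forward process that D3PM introduces as one of its transition-matrix choices and that LLaDA builds on, where each token independently hops to a special `[MASK]` state with a probability that grows over diffusion time. The model is then trained to invert this corruption.

```python
import random

random.seed(1)
MASK = "[MASK]"

def forward_mask(tokens, t, T):
    """Absorbing-state forward process: at step t of T, each token has
    independently been absorbed into [MASK] with probability t / T.
    [MASK] is absorbing: once a token is masked it stays masked, so at
    t = T the whole sequence is noise and generation can start there."""
    p = t / T
    return [MASK if random.random() < p else tok for tok in tokens]

sentence = ["text", "diffusion", "is", "a", "new", "paradigm"]
print(forward_mask(sentence, t=4, T=8))  # partially masked
print(forward_mask(sentence, t=8, T=8))  # fully masked at t = T
```

The linear `t / T` schedule here is an illustrative assumption; the papers derive the per-step transition probabilities from a chosen noise schedule, but the qualitative picture (independent hops into an absorbing mask state) is the same.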
Watch on YouTube ↗
Chapters (11)
Intro (1:04)
Auto-regressive vs diffusion LLMs (2:06)
Why bother with diffusion for text? (6:30)
The probability landscape (7:57)
Diffusion in latent embedding space (11:00)
Diffusion in token embedding space (12:13)
Diffusion in text token space (13:49)
Markov chains (16:46)
Paper study: D3PM (19:42)
Paper study: LLaDA (22:30)
Evaluation
DeepCamp AI