Diffusion Language Models vs Autoregressive Language Models

Neural Breakdown with AVB · Advanced · 🧠 Large Language Models · 10mo ago
There is a new Google Gemini model that uses diffusion to generate text instead of the more tried-and-true autoregressive token generation approach that GPT models use. In this video, we break down what exactly text diffusion is, how these models are trained, and why this could be a big deal. We also compare them with standard GPTs across a variety of axes - underlying algorithms, ideologies, training methods, inference speeds, interpretability, and how controllable the two approaches are. We also look at the new LLaDA paper, which explores diffusion-based …
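To make the ideological difference concrete, here is a minimal toy sketch (not the actual LLaDA or Gemini implementation): an autoregressive decoder emits one token per step, left to right, while a masked-diffusion decoder starts from an all-mask sequence and unmasks several positions per refinement step. The `predict_token` function is a hypothetical stand-in for a trained model, and the random choice of positions to unmask is an assumption for the sketch (real models pick by prediction confidence).

```python
import random

random.seed(0)

TARGET = ["the", "cat", "sat", "on", "the", "mat"]
MASK = "[MASK]"

def predict_token(position):
    # Stand-in for a trained model: simply reveal the ground-truth
    # token at this position.
    return TARGET[position]

def autoregressive_decode(length):
    """Left-to-right: one token per step, each conditioned on the prefix."""
    out = []
    for i in range(length):
        out.append(predict_token(i))  # one model call per token
    return out

def diffusion_decode(length, steps=3):
    """Masked-diffusion style: start fully masked, unmask a batch per step."""
    seq = [MASK] * length
    masked = list(range(length))
    per_step = -(-length // steps)  # ceil division: positions per step
    for _ in range(steps):
        # Real models would rank positions by confidence; we pick randomly.
        chosen = random.sample(masked, min(per_step, len(masked)))
        for pos in chosen:
            seq[pos] = predict_token(pos)
            masked.remove(pos)
    return seq

print(autoregressive_decode(len(TARGET)))  # 6 sequential steps
print(diffusion_decode(len(TARGET)))       # 3 parallel refinement steps
```

The point of the toy: both decoders arrive at the same sequence, but the diffusion path needs far fewer steps because multiple tokens are filled in per pass - the source of the speed argument discussed in the video.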
Watch on YouTube ↗

Chapters (8)

0:00 Intro
2:24 Ideological differences between Autoregressive and Diffusion
5:50 Diffusion for Image Generation
6:55 Diffusion for Text - LLaDA paper
8:42 LLaDA Diffusion LLM training
9:53 LLaDA Diffusion LLM Inferencing
11:51 ARMs vs Diffusion - Speed, Scalability, Controllability
14:20 Why Diffusion LMs can be HUGE!