Diffusion Language Models vs Autoregressive Language Models
Google has released a new Gemini model that uses diffusion to generate text instead of the more tried-and-true autoregressive token generation that GPT models use. In this video, we break down what exactly text diffusion is, how these models are trained, and why this could be a big deal. We also compare diffusion models with standard GPTs across a variety of axes: underlying algorithms, ideologies, training methods, inference speeds, interpretability, and how controllable the two approaches are. We also look at the new LLaDA paper, which explores diffusion-based …
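To make the core contrast concrete before the video: an autoregressive model commits to tokens strictly left to right, while a masked-diffusion model (LLaDA-style) starts from a fully masked sequence and fills in positions over several steps, in any order. Here is a toy sketch of the two decoding loops; random choices stand in for a real model's predictions, and the function names are illustrative, not from the paper.

```python
import random

random.seed(0)
MASK = "[MASK]"
vocab = ["the", "cat", "sat", "on", "mat"]

def autoregressive_decode(n_tokens):
    """Autoregressive: one token per step, strictly left to right."""
    seq = []
    for _ in range(n_tokens):
        # Stand-in for sampling from p(x_t | x_<t) with a trained model.
        seq.append(random.choice(vocab))
    return seq

def diffusion_decode(n_tokens, n_steps):
    """Masked diffusion (toy): start fully masked, then unmask a few
    positions per step, in any order, until no masks remain."""
    seq = [MASK] * n_tokens
    per_step = max(1, n_tokens // n_steps)
    while MASK in seq:
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        for i in random.sample(masked, min(per_step, len(masked))):
            # Stand-in for the model predicting the token at position i,
            # conditioned on the whole partially unmasked sequence.
            seq[i] = random.choice(vocab)
    return seq

print(autoregressive_decode(8))   # 8 steps, one token each
print(diffusion_decode(8, 4))     # ~4 steps, several tokens each
```

The key practical difference this illustrates is parallelism: the diffusion loop fills multiple positions per step, which is why inference speed comes up as a major axis of comparison later in the video.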
Watch on YouTube ↗
Chapters (8)
0:00 Intro
2:24 Ideological differences between Autoregressive and Diffusion
5:50 Diffusion for Image Generation
6:55 Diffusion for Text - LLaDA paper
8:42 LLaDA Diffusion LLM training
9:53 LLaDA Diffusion LLM Inferencing
11:51 ARMs vs Diffusion - Speed, Scalability, Controllability
14:20 Why Diffusion LMs can be HUGE!
DeepCamp AI