Diffusion model (DDPM) PART 2 - Coding from scratch

Name: Diffusion model (DDPM) PART 2 - Coding from scratch
Uploaded: 2026-05-02T05:30:00Z
Channel: Vizuara

Vizuara · Beginner · 🔢 Mathematical Foundations · 1mo ago

Skills: Maths for ML 90%, Generative Models 80%

Diffusion models look mathematically intimidating on first contact: the paper is full of long equations, UNets, and Greek letters. At its core, however, a DDPM is built on an intuitive idea, and once that idea clicks, the entire model starts making sense.

1) The starting point of a DDPM is not generation but destruction. We take a real image from the dataset and intentionally add small amounts of Gaussian noise to it, in a controlled, mathematically defined way. This process is called "forward diffusion". At every step, the image loses a tiny bit of structure and gains a tiny bit of randomness, and if we keep doing this long enough, the image eventually becomes pure noise. The important point is that this corruption process is fully known to us: we choose exactly how much noise to add at each step, and nothing is learned in this phase.

2) Once we understand how to destroy images in a controlled way, the real question becomes interesting: can we learn to reverse this process? This is where the neural network comes in. Instead of asking the model to directly produce a clean image, which is a very hard problem, we ask it something much simpler and better defined: given a noisy image and the timestep, what noise is present in this image? That is all the UNet is trained to do. It does not generate pixels and it does not hallucinate images; it only predicts noise.

3) The timestep itself matters a lot, because denoising an almost clean image and denoising near-pure noise are completely different tasks. So we encode time using sinusoidal features and feed it into the network, which allows the same model to behave differently at different noise levels. In simple terms, the model knows how noisy the image is and how aggressive the denoising should be at that step.

4) During generation, we start from pure random noise and then repeatedly apply a mathematically derived denoising rule. At each step, the model predi...
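The forward diffusion, noise target, and sinusoidal timestep encoding described in points 1) to 3) can be sketched in a few lines. This is a minimal illustration, not the code from the video: it uses only the Python standard library and a flat list of floats as a toy "image", whereas a real implementation would typically use tensors. The function names, the linear schedule from 1e-4 to 0.02, and the 1000 steps are all assumptions chosen for illustration. The one-shot jump to step t relies on the standard DDPM identity x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, which follows from composing the small Gaussian steps.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, evenly spaced (the range is an assumption)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for all s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Noise x0 directly to step t. Nothing is learned here.

    Returns the noisy sample and the noise that was added; that noise
    is the target the network would be trained to predict.
    """
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + sigma * e for x, e in zip(x0, eps)]
    return xt, eps


def recover_x0(xt, eps, t, alpha_bars):
    """Invert the noising formula when the noise is known exactly."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [(x - sigma * e) / signal for x, e in zip(xt, eps)]


def sinusoidal_embedding(t, dim):
    """Encode timestep t as dim features: sines first, then cosines.

    dim is assumed to be even.
    """
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def noise_mse(predicted, target):
    """Mean squared error between predicted and true noise."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [0.5, -0.25, 1.0, 0.0]          # toy "image": flat pixel values
xt, eps = forward_diffuse(x0, 500, alpha_bars, rng)
t_features = sinusoidal_embedding(500, 8)
```

With a perfect noise estimate, `recover_x0` returns the original pixels, which is why predicting noise is enough to undo the corruption. In training, a network would receive `xt` and `t_features` and be scored with `noise_mse` against `eps`.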
