Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 6 - Model Training

Stanford Online · Beginner · 🎨 Image & Video AI · 4h ago

Skills: Modern CV Models (80%), ML Pipelines (50%)

Learn more details about this course: https://online.stanford.edu/courses/cme296-diffusion-and-large-vision-models

To follow along with the course schedule and syllabus, visit: https://cme296.stanford.edu/syllabus/

Chapters:
00:00:00 Introduction
00:07:45 Training lifecycle overview
00:10:39 Loss parameterization
00:16:08 Timestep sampling
00:22:27 Logit normal distribution
00:25:44 Sampling shift for different resolutions
00:43:56 Representation alignment (REPA)
00:49:21 Pre-training
00:52:46 Continued training (CT)
00:54:02 Supervised finetuning (SFT)
00:54:39 Preference tuning
00:56:48 Reward feedback learning (ReFL)
01:00:16 Flow-GRPO
01:03:23 Diffusion-DPO
01:05:06 Prompt enhancement (PE)
01:10:00 DreamBooth, low-rank adaptation (LoRA)
01:16:35 Distillation
01:21:42 Progressive distillation (PD)
01:25:14 InstaFlow
01:30:18 Consistency models (CM)
01:34:23 Distribution matching distillation (DMD)
01:39:16 Adversarial diffusion distillation (ADD)

For more information about Stanford’s graduate programs, visit: https://online.stanford.edu/graduate-education

Afshine Amidi is an Adjunct Lecturer at Stanford University. Shervine Amidi is an Adjunct Lecturer at Stanford University.

View the course playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNdy8rt2rZ4T2xM0OjADnfu
