Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 5 - Architectures

Stanford Online · Beginner · 🎨 Image & Video AI · 2d ago


Learn more details about this course: https://online.stanford.edu/courses/cme296-diffusion-and-large-vision-models

To follow along with the course schedule and syllabus, visit: https://cme296.stanford.edu/syllabus/

Chapters:
00:00:00 Introduction
00:05:26 Objective
00:09:58 Convolutions, filters
00:14:44 Receptive field
00:17:14 Pooling
00:19:06 U-Net
00:27:52 Timestep representation
00:30:31 Class label representation
00:33:21 Timeline of U-Net models
00:35:43 Diffusion Transformer (DiT)
00:48:08 Adaptive layer normalization (adaLN)
01:02:30 DiT end-to-end example
01:12:57 Multimodal DiT (MM-DiT)
01:23:33 Qwen-Image, Z-Image, FLUX.1
01:24:27 Timeline of DiT models
01:25:25 Absolute position embeddings
01:38:48 Rotary position embeddings (RoPE)
01:39:59 2D RoPE variants

For more information about Stanford’s graduate programs, visit: https://online.stanford.edu/graduate-education

Afshine Amidi is an Adjunct Lecturer at Stanford University. Shervine Amidi is an Adjunct Lecturer at Stanford University.

View the course playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNdy8rt2rZ4T2xM0OjADnfu

