Brian Chao - Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation

Cohere · Beginner · 🎨 Image & Video AI · 1w ago


Diffusion and flow matching models have unlocked unprecedented capabilities for creative content creation, such as interactive image and streaming video generation. The growing demand for higher resolutions, frame rates, and context lengths, however, makes efficient generation increasingly challenging, because computational complexity grows quadratically with the number of generated tokens.

This work seeks to optimize the efficiency of the generation process in settings where the user's gaze location is known or can be estimated, for example by eye tracking. In these settings, the authors leverage the eccentricity-dependent acuity of human vision: a user perceives very high-resolution visual information only in a small region around their gaze location (the foveal region), and the ability to resolve detail degrades quickly toward the periphery of the visual field.

The approach starts with a mask that models the foveated resolution and allocates tokens non-uniformly, assigning higher token density to foveal regions and lower density to peripheral regions. An image or video is then generated in this mixed-resolution token setting, yielding results perceptually indistinguishable from full-resolution generation while drastically reducing the token count and generation time. To this end, the authors develop a principled mechanism for constructing mixed-resolution tokens directly from high-resolution data, allowing a foveated diffusion model to be post-trained from an existing base model while mai
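To make the token-allocation idea concrete, here is a minimal sketch of how a gaze-centered mask could turn a uniform token grid into a mixed-resolution one, and what that does to the quadratic attention cost. It is an illustration under stated assumptions, not the authors' implementation: the circular mask, the two resolution levels, the 2x2 average pooling in the periphery, and all function names (`foveation_mask`, `token_count`, `mixed_resolution_tokens`, `attention_cost_ratio`) are hypothetical choices made for this example.

```python
import numpy as np


def foveation_mask(grid_h, grid_w, gaze_rc, radius):
    """Boolean mask over coarse cells: True where a cell is treated as foveal.

    Assumption: a hard circular region around the gaze point, measured in
    coarse-cell units. The talk's actual mask design is not reproduced here.
    """
    rows, cols = np.mgrid[0:grid_h, 0:grid_w]
    dist = np.hypot(rows - gaze_rc[0], cols - gaze_rc[1])
    return dist <= radius


def token_count(mask, factor=2):
    """Foveal cells keep factor*factor fine tokens; peripheral cells keep one."""
    foveal = int(mask.sum())
    peripheral = int(mask.size) - foveal
    return foveal * factor * factor + peripheral


def mixed_resolution_tokens(latent, mask, factor=2):
    """Build a mixed-resolution token sequence from a fine-resolution map.

    latent: array of shape (grid_h * factor, grid_w * factor, channels).
    Foveal cells contribute all of their fine tokens; peripheral cells are
    average-pooled to a single token (an assumed downsampling rule).
    """
    grid_h, grid_w = mask.shape
    channels = latent.shape[-1]
    # Group the fine grid into (grid_h, grid_w) blocks of factor x factor tokens.
    blocks = latent.reshape(grid_h, factor, grid_w, factor, channels)
    blocks = blocks.transpose(0, 2, 1, 3, 4)
    fine = blocks[mask].reshape(-1, channels)
    coarse = blocks[~mask].mean(axis=(1, 2))
    return np.concatenate([fine, coarse], axis=0)


def attention_cost_ratio(n_mixed, n_full):
    """Relative self-attention cost, assuming cost scales with tokens squared."""
    return (n_mixed / n_full) ** 2


# Example: a 32x32 fine token grid viewed as 16x16 coarse cells of 2x2 tokens.
mask = foveation_mask(16, 16, gaze_rc=(8, 8), radius=4.0)
latent = np.random.default_rng(0).normal(size=(32, 32, 8))
tokens = mixed_resolution_tokens(latent, mask, factor=2)
n_full = latent.shape[0] * latent.shape[1]
print(tokens.shape[0], n_full, attention_cost_ratio(tokens.shape[0], n_full))
```

Because attention cost is assumed to scale with the square of the sequence length, the saving in compute is larger than the saving in token count alone. A real system would also need to tell the model where each token sits and at what scale, which this sketch leaves out.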




Chapters (17)

0:00 Intro and Setup

1:02 Why Efficiency Matters

2:48 Two Speedup Paradigms

4:38 Human Vision and Foveation

6:34 Foveated Diffusion Overview

7:45 Mask and Tokenization

9:54 Generation Pipeline

11:09 Naive Artifacts and RoPE Fix

14:39 Training with LoRA Finetune

15:55 Image and Video Results

17:43 Designing Better Masks

21:10 User Study Findings

22:24 Future Directions and Apps

25:27 Website Demo Walkthrough

29:48 Q&A on Speed and Distillation

31:34 Q&A on Token Length and Training

37:45 Closing Remarks
