Training video generation with Wan 2.2: Conan O’Brien and Will Smith character consistency

Oxen · Advanced · 🎨 Image & Video AI · 8mo ago

Skills: Multimodal LLMs 90% · Fine-tuning LLMs 80% · Modern CV Models 70% · Prompt Systems Engineering 60% · Agent Foundations 50%

Links + Notes 📝 https://www.oxen.ai/blog
Join Fine-Tune Fridays 🔧 https://oxen.ai/community
Discord 🗿 https://discord.com/invite/s3tBEn7Ptg
Use Oxen AI 🐂 https://oxen.ai/

Oxen.ai offers one-click fine-tuning, or fine-tunes models for you! Built on top of the world's best data versioning tool, we offer tools to automate model evals, generate synthetic data, and effortlessly fine-tune models.

--

Chapters
0:00 Is it Wan like “Anne” or “won”?
0:55 The Wan suite of models
1:10 Wan 2.1’s model architecture and research paper
3:50 Wan 2.2 video improvements from Wan 2.1
5:35 Our fine-tuning goal: Conan O’Brien interviewing Will Smith, who’s wearing a Denver Broncos shirt
7:30 Base model results
8:55 Wan 2.2’s model architecture
12:55 Fine-tuning: How we created our data
17:12 Fine-tuning: How we fine-tuned each Wan model
19:22 Question: How many images do you need?
20:24 Question: Did we use musubi-tuner?
20:40 Question: How to train camera panning
22:45 Fine-tuning: Comparing images as we fine-tune
29:37 Bringing our Will Smith fine-tuned model to ComfyUI
42:00 Configuring ComfyUI to run our fine-tuned model
47:28 Question: Does the image input format matter?
48:40 Loading our Conan O’Brien fine-tuned model on ComfyUI
57:45 Question: How are the LoRAs loaded into the pipeline?
58:40 Final Results: Conan interviewing Will Smith


