Training video generation with Wan 2.2: Conan O'Brien and Will Smith character consistency
Skills:
Multimodal LLMs 90%, Fine-tuning LLMs 80%, Modern CV Models 70%, Prompt Systems Engineering 60%, Agent Foundations 50%
Links + Notes: https://www.oxen.ai/blog
Join Fine-Tune Fridays: https://oxen.ai/community
Discord: https://discord.com/invite/s3tBEn7Ptg
Use Oxen AI: https://oxen.ai/
Oxen.ai offers one-click fine-tuning, or we will fine-tune models for you! Our platform is built on top of the world's best data versioning tool, with tools to automate model evals, generate synthetic data, and effortlessly fine-tune models.
--
Chapters
0:00 Is it Wan like "Anne" or "won"?
0:55 The Wan suite of models
1:10 Wan 2.1's model architecture and research paper
3:50 Wan 2.2 video improvements from Wan 2.1
5:35 Our fine-tuning goal: Conan O'Brien interviewing Will Smith, who's wearing a Denver Broncos shirt
7:30 Base model results
8:55 Wan 2.2's model architecture
12:55 Fine-tuning: How we created our data
17:12 Fine-tuning: How we fine-tuned each Wan model
19:22 Question: How many images do you need?
20:24 Question: Did we use musubi-tuner?
20:40 Question: How to train camera panning
22:45 Fine-tuning: Comparing images as we fine-tune
29:37 Bringing our Will Smith fine-tuned model to ComfyUI
42:00 Configuring ComfyUI to run our fine-tuned model
47:28 Question: Does the image input format matter?
48:40 Loading our Conan O'Brien fine-tuned model on ComfyUI
57:45 Question: How are the LoRAs loaded into the pipeline?
58:40 Final Results: Conan interviewing Will Smith