Fine-Tune Vision Language Models (VLMs) Like a Pro: Live Demo + Benchmarks | Predibase Webinar
Multimodal AI is no longer optional—it's the future. In this in-depth webinar, the ML experts at Predibase break down everything you need to know about Vision Language Models (VLMs)—from architectures and use cases to training, inference, and real-world performance.
✅ Learn why fine-tuning open-source VLMs often beats closed models like GPT-4V
✅ See a live demo of fine-tuning a Pokémon card captioning model
✅ Get benchmark results showing performance boosts over GPT-4
✅ Discover real-world use cases: healthcare, retail, drive-thrus, content moderation & more
✅ Understand the challenges in training and serving VLMs
Watch on YouTube ↗
Chapters (19)
Intro & Speakers – 1:45
Why Multimodal AI Matters – 4:20
Real-World Multimodal Use Cases (Amazon, Duolingo, Converse Now) – 7:10
Developer Interest in Open-Source VLMs – 9:40
What Are Vision Language Models (VLMs)? – 11:05
VLM Architecture: Encoder, Projector, Decoder – 14:00
Popular Components: CLIP, LLaMA, Qwen – 16:00
Prompting & Use Cases (Image QA, Captioning, Video Analysis) – 19:15
Strengths & Limitations of VLMs – 21:00
VLMs vs Humans: Reading Handwritten Text – 23:00
Biases & Benchmark Failures in VLMs – 25:30
Open vs Closed Source: Who Wins in Vision? – 28:00
Fine-Tuning for Accuracy in Specialized Tasks – 31:20
Impact of Image Resolution on Token Count, Latency, Accuracy – 34:00
Latency vs Resolution Trade-offs – 36:00
VLM Fine-Tuning = Better Accuracy, Lower Cost – 38:00
Challenges in Training & Serving VLMs – 40:45
How Predibase Simplifies VLM Training & Inference – 43:00
Live Demo: Fine-Tuning a Pokémon Card Captioning Model