How AI Vision Evolved | Merve Noyan
In this clip, Merve gives a dense breakdown of how AI vision evolved, why it matters in practice, and why progress now feels incremental.
🤗 Listen to the full podcast episode
👉 Here: https://youtu.be/SjjCpeTjXIY
Connect with Merve:
- Merve on X — https://x.com/mervenoyann
- Vision Language Models (O'Reilly) — https://www.oreilly.com/library/view/vision-language-models/9798341624030/
Chapters:
- 00:00 How AI Vision Evolved
- 00:12 Vision Transformers
- 01:06 LLaVA
- 01:38 IDEFICS
- 02:06 CLIP + Projection Layer
- 02:54 Interleaving
- 05:42 Segment Anything
Topics covered:
- Vision Transformers
- LLaVA
- IDEFICS
- CLIP + Projection Layer
- Interleaving
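The "CLIP + Projection Layer" topic above refers to the common vision-language-model recipe (used by LLaVA and similar models): a frozen CLIP vision encoder produces per-patch image features, a small trainable projection maps them into the LLM's token-embedding space, and the projected image tokens are concatenated with text tokens. A minimal sketch with plain NumPy and hypothetical dimensions (768-d CLIP features, 4096-d LLM embeddings, 256 patches):

```python
import numpy as np

# Hypothetical dimensions: CLIP-style image features (768-d) projected
# into an LLM embedding space (4096-d), as in the LLaVA-style recipe.
clip_dim, llm_dim = 768, 4096
num_patches, num_text_tokens = 256, 32

rng = np.random.default_rng(0)

# Frozen vision encoder output: one embedding per image patch.
image_features = rng.standard_normal((num_patches, clip_dim))

# Trainable projection layer (a single linear map in the simplest case).
W = rng.standard_normal((clip_dim, llm_dim)) * 0.02

# Project image patches into the LLM's token-embedding space.
image_tokens = image_features @ W          # shape: (256, 4096)

# Text token embeddings would come from the LLM's own embedding table.
text_tokens = rng.standard_normal((num_text_tokens, llm_dim))

# The multimodal input is simply the projected image tokens
# concatenated with the text tokens before they enter the LLM.
multimodal_input = np.concatenate([image_tokens, text_tokens], axis=0)
print(multimodal_input.shape)  # (288, 4096)
```

Real systems train the projection (and sometimes the LLM) on image-text instruction data, but the core mechanism is just this change of embedding space followed by concatenation.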
Sources mentioned:
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale — https://arxiv.org/abs/2010.11929
- Visual Instruction Tuning project page — https://llava-vl.github.io/
- IDEFICS: an open reproduction of Flamingo — https://huggingface.co/blog/idefics
- CLIP: Connecting text and images — https://arxiv.org/abs/2103.00020
- IDEFICS2 model documentation — https://huggingface.co/docs/transformers/model_doc/idefics2
- Segment Anything — https://arxiv.org/abs/2304.02643