How AI Vision Evolved | Merve Noyan

Hugging Face · Intermediate · 👁️ Computer Vision · 1mo ago
In this clip, Merve breaks down how AI vision evolved and explains why it matters in practice. It is a dense explanation of how vision models developed and why progress feels incremental now.

🤗 Listen to the full podcast episode 👉 https://youtu.be/SjjCpeTjXIY

Connect with Merve:
- Merve on X: https://x.com/mervenoyann
- Vision Language Models (O'Reilly): https://www.oreilly.com/library/view/vision-language-models/9798341624030/

Chapters:
- 00:00 How AI Vision Evolved
- 00:12 Vision Transformers
- 01:06 LLaVA
- 01:38 IDEFICS
- 02:06 CLIP + Projection Layer
- 02:54 Interleaving
- 05:42 Segment Anything

Topics covered:
- Vision Transformers
- LLaVA
- IDEFICS
- CLIP + Projection Layer
- Interleaving

Sources mentioned:
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929
- Visual Instruction Tuning project page: https://llava-vl.github.io/
- IDEFICS: an open reproduction of Flamingo: https://huggingface.co/blog/idefics
- CLIP: Connecting text and images: https://arxiv.org/abs/2103.00020
- IDEFICS2 model documentation: https://huggingface.co/docs/transformers/model_doc/idefics2
- Segment Anything: https://arxiv.org/abs/2304.02643
