The Future of Vision in ML | Merve Noyan | HF Podcast #1
In this episode, we sit down with Merve to talk about where vision AI is heading: from early computer vision systems to modern multimodal models, world models, robotics, and open source AI.
We discuss LLaVA, IDEFICS, Vision Transformers, CNNs, JEPA, V-JEPA, Genie 3, OpenClaw, IMCP, PaliGemma, ColPali, ColQwen, and why Hugging Face has become such a central part of the open ecosystem.
## Connect with Merve Noyan, the open-sourceress 👇
- X (twitter): https://x.com/mervenoyann
- LinkedIn: https://www.linkedin.com/in/merve-noyan-28b1a113a/
- Personal Site: https://merveenoyan.github.io/me/
- GitHub: https://github.com/merveenoyan
## Chapters
00:00 Intro: vision, Hugging Face, and the future of AI
00:31 Why vision feels different now
03:58 LLaVA, IDEFICS, and multimodal training
08:56 CNNs, ViTs, and older vision architectures
15:46 How vision models could reach everyday users
16:50 World models, JEPA, V-JEPA, Genie 3, and robotics
25:44 OpenClaw, IMCP, and agent safety
28:01 Small vision models, fine-tuning, and getting started
34:39 Why Hugging Face matters in open source AI
42:49 PaliGemma, ColPali, ColQwen, and vision retrieval
47:26 Before Hugging Face: how models were shared
49:48 Mentors, culture, and closing thoughts
If you enjoyed the episode, subscribe for more conversations about open models, multimodal systems, and the future of AI.
Watch on YouTube ↗