Foundations

Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

1,539 lessons
Skills in this topic
CV Basics
beginner
Classify images with a pre-trained CNN
Modern CV Models
intermediate
Run YOLO for real-time object detection
Generative CV
advanced
Build a Stable Diffusion inference pipeline
All Reads (394) · Articles (216) · Blog Posts (117) · Tutorials (47) · Research Papers (13) · News (1)
Building a Real-Time Security Dashboard with Stream Vision Agents and YOLO11
Dev.to · Quincy Oghenetejiri 👁️ Computer Vision 4mo ago
Traditional security camera stacks built with OpenCV and Flask often break down under real-world...
Exploring conv-kmeans-lab: A C++ Tool for CIELAB Image Color Segmentation
Dev.to · 💻 Arpad Kish 💻 👁️ Computer Vision 4mo ago
Image segmentation is a fundamental task in computer vision, and grouping pixels by color is one of...
Audio Segmentation with YAMNet: Detecting Speech, Music, and Silence
Dev.to · vast cow 👁️ Computer Vision 4mo ago
This article explains a Python program that analyzes an audio file and automatically segments it into...
How to Auto-Label your Segmentation Dataset with SAM3
Dev.to · Artem Zabarov 👁️ Computer Vision 4mo ago
How to Auto-Label Your Entire Segmentation Dataset Using SAM 3 Text Prompts Stop...
Stop Manual Segmentation: Meet NotumAi - An Open-Source AI Annotation Tool
Dev.to · Maulik Sompura 👁️ Computer Vision 4mo ago
If you've ever built a computer vision model, you know this truth: Data annotation is the slowest,...
Image Classification with CNNs – Part 3: Understanding Max Pooling and Results
Dev.to · Rijul Rajesh 👁️ Computer Vision 4mo ago
In the previous article, we went through the creation of the feature map. In this article we will...
Implementing Tamil OCR Using Python and Tesseract
Dev.to · Yuvan Shankar 👁️ Computer Vision 4mo ago
INTRODUCTION: Optical Character Recognition (OCR) is a technology that converts images containing...
Medicine Encyclopedia 2.0: Stop Guessing and Start Scanning with Multimodal RAG
Dev.to · Beck_Moulton 👁️ Computer Vision 4mo ago
We’ve all been there: staring at a tiny medicine box, squinting at chemical names like Acetaminophen...
OpenCV Blog 👁️ Computer Vision ⚡ AI Lesson 4mo ago
Calling Roboticists & Vision Experts: Tackle Dexterous Manipulation and Win Big in the AI for Industry Challenge
A real-world robotics challenge with a $180K prize pool, where innovation and industry impact collide. We’re standing at an inflection point in robotics: electr
EXPLORING OCR MODEL AND BACKEND SUPPORT IN PYTHON
Dev.to · Yuvan Shankar 👁️ Computer Vision 4mo ago
Optical Character Recognition (OCR) is a technology that converts images, scanned documents, or PDFs...
Multimodal Visual Understanding in Swift (aka: "why is this still so hard on-device?")
Dev.to · Timothy Fosteman 👁️ Computer Vision 4mo ago
I’ve been spending a lot of time lately thinking about one thing: how to get good image-to-text...
What is OCR? (And 4 Real-World Use Cases)
Dev.to · Resumemind 👁️ Computer Vision 4mo ago
What is OCR? OCR stands for Optical Character Recognition. In simple terms, it is the...
Real-Time Face Tracking: OpenCV Control of a UR Robot
OpenCV Blog 👁️ Computer Vision ⚡ AI Lesson 4mo ago
This project controls a Universal Robots UR5 using real-time face tracking built with OpenCV. A standard webcam provides a live video stream that detects a huma
2026 Complete Guide: How to Use GLM-OCR for Next-Gen Document Understanding
Dev.to · Sienna 👁️ Computer Vision 4mo ago
🎯 Core Takeaways (TL;DR) GLM-OCR is a 0.9B-parameter multimodal OCR model built on the...
OpenCV Blog 👁️ Computer Vision ⚡ AI Lesson 4mo ago
Part 3: Simultaneous Localization & Mapping: Which SLAM Is For You? on OpenCV Live!
Note: This event has been rescheduled but the links still work. Simultaneous Localization & Mapping (SLAM) is one of the most active and contentious areas of CV
"Semantic Chaining" Bypasses Multimodal AI Safety Filters
Dev.to · Alessandro Pignati 👁️ Computer Vision 5mo ago
Ever wondered how "unbreakable" AI safety filters actually are? As developers, we’re often told that...
Multimodal RAG in Action: Building a Skin Health Assistant with CLIP and Milvus
Dev.to · Beck_Moulton 👁️ Computer Vision 5mo ago
In the world of AI, we've moved far beyond simple text-based search. But when it comes to healthcare,...
OpenCV Blog 👁️ Computer Vision ⚡ AI Lesson 5mo ago
OpenCV Live: The Low-Power Computer Vision Challenge 2026
This year the Low-Power Computer Vision Challenge (LPCV) has three tracks with serious prize money including Image-to-Text Retrieval, Action Recognition in Vide
🎯 YOLO Training in Practice
Dev.to · TK Lin 👁️ Computer Vision 5mo ago
YOLO Animal Recognition Training in Practice: A Complete Guide from 0 to 80% Accuracy. 和心村 AI Director Technical Notes #2 🎯...
🎯 YOLO Training in Practice
Dev.to · TK Lin 👁️ Computer Vision 5mo ago
YOLO Animal Recognition Training in Practice: A Complete Guide from 0 to 80% Accuracy. 和心村 AI Director Technical Notes #2 🎯 Goal: have the AI...
The GreenEyes.AI Vision Stack: A Hybrid Pipeline for Object Labeling and Feature-Based Recognition
Dev.to · 💻 Arpad Kish 💻 👁️ Computer Vision 5mo ago
Introduction In the rapidly evolving landscape of computer vision, the challenge often...
From Image Features to Visual Place Recognition: OpenCV Approach
OpenCV Blog 👁️ Computer Vision ⚡ AI Lesson 5mo ago
In this blog, we explore Visual Place Recognition (VPR) with hands-on examples using OpenCV and lightweight Python tools. You will create a practical VPR pipeli
Get Started With Image Classification in Kaggle using Python
Dev.to · Debajyati Dey 👁️ Computer Vision 5mo ago
WHAT KAGGLE IS Kaggle is a fantastic and great platform for enthusiastic Data Science...
D4RT: Teaching AI to see the world in four dimensions
DeepMind Blog 👁️ Computer Vision ⚡ AI Lesson 5mo ago
D4RT: Unified, efficient 4D reconstruction and tracking up to 300x faster than prior methods.
Detecting Objects in Images from Any Text Prompt (Not Fixed Classes)
Dev.to · Eyasu Asnake 👁️ Computer Vision 5mo ago
Most object detection systems assume a fixed label set: train a model on COCO, Open Images, or a...
Did You Know CLIP Works as an AI Image Detector?
Dev.to · Jason Peterson 👁️ Computer Vision 5mo ago
OpenAI's CLIP model was trained to match images with text descriptions. But here's something...
Watershed Segmentation Using OpenCV
OpenCV Blog 👁️ Computer Vision ⚡ AI Lesson 5mo ago
Explore the elegant intersection of nature-inspired algorithms and computer vision. This comprehensive technical guide unveils the powerful watershed segmentati
Building a Production-Ready Traffic Violation Detection System with Computer Vision
Dev.to · Harris Bashir 👁️ Computer Vision 5mo ago
Traffic monitoring and violation detection is a classic computer vision problem that looks...
Beyond Image Labels: Estimating Food Portions and Calories using Grounding DINO + SAM
Dev.to · Beck_Moulton 👁️ Computer Vision 5mo ago
Ever tried those calorie tracking apps where you have to manually search for "medium-sized chicken...
Multimodal AI: Why Text-Only Models Are Already Dead!
Dev.to · SATINATH MONDAL 👁️ Computer Vision 5mo ago
Vision, audio, video, and text in a single AI model. Here's why multimodal AI is revolutionizing development and how to build with it today.
BAIR Blog 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 5mo ago
Information-Driven Design of Imaging Systems
Why Rust?
Dev.to · Cyrus Tse 👁️ Computer Vision 5mo ago
"Every programmer remembers the first time their program crashed with a segmentation fault. Or...
Enhancing Images: Adaptive Shadow Correction Using OpenCV
OpenCV Blog 👁️ Computer Vision ⚡ AI Lesson 5mo ago
In this blog post, we'll tackle this challenge head-on with a practical approach to shadow correction using OpenCV. Our method leverages Multi-Scale Retinex (MS
KNN Algorithm from Scratch - Cat vs Dog Image Classification in Python (Complete Experiment)
Dev.to · Yogender 👁️ Computer Vision 5mo ago
🧠 KNN Algorithm from Scratch — Real Image Classification Experiment I recently built a...
Laravel Face Recognition and Authentication
Dev.to · Pius oruko 👁️ Computer Vision 5mo ago
Introduction A recurring security and usability issue with web applications is passwords. They are...
From Prototype to Production: Building a Multimodal Video Search Engine
Dev.to · Jason Peterson 👁️ Computer Vision 5mo ago
In my last post, I wrote about the unreasonable effectiveness of model stacking for media...
Smart Document Scanning with Live OCR using OpenCV.js
OpenCV Blog 👁️ Computer Vision ⚡ AI Lesson 6mo ago
This blog explores how to build a smart, browser-based document scanner using OpenCV.js and live OCR. It covers document detection, perspective correction, inte
AI Clothes Changer Models Explained: Diffusion, Segmentation
Dev.to · FreePixel 👁️ Computer Vision 6mo ago
AI clothes changer models are the systems that make realistic outfit swapping in images possible....
OpenCV G-API: From Imperative to Declarative Pipelines
OpenCV Blog 👁️ Computer Vision ⚡ AI Lesson 6mo ago
Explore OpenCV G-API and how it transforms image-processing pipelines from imperative to declarative with graph-based execution.
Computer vision for code: What PVS-Studio saw in OpenCV
Dev.to · Unicorn Developer 👁️ Computer Vision 6mo ago
What do computer vision and static analysis have in common? Both seek meaning in data. OpenCV finds...
Building an Event-Driven OCR Service: Challenges and Solutions
Dev.to · Rajesh Pethe 👁️ Computer Vision 6mo ago
Optical Character Recognition (OCR) is a powerful AI/ML technology that recognizes and extracts text...
How I Built a Computer Vision Chess Board Detector
Dev.to · MD ABUBAKAR 👁️ Computer Vision 6mo ago
I Built a Chess Scanner That Converts Any Chess Image Into a FEN + Analyzes Games Like Chess.com 👉...
Building a Unified Benchmarking Pipeline for Computer Vision — Without Rewriting Code for Every Task
Dev.to · Michal S 👁️ Computer Vision 6mo ago
This project was developed as part of the Extra-Tech Computer Vision Bootcamp, in collaboration with...
Multimodal Agents and Their Applications
Dev.to · pranav s 👁️ Computer Vision 7mo ago
Multimodal Agents and Their Applications Author: Pranav S - 2025-12-01 ...
Run FLUX.2 on Replicate
Replicate Blog 👁️ Computer Vision ⚡ AI Lesson 7mo ago
FLUX.2 brings professional-grade image generation and editing with unprecedented detail, multi-reference support, and enterprise efficiency.
Why this ESP32-CAM Became My New Favorite Module
Dev.to · Rifat 👁️ Computer Vision 7mo ago
For the last six months, I have been working with various AI projects, including object detection,...
Build a Face Detection App with Python OOP — From Zero to Pro (part-3)
Dev.to · MohammadReza Mahdian 👁️ Computer Vision 7mo ago
Part 3: OpenCVBase — Designing a Clean Parent Class Why Create a Base...
2025 Complete Guide: In-Depth Analysis of ERNIE-4.5-VL-28B-A3B-Thinking Multimodal AI Model
Dev.to · cz 👁️ Computer Vision 7mo ago
🎯 Key Takeaways (TL;DR) Lightweight & Efficient: Activates only 3B parameters while...