Foundations
Computer Vision
Object detection, segmentation, YOLO, CLIP, and vision-language models
Skills in this topic
3 skills

Dev.to · Quincy Oghenetejiri
👁️ Computer Vision
4mo ago
Building a Real-Time Security Dashboard with Stream Vision Agents and YOLO11
Traditional security camera stacks built with OpenCV and Flask often break down under real-world...

Dev.to · 💻 Arpad Kish 💻
👁️ Computer Vision
4mo ago
Exploring conv-kmeans-lab: A C++ Tool for CIELAB Image Color Segmentation
Image segmentation is a fundamental task in computer vision, and grouping pixels by color is one of...

Dev.to · vast cow
👁️ Computer Vision
4mo ago
Audio Segmentation with YAMNet: Detecting Speech, Music, and Silence
This article explains a Python program that analyzes an audio file and automatically segments it into...

Dev.to · Artem Zabarov
👁️ Computer Vision
4mo ago
How to Auto-Label your Segmentation Dataset with SAM3
How to Auto-Label Your Entire Segmentation Dataset Using SAM 3 Text Prompts Stop...

Dev.to · Maulik Sompura
👁️ Computer Vision
4mo ago
Stop Manual Segmentation: Meet NotumAi - An Open-Source AI Annotation Tool
If you've ever built a computer vision model, you know this truth: Data annotation is the slowest,...

Dev.to · Rijul Rajesh
👁️ Computer Vision
4mo ago
Image Classification with CNNs – Part 3: Understanding Max Pooling and Results
In the previous article, we were going through the creation of feature map. In this article we will...

Dev.to · Yuvan Shankar
👁️ Computer Vision
4mo ago
Implementing Tamil OCR Using Python and Tesseract
INTRODUCTION: Optical Character Recognition (OCR) is a technology that converts images containing...

Dev.to · Beck_Moulton
👁️ Computer Vision
4mo ago
Medicine Encyclopedia 2.0: Stop Guessing and Start Scanning with Multimodal RAG
We’ve all been there: staring at a tiny medicine box, squinting at chemical names like Acetaminophen...

OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
4mo ago
Calling Roboticists & Vision Experts: Tackle Dexterous Manipulation and Win Big in the AI for Industry Challenge
A real-world robotics challenge with a $180K prize pool, where innovation and industry impact collide. We’re standing at an inflection point in robotics: electr

Dev.to · Yuvan Shankar
👁️ Computer Vision
4mo ago
EXPLORING OCR MODEL AND BACKEND SUPPORT IN PYTHON
Optical Character Recognition (OCR) is a technology that converts images, scanned documents, or PDFs...

Dev.to · Timothy Fosteman
👁️ Computer Vision
4mo ago
Multimodal Visual Understanding in Swift (aka: "why is this still so hard on-device?")
I’ve been spending a lot of time lately thinking about one thing: how to get good image-to-text...

Dev.to · Resumemind
👁️ Computer Vision
4mo ago
What is OCR? (And 4 Real-World Use Cases)
What is OCR? OCR stands for Optical Character Recognition. In simple terms, it is the...

OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
4mo ago
Real-Time Face Tracking: OpenCV Control of a UR Robot
This project controls a Universal Robots UR5 using real-time face tracking built with OpenCV. A standard webcam provides a live video stream that detects a huma

Dev.to · Sienna
👁️ Computer Vision
4mo ago
2026 Complete Guide: How to Use GLM-OCR for Next-Gen Document Understanding
🎯 Core Takeaways (TL;DR) GLM-OCR is a 0.9B-parameter multimodal OCR model built on the...

OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
4mo ago
Part 3: Simultaneous Localization & Mapping: Which SLAM Is For You? on OpenCV Live!
Note: This event has been rescheduled but the links still work. Simultaneous Localization & Mapping (SLAM) is one of the most active and contentious areas of CV

Dev.to · Alessandro Pignati
👁️ Computer Vision
5mo ago
"Semantic Chaining" Bypasses Multimodal AI Safety Filters
Ever wondered how "unbreakable" AI safety filters actually are? As developers, we’re often told that...

Dev.to · Beck_Moulton
👁️ Computer Vision
5mo ago
Multimodal RAG in Action: Building a Skin Health Assistant with CLIP and Milvus
In the world of AI, we've moved far beyond simple text-based search. But when it comes to healthcare,...

OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
5mo ago
OpenCV Live: The Low-Power Computer Vision Challenge 2026
This year the Low-Power Computer Vision Challenge (LPCV) has three tracks with serious prize money including Image-to-Text Retrieval, Action Recognition in Vide

Dev.to · TK Lin
👁️ Computer Vision
5mo ago
🎯 YOLO Training in Practice
YOLO Animal Recognition Training in Practice: A Complete Guide from 0 to 80% Accuracy 和心村 AI Director Technical Notes #2 🎯...

Dev.to · TK Lin
👁️ Computer Vision
5mo ago
🎯 YOLO Training in Practice
YOLO Animal Recognition Training in Practice: A Complete Guide from 0 to 80% Accuracy 和心村 AI Director Technical Notes #2 🎯 Goal: have AI...

Dev.to · 💻 Arpad Kish 💻
👁️ Computer Vision
5mo ago
The GreenEyes.AI Vision Stack: A Hybrid Pipeline for Object Labeling and Feature-Based Recognition
Introduction In the rapidly evolving landscape of computer vision, the challenge often...

OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
5mo ago
From Image Features to Visual Place Recognition: OpenCV Approach
In this blog, we explore Visual Place Recognition (VPR) with hands-on examples using OpenCV and lightweight Python tools. You will create a practical VPR pipeli

Dev.to · Debajyati Dey
👁️ Computer Vision
5mo ago
Get Started With Image Classification in Kaggle using Python
WHAT KAGGLE IS Kaggle is a fantastic and great platform for enthusiastic Data Science...

DeepMind Blog
👁️ Computer Vision
⚡ AI Lesson
5mo ago
D4RT: Teaching AI to see the world in four dimensions
D4RT: Unified, efficient 4D reconstruction and tracking up to 300x faster than prior methods.

Dev.to · Eyasu Asnake
👁️ Computer Vision
5mo ago
Detecting Objects in Images from Any Text Prompt (Not Fixed Classes)
Most object detection systems assume a fixed label set: train a model on COCO, Open Images, or a...

Dev.to · Jason Peterson
👁️ Computer Vision
5mo ago
Did You Know CLIP Works as an AI Image Detector?
OpenAI's CLIP model was trained to match images with text descriptions. But here's something...

OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
5mo ago
Watershed Segmentation Using OpenCV
Explore the elegant intersection of nature-inspired algorithms and computer vision. This comprehensive technical guide unveils the powerful watershed segmentati

Dev.to · Harris Bashir
👁️ Computer Vision
5mo ago
Building a Production-Ready Traffic Violation Detection System with Computer Vision
Traffic monitoring and violation detection is a classic computer vision problem that looks...

Dev.to · Beck_Moulton
👁️ Computer Vision
5mo ago
Beyond Image Labels: Estimating Food Portions and Calories using Grounding DINO + SAM
Ever tried those calorie tracking apps where you have to manually search for "medium-sized chicken...

Dev.to · SATINATH MONDAL
👁️ Computer Vision
5mo ago
Multimodal AI: Why Text-Only Models Are Already Dead!
Vision, audio, video, and text in a single AI model. Here's why multimodal AI is revolutionizing development and how to build with it today.

BAIR Blog
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
5mo ago
Information-Driven Design of Imaging Systems

Dev.to · Cyrus Tse
👁️ Computer Vision
5mo ago
Why Rust?
"Every programmer remembers the first time their program crashed with a segmentation fault. Or...

OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
5mo ago
Enhancing Images: Adaptive Shadow Correction Using OpenCV
In this blog post, we'll tackle this challenge head-on with a practical approach to shadow correction using OpenCV. Our method leverages Multi-Scale Retinex (MS

Dev.to · Yogender
👁️ Computer Vision
5mo ago
KNN Algorithm from Scratch -Cat vs Dog Image Classification in Python (Complete Experiment)
🧠 KNN Algorithm from Scratch — Real Image Classification Experiment I recently built a...

Dev.to · Pius oruko
👁️ Computer Vision
5mo ago
Laravel Face Recognition and Authentication
Introduction A recurring security and usability issue with web applications is passwords. They are...

Dev.to · Jason Peterson
👁️ Computer Vision
5mo ago
From Prototype to Production: Building a Multimodal Video Search Engine
In my last post, I wrote about the unreasonable effectiveness of model stacking for media...

OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
6mo ago
Smart Document Scanning with Live OCR using OpenCV.js
This blog explores how to build a smart, browser-based document scanner using OpenCV.js and live OCR. It covers document detection, perspective correction, inte

Dev.to · FreePixel
👁️ Computer Vision
6mo ago
AI Clothes Changer Models Explained: Diffusion, Segmentation
AI clothes changer models are the systems that make realistic outfit swapping in images possible....

OpenCV Blog
👁️ Computer Vision
⚡ AI Lesson
6mo ago
OpenCV G-API: From Imperative to Declarative Pipelines
Explore OpenCV G-API and how it transforms image-processing pipelines from imperative to declarative with graph-based execution.

Dev.to · Unicorn Developer
👁️ Computer Vision
6mo ago
Computer vision for code: What PVS-Studio saw in OpenCV
What do computer vision and static analysis have in common? Both seek meaning in data. OpenCV finds...

Dev.to · Rajesh Pethe
👁️ Computer Vision
6mo ago
Building an Event-Driven OCR Service: Challenges and Solutions
Optical Character Recognition (OCR) is a powerful AI/ML technology that recognizes and extracts text...

Dev.to · MD ABUBAKAR
👁️ Computer Vision
6mo ago
How I Built a Computer Vision Chess Board Detector
I Built a Chess Scanner That Converts Any Chess Image Into a FEN + Analyzes Games Like Chess.com 👉...

Dev.to · Michal S
👁️ Computer Vision
6mo ago
Building a Unified Benchmarking Pipeline for Computer Vision — Without Rewriting Code for Every Task
This project was developed as part of the Extra-Tech Computer Vision Bootcamp, in collaboration with...

Dev.to · pranav s
👁️ Computer Vision
7mo ago
Multimodal Agents and Their Applications
Multimodal Agents and Their Applications Author: Pranav S - 2025-12-01 ...

Replicate Blog
👁️ Computer Vision
⚡ AI Lesson
7mo ago
Run FLUX.2 on Replicate
FLUX.2 brings professional-grade image generation and editing with unprecedented detail, multi-reference support, and enterprise efficiency.

Dev.to · Rifat
👁️ Computer Vision
7mo ago
Why this ESP32-CAM Became My New Favorite Module
For the last six months, I have been working with various AI projects, including object detection,...

Dev.to · MohammadReza Mahdian
👁️ Computer Vision
7mo ago
Build a Face Detection App with Python OOP — From Zero to Pro(part-3)
Part 3: OpenCVBase — Designing a Clean Parent Class Why Create a Base...

Dev.to · cz
👁️ Computer Vision
7mo ago
2025 Complete Guide: In-Depth Analysis of ERNIE-4.5-VL-28B-A3B-Thinking Multimodal AI Model
🎯 Key Takeaways (TL;DR) Lightweight & Efficient: Activates only 3B parameters while...
DeepCamp AI