Foundations

Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

1,541 lessons
Skills in this topic
CV Basics
beginner
Classify images with a pre-trained CNN
Modern CV Models
intermediate
Run YOLO for real-time object detection
Generative CV
advanced
Build a Stable Diffusion inference pipeline
All Reads (396): Articles (217), Blog Posts (117), Tutorials (48), Research Papers (13), News (1)
Generating SEM Images from Segmentation Masks
Dev.to · Shira S 👁️ Computer Vision 4mo ago
Acknowledgements We would like to thank our mentors, Asaf Nisani and Yoav Lebendiker, for...
Computer Vision for Web Developers: Build an Image Recognition App with TensorFlow.js
Dev.to · Paul Robertson 👁️ Computer Vision 4mo ago
Learn to build a complete image recognition web app using TensorFlow.js with real-time webcam classification and object detection. Includes practical code examp
Building a Real-Time Security Dashboard with Stream Vision Agents and YOLO11
Dev.to · Quincy Oghenetejiri 👁️ Computer Vision 4mo ago
Traditional security camera stacks built with OpenCV and Flask often break down under real-world...
Exploring conv-kmeans-lab: A C++ Tool for CIELAB Image Color Segmentation
Dev.to · 💻 Arpad Kish 💻 👁️ Computer Vision 4mo ago
Image segmentation is a fundamental task in computer vision, and grouping pixels by color is one of...
Audio Segmentation with YAMNet: Detecting Speech, Music, and Silence
Dev.to · vast cow 👁️ Computer Vision 4mo ago
This article explains a Python program that analyzes an audio file and automatically segments it into...
How to Auto-Label your Segmentation Dataset with SAM3
Dev.to · Artem Zabarov 👁️ Computer Vision 4mo ago
How to Auto-Label Your Entire Segmentation Dataset Using SAM 3 Text Prompts Stop...
Stop Manual Segmentation: Meet NotumAi - An Open-Source AI Annotation Tool
Dev.to · Maulik Sompura 👁️ Computer Vision 4mo ago
If you've ever built a computer vision model, you know this truth: Data annotation is the slowest,...
Image Classification with CNNs – Part 3: Understanding Max Pooling and Results
Dev.to · Rijul Rajesh 👁️ Computer Vision 4mo ago
In the previous article, we were going through the creation of feature map. In this article we will...
Implementing Tamil OCR Using Python and Tesseract
Dev.to · Yuvan Shankar 👁️ Computer Vision 4mo ago
INTRODUCTION: Optical Character Recognition (OCR) is a technology that converts images containing...
Medicine Encyclopedia 2.0: Stop Guessing and Start Scanning with Multimodal RAG
Dev.to · Beck_Moulton 👁️ Computer Vision 4mo ago
We’ve all been there: staring at a tiny medicine box, squinting at chemical names like Acetaminophen...
EXPLORING OCR MODEL AND BACKEND SUPPORT IN PYTHON
Dev.to · Yuvan Shankar 👁️ Computer Vision 4mo ago
Optical Character Recognition (OCR) is a technology that converts images, scanned documents, or PDFs...
Multimodal Visual Understanding in Swift (aka: "why is this still so hard on-device?")
Dev.to · Timothy Fosteman 👁️ Computer Vision 4mo ago
I’ve been spending a lot of time lately thinking about one thing: how to get good image-to-text...
What is OCR? (And 4 Real-World Use Cases)
Dev.to · Resumemind 👁️ Computer Vision 4mo ago
What is OCR? OCR stands for Optical Character Recognition. In simple terms, it is the...
2026 Complete Guide: How to Use GLM-OCR for Next-Gen Document Understanding
Dev.to · Sienna 👁️ Computer Vision 4mo ago
🎯 Core Takeaways (TL;DR) GLM-OCR is a 0.9B-parameter multimodal OCR model built on the...
"Semantic Chaining" Bypasses Multimodal AI Safety Filters
Dev.to · Alessandro Pignati 👁️ Computer Vision 5mo ago
Ever wondered how "unbreakable" AI safety filters actually are? As developers, we’re often told that...
Multimodal RAG in Action: Building a Skin Health Assistant with CLIP and Milvus
Dev.to · Beck_Moulton 👁️ Computer Vision 5mo ago
In the world of AI, we've moved far beyond simple text-based search. But when it comes to healthcare,...
🎯 YOLO Training in Practice
Dev.to · TK Lin 👁️ Computer Vision 5mo ago
YOLO Animal Recognition Training in Practice: A Complete Guide from 0 to 80% Accuracy 和心村 AI Director Tech Note #2 🎯...
🎯 YOLO Training in Practice
Dev.to · TK Lin 👁️ Computer Vision 5mo ago
YOLO Animal Recognition Training in Practice: A Complete Guide from 0 to 80% Accuracy 和心村 AI Director Tech Notes #2 🎯 Goal: Get the AI...
The GreenEyes.AI Vision Stack: A Hybrid Pipeline for Object Labeling and Feature-Based Recognition
Dev.to · 💻 Arpad Kish 💻 👁️ Computer Vision 5mo ago
Introduction In the rapidly evolving landscape of computer vision, the challenge often...
Get Started With Image Classification in Kaggle using Python
Dev.to · Debajyati Dey 👁️ Computer Vision 5mo ago
WHAT KAGGLE IS Kaggle is a fantastic and great platform for enthusiastic Data Science...
Detecting Objects in Images from Any Text Prompt (Not Fixed Classes)
Dev.to · Eyasu Asnake 👁️ Computer Vision 5mo ago
Most object detection systems assume a fixed label set: train a model on COCO, Open Images, or a...
Did You Know CLIP Works as an AI Image Detector?
Dev.to · Jason Peterson 👁️ Computer Vision 5mo ago
OpenAI's CLIP model was trained to match images with text descriptions. But here's something...
Building a Production-Ready Traffic Violation Detection System with Computer Vision
Dev.to · Harris Bashir 👁️ Computer Vision 5mo ago
Traffic monitoring and violation detection is a classic computer vision problem that looks...
Beyond Image Labels: Estimating Food Portions and Calories using Grounding DINO + SAM
Dev.to · Beck_Moulton 👁️ Computer Vision 5mo ago
Ever tried those calorie tracking apps where you have to manually search for "medium-sized chicken...
Multimodal AI: Why Text-Only Models Are Already Dead!
Dev.to · SATINATH MONDAL 👁️ Computer Vision 5mo ago
Vision, audio, video, and text in a single AI model. Here's why multimodal AI is revolutionizing development and how to build with it today.
Why Rust?
Dev.to · Cyrus Tse 👁️ Computer Vision 5mo ago
"Every programmer remembers the first time their program crashed with a segmentation fault. Or...
KNN Algorithm from Scratch - Cat vs Dog Image Classification in Python (Complete Experiment)
Dev.to · Yogender 👁️ Computer Vision 5mo ago
🧠 KNN Algorithm from Scratch — Real Image Classification Experiment I recently built a...
Laravel Face Recognition and Authentication
Dev.to · Pius oruko 👁️ Computer Vision 5mo ago
Introduction A recurring security and usability issue with web applications is passwords. They are...
From Prototype to Production: Building a Multimodal Video Search Engine
Dev.to · Jason Peterson 👁️ Computer Vision 5mo ago
In my last post, I wrote about the unreasonable effectiveness of model stacking for media...
AI Clothes Changer Models Explained: Diffusion, Segmentation
Dev.to · FreePixel 👁️ Computer Vision 6mo ago
AI clothes changer models are the systems that make realistic outfit swapping in images possible....
Computer vision for code: What PVS-Studio saw in OpenCV
Dev.to · Unicorn Developer 👁️ Computer Vision 6mo ago
What do computer vision and static analysis have in common? Both seek meaning in data. OpenCV finds...
Building an Event-Driven OCR Service: Challenges and Solutions
Dev.to · Rajesh Pethe 👁️ Computer Vision 6mo ago
Optical Character Recognition (OCR) is a powerful AI/ML technology that recognizes and extracts text...
How I Built a Computer Vision Chess Board Detector
Dev.to · MD ABUBAKAR 👁️ Computer Vision 6mo ago
I Built a Chess Scanner That Converts Any Chess Image Into a FEN + Analyzes Games Like Chess.com 👉...
Building a Unified Benchmarking Pipeline for Computer Vision — Without Rewriting Code for Every Task
Dev.to · Michal S 👁️ Computer Vision 6mo ago
This project was developed as part of the Extra-Tech Computer Vision Bootcamp, in collaboration with...
Multimodal Agents and Their Applications
Dev.to · pranav s 👁️ Computer Vision 7mo ago
Multimodal Agents and Their Applications Author: Pranav S - 2025-12-01 ...
Why this ESP32-CAM Became My New Favorite Module
Dev.to · Rifat 👁️ Computer Vision 7mo ago
For the last six months, I have been working with various AI projects, including object detection,...
Build a Face Detection App with Python OOP — From Zero to Pro(part-3)
Dev.to · MohammadReza Mahdian 👁️ Computer Vision 7mo ago
Part 3: OpenCVBase — Designing a Clean Parent Class Why Create a Base...
2025 Complete Guide: In-Depth Analysis of ERNIE-4.5-VL-28B-A3B-Thinking Multimodal AI Model
Dev.to · cz 👁️ Computer Vision 7mo ago
🎯 Key Takeaways (TL;DR) Lightweight & Efficient: Activates only 3B parameters while...
From Text to Live Video: How We Built a Serverless Multimodal Logistics AI on Google Cloud Run
Dev.to · Michael G. Inso 👁️ Computer Vision 7mo ago
The logistics industry runs on information. From tracking numbers on a crumpled label to complex...
Avoid the N+1 Problem in Laravel Validations
Dev.to · Andres Daza 👁️ Computer Vision 7mo ago
The Hidden Problem Behind Bulk Validations When we validate arrays of data in...
Real-Time Face Recognition Attendance — QR Access & Google Sheets Integration
Dev.to · Akshaya Reddy Annareddy 👁️ Computer Vision 7mo ago
🚀 This project automates classroom attendance using Face Recognition (MTCNN + FaceNet) integrated...
⚡ I'd like to recommend the 'Audio Segmentation Toolkit' (AS
Dev.to · Dr. Carlos Ruiz Viquez 👁️ Computer Vision 8mo ago
⚡ I'd like to recommend the 'Audio Segmentation Toolkit' (AST), an open-source library that shines in...
Daft vs Ray Data: A Comprehensive Comparison for Multimodal Data Processing
Dev.to · YK Sugi 👁️ Computer Vision 8mo ago
Multimodal AI workloads break traditional data engines. They need to embed documents, classify...
The Hidden Pitfall of Multimodal Fusion: Avoid Over-weight
Dev.to · Dr. Carlos Ruiz Viquez 👁️ Computer Vision 8mo ago
The Hidden Pitfall of Multimodal Fusion: Avoid Over-weighting a Single Modality When working with...
The Dark Side of Computer Vision: How Adversarial Examples
Dev.to · Dr. Carlos Ruiz Viquez 👁️ Computer Vision 8mo ago
The Dark Side of Computer Vision: How Adversarial Examples Can Fool Even the Most Advanced...
Beats as Objects: A Computer Vision Hack for Music Analysis by Arvind Sundararajan
Dev.to · Arvind SundaraRajan 👁️ Computer Vision 8mo ago
Beats as Objects: A Computer Vision Hack for Music Analysis Struggling to accurately...
Hello everyone, is anyone working on live video feed analysis using computer vision? I need help with the technical side and would like to discuss it further.
Dev.to · anujpatel2899 👁️ Computer Vision 8mo ago
A post by anujpatel2899
What is a Model Serving Framework? A Simple Guide
Dev.to · Sohan Lal 👁️ Computer Vision 8mo ago
Have you ever wondered how artificial intelligence (AI) apps work? When you use a face recognition...