Computer Vision
Object detection, segmentation, YOLO, CLIP, and vision-language models

Dev.to · Shira S
👁️ Computer Vision
4mo ago
Generating SEM Images from Segmentation Masks
Acknowledgements We would like to thank our mentors, Asaf Nisani and Yoav Lebendiker, for...

Dev.to · Paul Robertson
👁️ Computer Vision
4mo ago
Computer Vision for Web Developers: Build an Image Recognition App with TensorFlow.js
Learn to build a complete image recognition web app using TensorFlow.js with real-time webcam classification and object detection. Includes practical code examp

Dev.to · Quincy Oghenetejiri
👁️ Computer Vision
4mo ago
Building a Real-Time Security Dashboard with Stream Vision Agents and YOLO11
Traditional security camera stacks built with OpenCV and Flask often break down under real-world...

Dev.to · 💻 Arpad Kish 💻
👁️ Computer Vision
4mo ago
Exploring conv-kmeans-lab: A C++ Tool for CIELAB Image Color Segmentation
Image segmentation is a fundamental task in computer vision, and grouping pixels by color is one of...

Dev.to · vast cow
👁️ Computer Vision
4mo ago
Audio Segmentation with YAMNet: Detecting Speech, Music, and Silence
This article explains a Python program that analyzes an audio file and automatically segments it into...

Dev.to · Artem Zabarov
👁️ Computer Vision
4mo ago
How to Auto-Label your Segmentation Dataset with SAM3
How to Auto-Label Your Entire Segmentation Dataset Using SAM 3 Text Prompts Stop...

Dev.to · Maulik Sompura
👁️ Computer Vision
4mo ago
Stop Manual Segmentation: Meet NotumAi - An Open-Source AI Annotation Tool
If you've ever built a computer vision model, you know this truth: Data annotation is the slowest,...

Dev.to · Rijul Rajesh
👁️ Computer Vision
4mo ago
Image Classification with CNNs – Part 3: Understanding Max Pooling and Results
In the previous article, we went through the creation of a feature map. In this article we will...

Dev.to · Yuvan Shankar
👁️ Computer Vision
4mo ago
Implementing Tamil OCR Using Python and Tesseract
INTRODUCTION: Optical Character Recognition (OCR) is a technology that converts images containing...

Dev.to · Beck_Moulton
👁️ Computer Vision
4mo ago
Medicine Encyclopedia 2.0: Stop Guessing and Start Scanning with Multimodal RAG
We’ve all been there: staring at a tiny medicine box, squinting at chemical names like Acetaminophen...

Dev.to · Yuvan Shankar
👁️ Computer Vision
4mo ago
EXPLORING OCR MODEL AND BACKEND SUPPORT IN PYTHON
Optical Character Recognition (OCR) is a technology that converts images, scanned documents, or PDFs...

Dev.to · Timothy Fosteman
👁️ Computer Vision
4mo ago
Multimodal Visual Understanding in Swift (aka: "why is this still so hard on-device?")
I’ve been spending a lot of time lately thinking about one thing: how to get good image-to-text...

Dev.to · Resumemind
👁️ Computer Vision
4mo ago
What is OCR? (And 4 Real-World Use Cases)
What is OCR? OCR stands for Optical Character Recognition. In simple terms, it is the...

Dev.to · Sienna
👁️ Computer Vision
4mo ago
2026 Complete Guide: How to Use GLM-OCR for Next-Gen Document Understanding
🎯 Core Takeaways (TL;DR) GLM-OCR is a 0.9B-parameter multimodal OCR model built on the...

Dev.to · Alessandro Pignati
👁️ Computer Vision
5mo ago
"Semantic Chaining" Bypasses Multimodal AI Safety Filters
Ever wondered how "unbreakable" AI safety filters actually are? As developers, we’re often told that...

Dev.to · Beck_Moulton
👁️ Computer Vision
5mo ago
Multimodal RAG in Action: Building a Skin Health Assistant with CLIP and Milvus
In the world of AI, we've moved far beyond simple text-based search. But when it comes to healthcare,...

Dev.to · TK Lin
👁️ Computer Vision
5mo ago
🎯 YOLO Training in Practice
YOLO Animal Recognition Training in Practice: A Complete Guide from 0 to 80% Accuracy. 和心村 AI Director Tech Notes #2 🎯...

Dev.to · TK Lin
👁️ Computer Vision
5mo ago
🎯 YOLO Training in Practice
YOLO Animal Recognition Training in Practice: A Complete Guide from 0 to 80% Accuracy. 和心村 AI Director Tech Notes #2 🎯 Goal: have the AI...

Dev.to · 💻 Arpad Kish 💻
👁️ Computer Vision
5mo ago
The GreenEyes.AI Vision Stack: A Hybrid Pipeline for Object Labeling and Feature-Based Recognition
Introduction In the rapidly evolving landscape of computer vision, the challenge often...

Dev.to · Debajyati Dey
👁️ Computer Vision
5mo ago
Get Started With Image Classification in Kaggle using Python
WHAT KAGGLE IS Kaggle is a fantastic and great platform for enthusiastic Data Science...

Dev.to · Eyasu Asnake
👁️ Computer Vision
5mo ago
Detecting Objects in Images from Any Text Prompt (Not Fixed Classes)
Most object detection systems assume a fixed label set: train a model on COCO, Open Images, or a...

Dev.to · Jason Peterson
👁️ Computer Vision
5mo ago
Did You Know CLIP Works as an AI Image Detector?
OpenAI's CLIP model was trained to match images with text descriptions. But here's something...

Dev.to · Harris Bashir
👁️ Computer Vision
5mo ago
Building a Production-Ready Traffic Violation Detection System with Computer Vision
Traffic monitoring and violation detection is a classic computer vision problem that looks...

Dev.to · Beck_Moulton
👁️ Computer Vision
5mo ago
Beyond Image Labels: Estimating Food Portions and Calories using Grounding DINO + SAM
Ever tried those calorie tracking apps where you have to manually search for "medium-sized chicken...

Dev.to · SATINATH MONDAL
👁️ Computer Vision
5mo ago
Multimodal AI: Why Text-Only Models Are Already Dead!
Vision, audio, video, and text in a single AI model. Here's why multimodal AI is revolutionizing development and how to build with it today.

Dev.to · Cyrus Tse
👁️ Computer Vision
5mo ago
Why Rust?
"Every programmer remembers the first time their program crashed with a segmentation fault. Or...

Dev.to · Yogender
👁️ Computer Vision
5mo ago
KNN Algorithm from Scratch -Cat vs Dog Image Classification in Python (Complete Experiment)
🧠 KNN Algorithm from Scratch — Real Image Classification Experiment I recently built a...

Dev.to · Pius oruko
👁️ Computer Vision
5mo ago
Laravel Face Recognition and Authentication
Introduction A recurring security and usability issue with web applications is passwords. They are...

Dev.to · Jason Peterson
👁️ Computer Vision
5mo ago
From Prototype to Production: Building a Multimodal Video Search Engine
In my last post, I wrote about the unreasonable effectiveness of model stacking for media...

Dev.to · FreePixel
👁️ Computer Vision
6mo ago
AI Clothes Changer Models Explained: Diffusion, Segmentation
AI clothes changer models are the systems that make realistic outfit swapping in images possible....

Dev.to · Unicorn Developer
👁️ Computer Vision
6mo ago
Computer vision for code: What PVS-Studio saw in OpenCV
What do computer vision and static analysis have in common? Both seek meaning in data. OpenCV finds...

Dev.to · Rajesh Pethe
👁️ Computer Vision
6mo ago
Building an Event-Driven OCR Service: Challenges and Solutions
Optical Character Recognition (OCR) is a powerful AI/ML technology that recognizes and extracts text...

Dev.to · MD ABUBAKAR
👁️ Computer Vision
6mo ago
How I Built a Computer Vision Chess Board Detector
I Built a Chess Scanner That Converts Any Chess Image Into a FEN + Analyzes Games Like Chess.com 👉...

Dev.to · Michal S
👁️ Computer Vision
6mo ago
Building a Unified Benchmarking Pipeline for Computer Vision — Without Rewriting Code for Every Task
This project was developed as part of the Extra-Tech Computer Vision Bootcamp, in collaboration with...

Dev.to · pranav s
👁️ Computer Vision
7mo ago
Multimodal Agents and Their Applications
Multimodal Agents and Their Applications Author: Pranav S - 2025-12-01 ...

Dev.to · Rifat
👁️ Computer Vision
7mo ago
Why this ESP32-CAM Became My New Favorite Module
For the last six months, I have been working with various AI projects, including object detection,...

Dev.to · MohammadReza Mahdian
👁️ Computer Vision
7mo ago
Build a Face Detection App with Python OOP — From Zero to Pro(part-3)
Part 3: OpenCVBase — Designing a Clean Parent Class Why Create a Base...

Dev.to · cz
👁️ Computer Vision
7mo ago
2025 Complete Guide: In-Depth Analysis of ERNIE-4.5-VL-28B-A3B-Thinking Multimodal AI Model
🎯 Key Takeaways (TL;DR) Lightweight & Efficient: Activates only 3B parameters while...

Dev.to · Michael G. Inso
👁️ Computer Vision
7mo ago
From Text to Live Video: How We Built a Serverless Multimodal Logistics AI on Google Cloud Run
The logistics industry runs on information. From tracking numbers on a crumpled label to complex...

Dev.to · Andres Daza
👁️ Computer Vision
7mo ago
Avoid the N+1 Problem in Laravel Validations
The hidden problem behind bulk validations. When we validate arrays of data in...

Dev.to · Akshaya Reddy Annareddy
👁️ Computer Vision
7mo ago
Real-Time Face Recognition Attendance — QR Access & Google Sheets Integration
🚀 This project automates classroom attendance using Face Recognition (MTCNN + FaceNet) integrated...

Dev.to · Dr. Carlos Ruiz Viquez
👁️ Computer Vision
8mo ago
⚡ I'd like to recommend the 'Audio Segmentation Toolkit' (AS
⚡ I'd like to recommend the 'Audio Segmentation Toolkit' (AST), an open-source library that shines in...

Dev.to · YK Sugi
👁️ Computer Vision
8mo ago
Daft vs Ray Data: A Comprehensive Comparison for Multimodal Data Processing
Multimodal AI workloads break traditional data engines. They need to embed documents, classify...

Dev.to · Dr. Carlos Ruiz Viquez
👁️ Computer Vision
8mo ago
The Hidden Pitfall of Multimodal Fusion: Avoid Over-weight
The Hidden Pitfall of Multimodal Fusion: Avoid Over-weighting a Single Modality When working with...

Dev.to · Dr. Carlos Ruiz Viquez
👁️ Computer Vision
8mo ago
The Dark Side of Computer Vision: How Adversarial Examples
The Dark Side of Computer Vision: How Adversarial Examples Can Fool Even the Most Advanced...

Dev.to · Arvind SundaraRajan
👁️ Computer Vision
8mo ago
Beats as Objects: A Computer Vision Hack for Music Analysis by Arvind Sundararajan
Beats as Objects: A Computer Vision Hack for Music Analysis Struggling to accurately...

Dev.to · anujpatel2899
👁️ Computer Vision
8mo ago
Hello Guys Anyone working live video feed analysis using Computer vision i need help in terms of technical part looking to talk further and discuss.
A post by anujpatel2899

Dev.to · Sohan Lal
👁️ Computer Vision
8mo ago
What is a Model Serving Framework? A Simple Guide
Have you ever wondered how artificial intelligence (AI) apps work? When you use a face recognition...
DeepCamp AI