Foundations

Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

1,753 lessons
Skills in this topic
CV Basics
beginner
Classify images with a pre-trained CNN
Modern CV Models
intermediate
Run YOLO for real-time object detection
Generative CV
advanced
Build a Stable Diffusion inference pipeline
All Reads (608) · Articles (272) · Blog Posts (137) · Tutorials (60) · Research Papers (137) · News (2)
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 27m ago
Building Anime Lip Sync in ComfyUI: A Detection-Guided Diffusion Pipeline
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 5h ago
Building MataBakti: When Computer Vision Learns to Find Defects on PCBs
In electronics manufacturing, a small defect on a Printed Circuit Board (PCB) can cause the entire product to fail… Continue reading on M
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 12h ago
The Role of 3D Cuboid Annotation in Autonomous Vehicle Perception
Autonomous vehicles rely on far more than cameras and advanced algorithms to navigate safely. Their ability to recognize pedestrians, estimate vehicle distances
Medium · AI 👁️ Computer Vision ⚡ AI Lesson 15h ago
Vision AI: Transforming Business Operations with Computer Vision AI
Every day companies make thousands of hours of video from security cameras, production lines, warehouses and stores. Most of this video is… Continue reading on
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 16h ago
Thinking Before Retrieving: Robust Zero-Shot Composed Image Retrieval via Strategic Planning and Self-Criticism
arXiv:2606.31222v1 Announce Type: new Abstract: Composed image retrieval requires identifying a target image from a gallery by integrating a reference image wit
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 16h ago
Learning Video Dynamics with Predictive Differentiable Rendering
arXiv:2606.31050v1 Announce Type: cross Abstract: How to accurately predict a high-fidelity future world? While the visual world is inherently continuous, exist
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 16h ago
PruneGround: Plug-and-play Spatial Pruning for 3D Visual Grounding
arXiv:2606.31148v1 Announce Type: cross Abstract: 3D Visual Grounding (3DVG) aims to localize target objects in 3D scenes given natural language descriptions. E
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 16h ago
Temporal Preservation over Processing: Diagnosing and Designing Spatiotemporal Single-Stage Video Detectors
arXiv:2606.31421v1 Announce Type: cross Abstract: Single-stage video object detectors are increasingly deployed in time-critical applications, yet it remains un
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 16h ago
Intrinsic decomposition and editing of 3D Gaussian splats
arXiv:2606.31637v1 Announce Type: cross Abstract: Intrinsic decomposition, which expresses image colors as the product of diffuse albedo and shading, possibly au
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 16h ago
JL1-CC&QA: Extending the JL1-CD Benchmark with Change Captioning and Question Answering
arXiv:2606.31745v1 Announce Type: cross Abstract: Remote sensing change detection (CD) traditionally focuses on pixel-level binary segmentation, which identifie
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 16h ago
FMA-Net++: Motion- and Exposure-Aware Joint Video Super-Resolution and Deblurring
arXiv:2512.04390v2 Announce Type: replace-cross Abstract: Joint video super-resolution and deblurring (VSRDB) requires both efficient long-range temporal modeli
OpenCV Blog 👁️ Computer Vision ⚡ AI Lesson 22h ago
Cloud-Optimized OpenCV + A Special Surprise Announcement on OpenCV Live
Date & Time: Thursday, July 2nd 2026 @ 9am Pacific time. Topic: Cloud-Optimized OpenCV + Special Announcement. Guests: Frantz Lohier (AWS). Cloud Optimized OpenCV
Dev.to · Haven Messenger 👁️ Computer Vision ⚡ AI Lesson 1d ago
Memory Tagging (MTE): Hardware That Catches Memory Bugs
Year after year, the major browser and operating-system vendors report the same number: roughly 70...
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
COMPASS: Grounding Composition-Intent Guidance in Unified Multimodal Models
arXiv:2606.28696v1 Announce Type: new Abstract: Composition is a high-level visual intent that governs where subjects are placed and how a scene is organized, y
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
Fast and Accurate Outlier-Aware LiDAR Super-Resolution for SLAM Applications
arXiv:2606.28607v1 Announce Type: cross Abstract: This work tackles the challenge of enhancing low-resolution LiDAR sensors for SLAM applications through a nove
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
SemDynReg: Semantics-Guided Deformation Regularization for Dynamic 3D Gaussian Splatting
arXiv:2606.28656v1 Announce Type: cross Abstract: Deformable 3D Gaussian Splatting (3DGS) has emerged as an efficient approach for rendering dynamic scenes in a
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
CCRC: A Change-Aware Captioning and Reasoning Chain for Image Change Captioning and Segmentation
arXiv:2606.28724v1 Announce Type: cross Abstract: Understanding and localizing subtle changes between paired images is critical for tasks such as surveillance a
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
MoPe: Motion Permanence for Robust Monocular Gaussian Mapping in Dynamic Environments
arXiv:2606.29237v1 Announce Type: cross Abstract: Robust robot autonomy depends on scene representations that remain stable enough to support localization, navi
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
Confidence-feedback-weighted graph matching network: online-offline laser-induced damage site matching under complex interference
arXiv:2606.29255v1 Announce Type: cross Abstract: Online inspection images of final optics in high-power laser facilities contain pseudo-damage sites that close
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
DR-GS: Physically-Based Deformable and Relightable 2D Gaussians
arXiv:2606.29379v1 Announce Type: cross Abstract: Gaussian splatting (GS) has garnered significant attention in VR/AR and digital content creation due to its ex
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
Can Machines Really See Objects in Images? A Study Based on Syntactic Distance and Visual Self-Referential Instances
arXiv:2606.29416v1 Announce Type: cross Abstract: Can a vision model truly see an object, or does it only fit surface-level visual cues? Following Wittgenstein'
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
One Scene, Two Depths: Probing Geometric Ambiguity in Monocular Foundation Models
arXiv:2606.29600v1 Announce Type: cross Abstract: A faithful 3D world representation should account for layered geometry, where a single camera ray may contain
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
SUMO: Segment and Track Any Motion with Nonlinear State Space Models
arXiv:2606.29861v1 Announce Type: cross Abstract: Visual Object Tracking (VOT) and Moving Object Segmentation (MOS) are two fundamental tasks in computer vision
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
IBRSteG: Learning a Generalizable Steganography Framework for 3D Gaussian Splatting
arXiv:2606.30024v1 Announce Type: cross Abstract: Recent advances in deep learning have notably improved steganographic message hiding. However, designing a gen
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
Efficient RGB-T Object Detection via Sparse Cross-Modality Fusion
arXiv:2606.30215v1 Announce Type: cross Abstract: RGB-T detectors leverage the complementary strengths of visible and thermal infrared modalities, achieving rob
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
FFAvatar: Feed-Forward 4D Head Avatar Reconstruction from Sparse Portrait Images
arXiv:2606.30347v1 Announce Type: cross Abstract: We present FFAvatar, a Transformer-based 3D Gaussian framework for fast construction of high-quality and anima
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
Beyond 2D Matching: A Unified Single-Stage Framework for Geometry-Aware Cross-View Object Geo-Localization
arXiv:2606.30576v1 Announce Type: cross Abstract: Cross-view object geo-localization (CVOGL) aims to locate a target object from a query view (e.g., ground or d
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
Pose-Based Fall Detection System: Efficient Monitoring on Standard CPUs
arXiv:2503.19501v2 Announce Type: replace-cross Abstract: Falls among elderly residents in assisted living homes pose significant health risks, often leading to
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
MGDFIS: Multi-scale Global-detail Feature Integration Strategy for Small Object Detection
arXiv:2506.12697v3 Announce Type: replace-cross Abstract: Small-object detection in Unmanned Aerial Vehicle (UAV) imagery requires preserving weak local evidenc
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
FlatLands: Generative Floormap Completion From a Single Egocentric View
arXiv:2603.16016v2 Announce Type: replace-cross Abstract: A single egocentric image typically captures only a small portion of the floor, yet a complete metric
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago
Em-Garde: A Propose-Match Framework for Proactive Streaming Video Understanding
arXiv:2603.19054v2 Announce Type: replace-cross Abstract: Recent advances in Streaming Video Understanding have enabled a new interaction paradigm where models r
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1d ago
When the Camera Becomes an Exam Proctor: Building an AI-Powered Exam Monitoring System with…
How our team built a real-time Computer Vision system using YOLO, OpenCV, and DeepFace to assist professional certification exams and why… Continue reading on M
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 2d ago
Your Face Is About to Become Your Phone Number
Global shift toward biometric identity verification. For developers working in computer vision and biometrics, the news out of Indonesia regarding mandatory faci
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 2d ago
July 1 — Getting Started with FiftyOne Workshop
In this session, you’ll learn how to manage large-scale computer vision datasets using the open source FiftyOne app. Continue reading on Voxel51 »
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 2d ago
TAPe+ML v2: How we make a single recognition model for any computer vision task
In computer vision systems, a separate stack is assembled for each task: one backbone for classification, another for detection, and a… Continue reading on Medi
The Next Web AI 👁️ Computer Vision ⚡ AI Lesson 2d ago
ETH Zurich’s bidirectional pixel could turn screens into cameras
A pixel has always done one job. On a screen it emits light to build a picture. In a camera it absorbs light to record one. A team in Switzerland has now made o
Medium · Programming 👁️ Computer Vision ⚡ AI Lesson 2d ago
3D Models From Photos: The Python Stack Pros Actually Use
A real-world workflow professionals use to turn photos into usable 3D meshes, not just a toy demo Continue reading on CodeToDeploy »
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2d ago
Not All Relations Rotate Alike: Transformation-Aware Decoupling for Viewpoint-Robust 3D Scene Graph Generation
arXiv:2606.27412v1 Announce Type: cross Abstract: 3D Scene Graph Generation (3DSGG) represents 3D scenes as structured object-relation-object graphs, providing
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2d ago
CoIn: Comprehensive 2D-3D Inpainting with Gaussian Splatting Guidance
arXiv:2606.27584v1 Announce Type: cross Abstract: 3D scene inpainting is essential for reconstructing areas corrupted by occlusions or limited viewpoints. While
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2d ago
Every Step of the Way: Video-based Parkinsonian Turning Step Counting
arXiv:2606.27918v1 Announce Type: cross Abstract: As a prominent symptom of Parkinson's disease (PD), turning impairment is evaluated through parameters such as
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2d ago
Toward Robust In-Context Segmentation via Concept Guidance
arXiv:2606.28149v1 Announce Type: cross Abstract: In-context segmentation (ICS) requires a model to segment target regions in a query image using only a few ref
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2d ago
PreferThinker: Reasoning-based Personalized Image Preference Assessment
arXiv:2511.00609v4 Announce Type: replace Abstract: Personalized image preference assessment aims to evaluate an individual user's image preferences by relying
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2d ago
SRMA-Mamba: Spatial Reverse Mamba Attention Network for Pathological Liver Segmentation in MRI Volumes
arXiv:2508.12410v3 Announce Type: replace-cross Abstract: Liver cirrhosis plays a critical role in the prognosis of chronic liver disease. Early detection and t
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2d ago
When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models
arXiv:2602.10179v2 Announce Type: replace-cross Abstract: Recent advances in large image editing models have shifted the paradigm from text-driven instructions
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 3d ago
Your Bank Says You're Not You. Now What?
Biometric scaling challenges in high-stakes environments. South Africa is currently executing one of the most aggressive biometric rollouts in the Southern Hemis