Foundations

Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

2,365 lessons
Skills in this topic
CV Basics (beginner): Classify images with a pre-trained CNN
Modern CV Models (intermediate): Run YOLO for real-time object detection
Generative CV (advanced): Build a Stable Diffusion inference pipeline
All Reads (1,220) · Articles (392) · Blog Posts (262) · Tutorials (81) · Research Papers (469) · News (16)
Inside the 5-Second Facial Scan That Could Replace Your ID at the Bar
Dev.to · CaraComp 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Implementing biometric verification at scale is no longer a theoretical exercise for high-security...
How Self-Driving Cars Understand Traffic: AI Vision Explained
Dev.to · SUMIT KUMAR MANDAL 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Imagine a car that can drive itself,...
How Self-Driving Cars See the Road: Computer Vision Explained
Dev.to · TAMAL MAJI 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Imagine sitting inside a car with NO...
Building a License Plate Recognition Engine in C++ — Part 1: Image Loading and Core LPR Data Structures
Dev.to · Edward Obar Cabigting 👁️ Computer Vision ⚡ AI Lesson 1mo ago
In this series, I’ll build a License Plate Recognition (LPR) engine step by step in C++. The goal is...
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Your "Biometric Age Check" Isn't Verifying Identity — And Defense Lawyers Know It
Understanding the distinction between biometric age estimation and identity verification. For developers in the computer vision and biometrics space, the nuance
Printsight v0.2 — Now Shows Exactly Where Your 3D Print Defects Are
Dev.to · keeper 👁️ Computer Vision ⚡ AI Lesson 1mo ago
New annotated image output — red circles on stringing, yellow bands on layer issues, blue markers on warped corners
3D Print Stringing: Causes, Fixes, and How to Detect It Automatically
Dev.to · keeper 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Complete guide to understanding and fixing 3D print stringing — from retraction tuning to automated detection with computer vision.
I Built a CLI That Detects 3D Print Defects from a Single Photo — No ML Required
Dev.to · keeper 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Printsight — detect stringing, layer issues, and warping from a photo using pure OpenCV. No training data, no GPU.
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Rethinking Temporal Consistency in Video Object-Centric Learning: From Prediction to Correspondence
arXiv:2605.03650v1 Announce Type: cross Abstract: The de facto approach in video object-centric learning maintains temporal consistency through learned dynamics
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Benchmarking ResNet Backbones in RT-DETR: Impact of Depth and Regularization under environmental conditions
arXiv:2605.08136v1 Announce Type: cross Abstract: Visual perception plays a central role in competitive robotics, where environmental variations can directly af
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
HY-Himmel Technical Report: Hierarchical Interleaved Multi-stream Motion Encoding for Long Video Understanding
arXiv:2605.08158v1 Announce Type: cross Abstract: Long-video understanding with multimodal language models suffers from three compounding bottlenecks: heavy dec
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Digital Image Forgery Detection Using Transfer Learning
arXiv:2605.08167v1 Announce Type: cross Abstract: The increasing availability of advanced image editing tools has led to a significant rise in manipulated digit
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Optimized Culprit Identification Using Mobilenet and Attention Mechanisms
arXiv:2605.08169v1 Announce Type: cross Abstract: Automated culprit identification in surveillance systems is a critical task that requires high accuracy along
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
From Historical Tabular Image to Knowledge Graphs: A Provenance-Aware Modular Pipeline
arXiv:2605.08222v1 Announce Type: cross Abstract: Handwritten archival tables contain rich historical information, yet transforming them into structured represe
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
CAMAL: Improving Attention Alignment and Faithfulness with Segmentation Masks
arXiv:2605.08325v1 Announce Type: cross Abstract: Many vision datasets now provide segmentation masks in addition to annotated images to support a wide range of
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Decoupling Endpoint and Semantic Transition Learning for Zero-Shot Composed Image Retrieval
arXiv:2605.08389v1 Announce Type: cross Abstract: Zero-shot composed image retrieval (ZS-CIR) retrieves a target image from a reference image and a text modific
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Privacy-Aware Video Anomaly Detection through Orthogonal Subspace Projection
arXiv:2605.08651v1 Announce Type: cross Abstract: Video anomaly detection (VAD) systems often prioritize accuracy while overlooking privacy concerns, limiting t
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Control Your View: High-Resolution Global Semantic Manipulation in Learned Image Compression
arXiv:2605.08727v1 Announce Type: cross Abstract: Learned image compression (LIC) integrates deep neural networks (DNNs) to map high-dimensional images into com
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Curvature-Aware Captioning: Leveraging Geodesic Attention for 3D Scene Understanding
arXiv:2605.08808v1 Announce Type: cross Abstract: Accurate 3D scene description is fundamental to robotic navigation and augmented reality, yet current dense ca
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
DAPE: Dynamic Non-uniform Alignment and Progressive Detail Enhancement Techniques for Improving the Performance of Efficient Visual Language Models
arXiv:2605.08902v1 Announce Type: cross Abstract: In recent years, pre-trained visual-linguistic models have demonstrated tremendous potential, becoming a cruci
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Extrusion Segmentation Strategy to improve CAD Reconstruction from Point Cloud
arXiv:2605.08971v1 Announce Type: cross Abstract: Computer-Aided Design is ubiquitous in today's world, as almost every manufactured object begins as a digital m
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
CT-IDP: Segmentation-Derived Quantitative Phenotypes for Interpretable Abdominal CT Disease Classification
arXiv:2605.09002v1 Announce Type: cross Abstract: In this retrospective multi-institutional study, a quantitative phenotyping framework, CT-IDP (CT Image-Derive
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Investigating Anisotropy in Visual Grounding under Controlled Counterfactual Perturbations
arXiv:2605.09090v1 Announce Type: cross Abstract: Visual Grounding benchmarks assume that the object described by a referring expression is always present in th
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Towards Robust Sequential Decomposition for Complex Image Editing
arXiv:2605.09233v1 Announce Type: cross Abstract: Recent advances in visual generative models have enabled high-fidelity image editing guided by human instructi
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Perceptual Asymmetry Between Hue Categories: Evidence from Human Color Categorization
arXiv:2605.09339v1 Announce Type: cross Abstract: Human color categories are not uniformly distributed in perceptual space, yet most computational color models
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
PhysHanDI: Physics-Based Reconstruction of Hand-Deformable Object Interactions
arXiv:2605.09538v1 Announce Type: cross Abstract: While existing methods for reconstructing hand-object interactions have made impressive progress, they either
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
S2P-Net: A Spectral-Spatial Polar Network for Rotation-Invariant Object Recognition in Low-Data Regimes
arXiv:2605.09667v1 Announce Type: cross Abstract: We present S2P-Net (Spectral-Spatial Polar Network), a compact deep learning architecture that achieves mathem
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
CrossVL: Complexity-Aware Feature Routing and Paired Curriculum for Cross-View Vision-Language Detection
arXiv:2605.09802v1 Announce Type: cross Abstract: Vision-language models (VLMs) enable text-guided object detection but degrade severely under cross-view scenar
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
MoPO: Incorporating Motion Prior for Occluded Human Mesh Recovery
arXiv:2605.09856v1 Announce Type: cross Abstract: Although recent studies have made remarkable progress in human mesh recovery, they still exhibit limited robus
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding
arXiv:2605.09874v1 Announce Type: cross Abstract: Next-generation visual assistants, such as smart glasses, embodied agents, and always-on life-logging systems,
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
SDTalk: Structured Facial Priors and Dual-Branch Motion Fields for Generalizable Gaussian Talking Head Synthesis
arXiv:2605.09956v1 Announce Type: cross Abstract: High-quality, real-time talking head synthesis remains a fundamental challenge in computer vision. Existing re
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Geometric 4D Stitching for Grounded 4D Generation
arXiv:2605.09984v1 Announce Type: cross Abstract: Recent 4D generation methods complete scene-level missing information using generative models and reconstruct
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
HYPERPOSE: Hyperbolic Kinematic Phase-Space Attention for 3D Human Pose Estimation
arXiv:2605.10100v1 Announce Type: cross Abstract: We introduce HYPERPOSE, a novel 3D human pose estimation framework that performs spatio-temporal reasoning ent
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Scaling Vision Models Does Not Consistently Improve Localisation-Based Explanation Quality
arXiv:2605.10142v1 Announce Type: cross Abstract: Artificial intelligence models are increasingly scaled to improve predictive accuracy, yet it remains unclear
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
DynGhost: Temporally-Modelled Transformer for Dynamic Ghost Imaging with Quantum Detectors
arXiv:2605.10185v1 Announce Type: cross Abstract: Ghost imaging reconstructs spatial information from a single-pixel bucket detector by correlating structured i
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
bViT: Investigating Single-Block Recurrence in Vision Transformers for Image Recognition
arXiv:2605.10661v1 Announce Type: cross Abstract: Vision Transformers (ViTs) are built by stacking independently parameterized blocks, but it remains unclear ho
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
iPay: Integrated Payment Action Recognition via Multimodal Networks and Adaptive Spatial Prior Learning
arXiv:2605.10732v1 Announce Type: cross Abstract: Automated transit payment analysis is vital for scalable fare auditing and passenger analytics, yet practice s
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization
arXiv:2605.10780v1 Announce Type: cross Abstract: Representation autoencoders that reuse frozen pretrained vision encoders as visual tokenizers have achieved st
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
MMVIAD: Multi-view Multi-task Video Understanding for Industrial Anomaly Detection
arXiv:2605.10833v1 Announce Type: cross Abstract: Industrial anomaly detection is critical for manufacturing quality control, yet existing datasets mainly focus
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Attention-Mamba: A Mamba-Enhanced Multi-Scale Parallel Inference Network for Medical Image Segmentation
arXiv:2402.02286v4 Announce Type: replace-cross Abstract: U-shaped architectures have long dominated the field of medical image segmentation, while Transformers
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
arXiv:2505.23617v3 Announce Type: replace-cross Abstract: Effective video tokenization is critical for scaling transformer models for long videos. Current appro
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Robust Building Damage Detection in Cross-Disaster Settings Using Domain Adaptation
arXiv:2603.14694v2 Announce Type: replace-cross Abstract: Rapid structural damage assessment from remote sensing imagery is essential for timely disaster respon
Mono Sense: Building a Tesla-Inspired Monocular Perception Pipeline
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
One camera feed. Real Tesla footage. A full 3D world. Continue reading on Medium »
Deploying a Real-Time Object Detection API with YOLOv8 and FastAPI
Dev.to · Lich Priest 👁️ Computer Vision ⚡ AI Lesson 1mo ago
A step‑by‑step guide to train, containerize, and serve a custom YOLOv8 model with low‑latency FastAPI endpoints, Docker, and GitHub Actions
Validating Passport Photos for 3 of the Strictest Government Portals (India, China, US)
Dev.to · whitetirocket 👁️ Computer Vision ⚡ AI Lesson 1mo ago
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey
arXiv:2605.06714v1 Announce Type: cross Abstract: Edge deep learning, a paradigm change reconciling edge computing and deep learning, facilitates real-time deci
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
XiYOLO: Energy-Aware Object Detection via Iterative Architecture Search and Scaling
arXiv:2605.06927v1 Announce Type: cross Abstract: Object detection on heterogeneous edge devices must satisfy strict energy, latency, and memory constraints whi
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
DPG-CD: Depth-Prior-Guided Cross-Modal Joint 2D-3D Change Detection
arXiv:2605.07151v1 Announce Type: cross Abstract: Urban spatial evolution is manifested not only through horizontal expansion but also through vertical structur