Foundations

Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

2,365 lessons
Skills in this topic
CV Basics (beginner): Classify images with a pre-trained CNN
Modern CV Models (intermediate): Run YOLO for real-time object detection
Generative CV (advanced): Build a Stable Diffusion inference pipeline
All Reads (1,220) · Articles (392) · Blog Posts (262) · Tutorials (81) · Research Papers (469) · News (16)
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Computer Vision Software Development: Applications, Benefits, and Use Cases
Build intelligent visual systems with advanced Computer Vision Software Development to automate processes, enhance accuracy, and unlock… Continue reading on Med
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Deliberately “Damaging” Images for Science: Noise Experiments on Digital Images (original title: Sengaja “Merusak” Gambar demi Ilmu: Eksperimen Noise pada Citra Digital)
How adding artificial noise to an image can be the most important step before a computer learns to “see”. Continue reading on Medium »
Dev.to · Silicon Signals 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Edge AI Camera Design: Integrating Vision at the Edge
Rethinking Cameras: The conventional camera was meant to record and store video content....
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
BifDet: A 3D Bifurcation Detection Dataset for Airway-Tree Modeling
arXiv:2604.24999v1 Announce Type: cross Abstract: Thoracic Computed Tomography (CT) scans offer detailed insights into the intricate branching network of the ai
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
At the Edge of the Heart: ULP FPGA-Based CNN for On-Device Cardiac Feature Extraction in Smart Health Sensors for Astronauts
arXiv:2604.25799v1 Announce Type: cross Abstract: The convergence of accelerating human spaceflight ambitions and critical terrestrial health monitoring demands
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
No Pedestrian Left Behind: Real-Time Detection and Tracking of Vulnerable Road Users for Adaptive Traffic Signal Control
arXiv:2604.25887v1 Announce Type: cross Abstract: Current pedestrian crossing signals operate on fixed timing without adjustment to pedestrian behavior, which c
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
AIDOVECL: AI-generated Dataset of Outpainted Vehicles for Eye-level Classification and Localization
arXiv:2410.24116v3 Announce Type: replace-cross Abstract: Image labeling is a critical bottleneck in the development of computer vision technologies, often cons
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
OmniAlpha: Aligning Transparency-Aware Generation via Multi-Task Unified Reinforcement Learning
arXiv:2511.20211v2 Announce Type: replace-cross Abstract: Transparency-aware generation requires modeling not only RGB appearance but also alpha-based opacity a
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Building Samaritan: A Multi-Camera Real-Time Face Recognition System in Python — Part 2
Build real-time face recognition in Python with OpenCV, DeepFace, ArcFace embeddings, and live webcam-based identity matching. Continue reading on Medium »
Dev.to · Jimmy Guerrero 👁️ Computer Vision ⚡ AI Lesson 2mo ago
April 30 - Best of WACV 2026 (Day 1)
Join us on April 30 for day one of the Best of WACV 2026 series of virtual events. Register for...
Dev.to · GANGIREDDIGARI MITHUN PRAKASH REDDY 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Computer Vision–Based Injury Detection and First-Aid Guidance System
Introduction: In today’s fast-paced world, getting quick medical guidance for minor skin...
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
The Limits of Image Reconstruction in Low-SNR Settings
How ambiguity and noise lead to structured simplification Continue reading on Medium »
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
The Limits of Image Reconstruction in Low-SNR Settings
How ambiguity and noise lead to structured simplification Continue reading on Medium »
The Verge 👁️ Computer Vision ⚡ AI Lesson 2mo ago
The resurrected Commodore 64 is getting a facelift like the original
The creators of the C64 Ultimate, a recreation of the iconic '80s personal computer that uses an FPGA chip to accurately replicate the original, have announced
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Customized Object Detection Using Multi-Frame Analysis
Enhances object detection using multi-frame analysis and representative frame selection. Continue reading on Tiny Prism Labs Private Limited »
Medium · AI 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Image Classification for AI: A Practical Guide for 2026
Practical guide to image classification for AI: learn how to manage datasets, ensure accuracy, and scale your computer vision projects. Continue reading on Medi
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Before AI Sees, Optics Decide
Why optical design determines machine vision performance Continue reading on Medium »
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
WeatherSeg: Weather-Robust Image Segmentation using Teacher-Student Dual Learning and Classifier-Updating Attention
arXiv:2604.22824v2 Announce Type: cross Abstract: WeatherSeg, an advanced semi-supervised segmentation framework, addresses autonomous driving's environmental p
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
MetaEarth3D: Unlocking World-scale 3D Generation with Spatially Scalable Generative Modeling
arXiv:2604.22828v1 Announce Type: cross Abstract: Recent generative AI models have achieved remarkable breakthroughs in language and visual understanding. Howev
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
OAMVOS:2nd Report for 5th PVUW MOSE Track
arXiv:2604.22837v1 Announce Type: cross Abstract: SAM-based dense trackers provide strong short-term mask propagation but remain fragile under long occlusion, f
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
From Skeletons to Pixels: Few-Shot Precise Event Spotting via Representation and Prediction Distillation
arXiv:2604.22839v1 Announce Type: cross Abstract: Precise Event Spotting (PES) is essential in fast-paced sports such as tennis, where fine-grained events occur
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Probing Visual Planning in Image Editing Models
arXiv:2604.22868v1 Announce Type: cross Abstract: Visual planning represents a crucial facet of human intelligence, especially in tasks that require complex spa
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Hard to See, Hard to Label: Generative and Symbolic Acquisition for Subtle Visual Phenomena
arXiv:2604.22990v2 Announce Type: cross Abstract: Subtle visual anomalies such as hairline cracks, sub-millimeter voids, and low-contrast inclusions are structu
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
DeepSignature: Digitally Signed, Content-Encoding Watermarks for Robust and Transparent Image Authentication
arXiv:2604.23016v1 Announce Type: cross Abstract: AI-powered generative models have significantly expanded the possibilities for editing, manipulating, and crea
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Sphere-Depth: A Benchmark for Depth Estimation Methods with Varying Spherical Camera Orientations
arXiv:2604.23432v1 Announce Type: cross Abstract: Reliable depth estimation from spherical images is crucial for 360° vision in robotic navigation and imme
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Emotion-Conditioned Short-Horizon Human Pose Forecasting with a Lightweight Predictive World Model
arXiv:2604.23532v1 Announce Type: cross Abstract: Short-term human pose prediction plays a crucial role in interactive systems, assistive robots, and emotion-aw
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
ResAF-Net: An Anchor-Free Attention-Based Network for Tree Detection and Agricultural Mapping in Palestine
arXiv:2604.23653v1 Announce Type: cross Abstract: Reliable agricultural data is essential for food security, land-use planning, and economic resilience, yet in
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Zoom In, Reason Out: Efficient Far-field Anomaly Detection in Expressway Surveillance Videos via Focused VLM Reasoning Guided by Bayesian Inference
arXiv:2604.23724v2 Announce Type: cross Abstract: Expressway video anomaly detection is essential for safety management. However, identifying anomalies across d
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Mapping License Plate Recoverability Under Extreme Viewing Angles for Opportunistic Urban Sensing
arXiv:2604.23814v1 Announce Type: cross Abstract: Urban environments contain many imaging sensors built for specific purposes, including ATM, body-worn, CCTV, a
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Viewport-Unaware Blind Omnidirectional Image Quality Assessment: A Unified and Generalized Approach
arXiv:2604.23953v1 Announce Type: cross Abstract: Blind omnidirectional image quality assessment (BOIQA) presents a great challenge to the visual quality assess
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Unconstrained Multi-view Human Pose Estimation with Algebraic Priors
arXiv:2604.24312v1 Announce Type: cross Abstract: Recovering 3D human pose from multi-view imagery typically relies on precise camera calibration, which is ofte
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
YOLOv8 to YOLO11: A Comprehensive Architecture In-depth Comparative Review
arXiv:2501.13400v3 Announce Type: replace-cross Abstract: In the field of deep learning-based computer vision, YOLO is revolutionary. With respect to deep learn
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
SIV-Bench: A Video Benchmark for Social Interaction Understanding and Reasoning
arXiv:2506.05425v3 Announce Type: replace-cross Abstract: Understanding social interaction, which encompasses perceiving numerous and subtle multimodal cues, in
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
What Drives Compositional Generalization? The Importance of Continuous Training Objectives in Visual Generative Models
arXiv:2510.03075v3 Announce Type: replace-cross Abstract: Compositional generalization, the ability to generate novel combinations of known concepts, is a key i
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Building Samaritan: A Multi-Camera Real-Time Face Recognition System in Python — Part 1
Build Samaritan, a Python real-time face recognition system using OpenCV, DeepFace, ArcFace, and multi-camera support. Continue reading on Medium »
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
YOLOv8 vs RF-DETR: Which Object Detector Should You Use?
Real-world evidence from the Waymo Open Dataset Continue reading on Medium »
Medium · Programming 👁️ Computer Vision ⚡ AI Lesson 2mo ago
The First Program Was Not Just Code
From algebra to execution: what the first program actually describes Continue reading on Level Up Coding »
Dev.to · Dixit Angiras 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Building OCR Solutions That Actually Work in Production (Not Just Demos)
Most developers have tried OCR at some point. You pick a library, run it on a PDF, extract text… and...
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
From Pixels to Production: I Tried FastAPI vs NVIDIA Triton for CV Inference… and the Results…
Why your simple model.predict() is not enough when real users start hitting your system Continue reading on Medium »
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
EgoMAGIC- An Egocentric Video Field Medicine Dataset for Training Perception Algorithms
arXiv:2604.22036v1 Announce Type: cross Abstract: This paper introduces EgoMAGIC (Medical Assistance, Guidance, Instruction, and Correction), an egocentric medi
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
GenMatter: Perceiving Physical Objects with Generative Matter Models
arXiv:2604.22160v1 Announce Type: cross Abstract: Human visual perception offers valuable insights for understanding computational principles of motion-based sc
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
From Global to Local: Rethinking CLIP Feature Aggregation for Person Re-Identification
arXiv:2604.22190v1 Announce Type: cross Abstract: CLIP-based person re-identification (ReID) methods aggregate spatial features into a single global \texttt{[CL
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
OREN: Octree Residual Network for Real-Time Euclidean Signed Distance Mapping
arXiv:2510.18999v2 Announce Type: replace-cross Abstract: Reconstructing signed distance functions (SDFs) from point cloud data benefits many robot autonomy cap
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Lifting Unlabeled Internet-level Data for 3D Scene Understanding
arXiv:2604.01907v2 Announce Type: replace-cross Abstract: Annotated 3D scene data is scarce and expensive to acquire, while abundant unlabeled videos are readil
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Is career in computer vision engineering a Dead-end ?
Until end of last year, despite LLMs on track for becoming world class SWE, I was still fairly confident about job security as a computer… Continue reading on M
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 2mo ago
AI photo tagging app
Introducing a newly released AI photo tagging app for the iPhone. More details on our website ( https://siwave.io ) and a link to the Kickstarter project. We we
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Fine-tuning BLIP2 for Prompt-instructed Video Classification
Video understanding remains one of the most challenging frontiers in computer vision. Unlike static images, videos exhibit rich temporal… Continue reading on To
Dev.to · Deen Jimoh 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Real-Time Face Liveness in React Native: Vision Camera, Worklets, and ML Kit
If you’ve ever shipped a KYC, onboarding, or account-recovery flow, you’ve run into the liveness...