Foundations

Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

2,365 lessons
Skills in this topic
CV Basics (beginner): Classify images with a pre-trained CNN
Modern CV Models (intermediate): Run YOLO for real-time object detection
Generative CV (advanced): Build a Stable Diffusion inference pipeline
I Spent 6 Months Trying to See Time in Videos. Here's What Finally Worked.
Dev.to · Sourabh Joshi 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Originally published on Medium. Let me start with a confession: my first attempt at building a...
Getting Started with Digital Image Processing with Python and scikit-image: From Pixels to Object Recognition
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 2mo ago
A practical journey for beginners who want to get into computer vision without having to deal with expensive licenses or mathematics…
From Factory Floor to Distributed System: Engineering a Real-Time Computer Vision Backend for…
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Imagine you are on the floor of a battery manufacturing plant. Thousands of battery covers move down a conveyor every shift, each stamped…
What Re-Learning C Taught Me About the Code I Write Every Day
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Each weekend my younger brothers and I join a Discord call for our weekly game nights. Although the primary activity is gaming, a close…
ROI vs AOI in Computer Vision: The Difference Between Looking and Understanding
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Abstract Continue reading on Python in Plain English »
A Beginner’s Guide to Military Vehicle Detection Using YOLO and Aerial Imagery
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 2mo ago
In this post, I have prepared a beginner-friendly object detection pipeline using a very basic dataset with YOLO11 and YOLO26…
Building an Air Canvas with MediaPipe and Turning It into 3D (My Experience)
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
I started this project with a very simple goal: to draw in the air using hand gestures.
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Bridging the Training-Deployment Gap: Gated Encoding and Multi-Scale Refinement for Efficient Quantization-Aware Image Enhancement
arXiv:2604.21743v1 Announce Type: new Abstract: Image enhancement models for mobile devices often struggle to balance high output quality with the fast processi
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Robust Test-time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts
arXiv:2604.20851v1 Announce Type: cross Abstract: Modern video-text retrieval (VTR) models excel on in-distribution benchmarks but are highly vulnerable to real
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
StyleVAR: Controllable Image Style Transfer via Visual Autoregressive Modeling
arXiv:2604.21052v1 Announce Type: cross Abstract: We build on the Visual Autoregressive Modeling (VAR) framework and formulate style transfer as conditional dis
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Exploring the Role of Synthetic Data Augmentation in Controllable Human-Centric Video Generation
arXiv:2604.21291v1 Announce Type: cross Abstract: Controllable human video generation aims to produce realistic videos of humans with explicitly guided motions
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
The First Challenge on Remote Sensing Infrared Image Super-Resolution at NTIRE 2026: Benchmark Results and Method Overview
arXiv:2604.21312v1 Announce Type: cross Abstract: This paper presents the NTIRE 2026 Remote Sensing Infrared Image Super-Resolution (x4) Challenge, one of the a
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Causal Disentanglement for Full-Reference Image Quality Assessment
arXiv:2604.21654v1 Announce Type: cross Abstract: Existing deep network-based full-reference image quality assessment (FR-IQA) models typically work by performi
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Efficient Logic Gate Networks for Video Copy Detection
arXiv:2604.21694v1 Announce Type: cross Abstract: Video copy detection requires robust similarity estimation under diverse visual distortions while operating at
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Addressing Image Authenticity When Cameras Use Generative AI
arXiv:2604.21879v1 Announce Type: cross Abstract: The ability of generative AI (GenAI) methods to photorealistically alter camera images has raised awareness ab
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Seeing Fast and Slow: Learning the Flow of Time in Videos
arXiv:2604.21931v1 Announce Type: cross Abstract: How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speed
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Counterfactual Segmentation Reasoning: Diagnosing and Mitigating Pixel-Grounding Hallucination
arXiv:2506.21546v4 Announce Type: replace-cross Abstract: Segmentation Vision-Language Models (VLMs) have significantly advanced grounded visual understanding,
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Accelerating Vision Transformers with Adaptive Patch Sizes
arXiv:2510.18091v2 Announce Type: replace-cross Abstract: Vision Transformers (ViTs) partition input images into uniformly sized patches regardless of their con
Seeing Fast and Slow: Learning the Flow of Time in Videos
Dev.to · Bongho Tae 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Seeing Fast and Slow: Learning the Flow of Time in Videos Time is everywhere in video —...
Built a Gesture-Based Air Canvas Using OpenCV and MediaPipe
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 2mo ago
A real-time computer vision project for touchless drawing using hand gestures.
Computer Vision Isn’t a Model Problem. It’s a Lifecycle Problem.
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Most computer vision systems don’t fail because a model is “bad.” They fail because the system wasn’t designed to actually handle reality.
EP.17 | I Tore Apart the RC Car and Gave It AI Eyes
Medium · Programming 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Camera mounted on a servo motor. OpenCV detects blue. The servo follows in real time. No cloud, no server — everything runs inside the… Continue reading on Medi
Can Vision Transformers Replace CNNs (And When They Can’t)
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Hey there, in this article we are going to explore and understand Vision Transformer a.k.a ViT in absolute depth and also see if they can…
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
DistortBench: Benchmarking Vision Language Models on Image Distortion Identification
arXiv:2604.19966v1 Announce Type: cross Abstract: Vision-language models (VLMs) are increasingly used in settings where sensitivity to low-level image degradati
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Beyond ZOH: Advanced Discretization Strategies for Vision Mamba
arXiv:2604.20606v1 Announce Type: cross Abstract: Vision Mamba, as a state space model (SSM), employs a zero-order hold (ZOH) discretization, which assumes that
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
IVY-FAKE: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection
arXiv:2506.00979v5 Announce Type: replace-cross Abstract: The rapid development of Artificial Intelligence Generated Content (AIGC) techniques has enabled the c
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Combo-Gait: Unified Transformer Framework for Multi-Modal Gait Recognition and Attribute Analysis
arXiv:2510.10417v2 Announce Type: replace-cross Abstract: Gait recognition is an important biometric for human identification at a distance, particularly under
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
PipeMFL-240K: A Large-scale Dataset and Benchmark for Object Detection in Pipeline Magnetic Flux Leakage Imaging
arXiv:2602.07044v2 Announce Type: replace-cross Abstract: Pipeline integrity is critical to industrial safety and environmental protection, with Magnetic Flux L
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 2mo ago
DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection
How To Auto-Detect QR Codes, Signatures, and License Plates In The Browser
Dev.to · byeval 👁️ Computer Vision ⚡ AI Lesson 2mo ago
How to combine BarcodeDetector, OCR heuristics, and pixel analysis into one browser-side reviewable privacy workflow.
Relational Knowledge Distillation in 3D Point Clouds (part 2)
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Time to zoom in until the pixels turn into geometry and the geometry turns into intuition. Let’s dissect Relational Knowledge Distillation…
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Industrial Surface Defect Detection via Diffusion Generation and Asymmetric Student-Teacher Network
arXiv:2604.19240v1 Announce Type: new Abstract: Industrial surface defect detection often suffers from limited defect samples, severe long-tailed distributions,
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos
arXiv:2604.18993v1 Announce Type: cross Abstract: Perception robustness under adverse weather remains a critical challenge for autonomous driving, with the core
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
GOLD-BEV: GrOund and aeriaL Data for Dense Semantic BEV Mapping of Dynamic Scenes
arXiv:2604.19411v1 Announce Type: cross Abstract: Understanding road scenes in a geometrically consistent, scene-centric representation is crucial for planning
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
GAIR: Location-Aware Self-Supervised Contrastive Pre-Training with Geo-Aligned Implicit Representations
arXiv:2503.16683v2 Announce Type: replace-cross Abstract: Vision Transformer (ViT) has been widely used in computer vision tasks with excellent results by provi
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
TFusionOcc: T-Primitive Based Object-Centric Multi-Sensor Fusion Framework for 3D Occupancy Prediction
arXiv:2602.06400v2 Announce Type: replace-cross Abstract: The prediction of 3D semantic occupancy enables autonomous vehicles (AVs) to perceive the fine-grained
I Built a Video AI That Sees Like a Human - Not Like a Computer
Dev.to · hemanth kumar 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Most video AI works like this: Look at frame 1 → detect objects → done. Look at frame 2 → detect...
Controlling My PC with Hand Gestures: Physics, Numba, and Computer Vision
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 2mo ago
What if you could control your mouse cursor without touching anything, but with the smoothness of a high-end gaming mouse?
Revolutionizing Geospatial Data: Architecting Automated and Real-Time GeoAI Pipelines
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Moving beyond static GIS to build predictive, event-driven spatial systems using advanced Computer Vision, streaming data, and edge…
The Computer's Eyes #2 - The Kitchen of Images: Pixels, Matrices, and Channels
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 2mo ago
In the previous part, we made a quick introduction to image processing and displayed our first photo on screen with OpenCV. “The computer … the image…
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Beyond Bounding Boxes: Achieving Cinematic Reframing via YOLOv11 Instance Segmentation
The transition from 16:9 landscape to 9:16 vertical video is often treated as a simple cropping problem. In most automated workflows, the…
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Efficient Pipeline for Camera Trap Image Review
Computer Vision-Based Worker Safety Compliance
Medium · AI 👁️ Computer Vision ⚡ AI Lesson 2mo ago
How AI Is Transforming Workplace Safety in Real Time
Tesseract for CAPTCHA Recognition: Not a Silver Bullet, But Effective in the Right Context
Medium · Programming 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Using Tesseract to verify Captcha Code Continue reading on JIN System Architect »
The Bald Head That Broke Our AI (And What It Taught Me About Building Vision Systems That Actually…
Medium · AI 👁️ Computer Vision ⚡ AI Lesson 2mo ago
Why physics-constrained computer vision is the gap between a demo that impresses and a system you can trust