Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

2,353 lessons
Skills in this topic
CV Basics (beginner): Classify images with a pre-trained CNN
Modern CV Models (intermediate): Run YOLO for real-time object detection
Generative CV (advanced): Build a Stable Diffusion inference pipeline
All Reads (1,208) | Articles (385) | Blog Posts (260) | Tutorials (78) | Research Papers (469) | News (16)
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Efficient Feature-Free Initialization for Monocular Visual-Inertial Systems Using a Feed-Forward 3D Model
arXiv:2605.17327v1 Announce Type: cross Abstract: Fast and reliable initialization is critical for monocular visual-inertial navigation systems (VINS), as it es
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Why We Look Where We Look: Emergent Human-like Fixations of a Foveated Visual Language Model Maximizing Scene Understanding
arXiv:2605.17823v1 Announce Type: cross Abstract: When humans view scenes without a specific task (free-viewing), they initially direct their eye movements towa
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Temporal Aware Pruning for Efficient Diffusion-based Video Generation
arXiv:2605.17837v1 Announce Type: cross Abstract: Video diffusion models have recently enabled high-quality video generation with ViT-based architectures, but r
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Who Generated This 3D Asset? Learning Source Attribution for Generative 3D Models
arXiv:2605.18132v1 Announce Type: cross Abstract: Generative 3D models are deployed in gaming, robotics, and immersive creation, making source attribution criti
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Fixed External Cameras as Common Prior Maps for Active 3D Scene Graph Generation
arXiv:2605.18184v1 Announce Type: cross Abstract: Commonly available prior information, such as BIM models, floor plans, and remote sensing images, can provide
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Optimising CSRNet with parameter-free attention mechanisms for crowd counting in public transport
arXiv:2605.18349v1 Announce Type: cross Abstract: Occupancy estimation and crowd counting are critical tasks in designing smart and efficient public transport v
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Towards Ubiquitous Mapping and Localization for Dynamic Indoor Environments
arXiv:2605.18385v1 Announce Type: cross Abstract: We present UbiSLAM, an innovative solution for real-time mapping and localization in dynamic indoor environmen
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Adaptive Camera Sensor for Vision Models
arXiv:2503.02170v3 Announce Type: replace-cross Abstract: Domain shift remains a persistent challenge in deep-learning-based computer vision, often requiring ex
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Building Samaritan: A Multi-Camera Real-Time Face Recognition System in Python — Part 5
Improve Python face recognition speed with frame skipping, IoU face tracking, and smooth real-time identity continuity. Continue reading on Medium »
Medium · Programming 👁️ Computer Vision ⚡ AI Lesson 1mo ago
How Computational Thinking Helped Me Structure My Deliverables
For quite a while now I have been trying to make my way, little by little, into the world of programming. Continue reading on Tatiane Marina »
IEEE Spectrum 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Manchester Code Made Bits Behave
In the late 1940s—when computer engineers were grappling with unreliable hardware and noisy transmission environments—a team of engineers inside a modest lab at
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Why Your Image Upload Pipeline Should Check for Physically Impossible Lighting
If you're building user-generated content platforms, marketplace verification sys
Dev.to · yubin yang 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Rasterization Using Bresenham Algorithm and Scanline Algorithm
1. Overview Bresenham algorithm is the fastest algorithm for drawing straight lines on a...
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Detect Faces in Any Image Using Python & OpenCV — A Complete Beginner’s Guide
Learn how face detection works under the hood, line by line, with real code you can run today. Continue reading on Medium »
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Detect Faces in Any Image Using Python & OpenCV — A Complete Beginner’s Guide
Learn how face detection works under the hood, line by line, with real code you can run today. Continue reading on Medium »
Medium · Data Science 👁️ Computer Vision ⚡ AI Lesson 1mo ago
How We Engineered the AI Logic for “Raijin”: The 1st Place Autonomous Car at TMR 2026
Leveraging computer vision and neural networks to drive victory for team Troyan Robotics under intense environmental pressure. Continue reading on Medium »
Dev.to · Alessandro Binda 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Intelligent OCR for Business Documents: Architecture and Lessons from the Field
OCR (Optical Character Recognition) for modern printed text has been a solved problem for decades....
Medium · AI 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Computer Vision Journey, Day 2: Drawing on a Frame with OpenCV
In computer vision projects, capturing images from the camera is only the first step. In real systems, the crucial point is that the captured frames… Continue reading on Medium
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
I Took a Month Off After My BCA. The World Didn’t Wait — But It’s Not Too Late.
A brutally honest guide for every Computer Application graduate who blinked and found themselves in the middle of the greatest… Continue reading on Readers Club
Medium · Programming 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Who Really Deserves To Be Called The Father Of The Internet
From ARPANET to the World Wide Web, the Internet was built by a network of pioneers, not one inventor. Continue reading on IT Chronicles »
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Controlled My Computer Volume Using Only Hand Gestures (Python + OpenCV)
Build a Real-Time Computer Vision Project That Actually Feels Futuristic. Continue reading on Medium »
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Building a License Plate Recognition Engine in C++ — Part 2: Grayscale Image Preprocessing and…
In the previous article, we loaded an image, converted it into grayscale, and introduced the core data structures used by the recognition… Continue reading on M
Dev.to · hassaan-syed 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Why Your Computer Reads Numbers Backwards: Byte Order Explained
What is Byte Order? Before understanding byte order, we need to understand one thing: A byte = 8...
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 1mo ago
High Speed and Performance
C language is very fast because it is a compiled language. It converts code directly into machine language, so programs run quickly a
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
FactorizedHMR: A Hybrid Framework for Video Human Mesh Recovery
arXiv:2605.14854v1 Announce Type: cross Abstract: Human Mesh Recovery (HMR) is fundamentally ambiguous: under occlusion or weak depth cues, multiple 3D bodies c
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
arXiv:2605.14984v1 Announce Type: cross Abstract: Generating a street-level 3D scene from a single satellite image is a crucial yet challenging task. Current me
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Quantitative Video World Model Evaluation for Geometric-Consistency
arXiv:2605.15185v1 Announce Type: cross Abstract: Generative video models are increasingly studied as implicit world models, yet evaluating whether they produce
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Building a License Plate Recognition Engine in C++ — Part 2: Grayscale Image Preprocessing and Local Contrast Edge Detection
In the previous article, we loaded an image, converted it into grayscale, and introduced the core data structures used by the recognition engine. In this part,
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Every Photo You’ve Taken Is a Lie — Here’s the Math That Reconstructs It
From Bayer mosaics to bilinear demosaicing — what happens between photons hitting a sensor and pixels appearing on screen. Part 1 of the… Continue reading on Me
Dev.to · CaraComp 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Deepfakes Fooled Your Eyes. They Can't Fool Geometry.
Analyzing the geometric inconsistencies in synthetic imagery. For developers in the computer vision...
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Recurrent Vision: Seeing Like Humans, Not Just Processing Like Machines
Picture yourself walking into a darkened room. You don’t really get to see what’s inside the first time you look. Continue reading on Medium »
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Inside SAM 3D: how Meta turns a single image into 3D
For about forty years, “3D” in the practical sense meant one thing: triangle meshes. Every game shipped, every animated film rendered… Continue reading on Mediu
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Inside SAM 3D: how Meta turns a single image into 3D
For about forty years, “3D” in the practical sense meant one thing: triangle meshes. Every game shipped, every animated film rendered… Continue reading on Mediu
Medium · Data Science 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Demystifying CNNs: How Convolutional Filters and Max-Pooling Actually Work
If you’ve ever wondered how a computer can look at a photo of a car and instantly know it’s a car, you’re looking at the magic of… Continue reading on Medium »
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Why 8GB Might Still Be Enough on Apple’s New MacBook Neo (and When It Isn’t)
Apple keeps selling 8 GB Macs in 2026 and calling them “modern.” On paper that sounds ridiculous, yet the real world is more nuanced than… Continue reading on M
Medium · Programming 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Why 8GB Might Still Be Enough on Apple’s New MacBook Neo (and When It Isn’t)
Apple keeps selling 8 GB Macs in 2026 and calling them “modern.” On paper that sounds ridiculous, yet the real world is more nuanced than… Continue reading on M
Dev.to · CaraComp 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Inside the 5-Second Facial Scan That Could Replace Your ID at the Bar
Implementing biometric verification at scale is no longer a theoretical exercise for high-security...
Dev.to · SUMIT KUMAR MANDAL 👁️ Computer Vision ⚡ AI Lesson 1mo ago
How Self-Driving Cars Understand Traffic: AI Vision Explained
Imagine a car that can drive itself,...
Dev.to · TAMAL MAJI 👁️ Computer Vision ⚡ AI Lesson 1mo ago
How Self-Driving Cars See the Road: Computer Vision Explained
Imagine sitting inside a car with NO...
Dev.to · Edward Obar Cabigting 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Building a License Plate Recognition Engine in C++ — Part 1: Image Loading and Core LPR Data Structures
In this series, I’ll build a License Plate Recognition (LPR) engine step by step in C++. The goal is...
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Your "Biometric Age Check" Isn't Verifying Identity — And Defense Lawyers Know It
Understanding the distinction between biometric age estimation and identity verification. For developers in the computer vision and biometrics space, the nuance
Dev.to · keeper 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Printsight v0.2 — Now Shows Exactly Where Your 3D Print Defects Are
New annotated image output — red circles on stringing, yellow bands on layer issues, blue markers on warped corners
Dev.to · keeper 👁️ Computer Vision ⚡ AI Lesson 1mo ago
3D Print Stringing: Causes, Fixes, and How to Detect It Automatically
Complete guide to understanding and fixing 3D print stringing — from retraction tuning to automated detection with computer vision.
Dev.to · keeper 👁️ Computer Vision ⚡ AI Lesson 1mo ago
I Built a CLI That Detects 3D Print Defects from a Single Photo — No ML Required
Printsight — detect stringing, layer issues, and warping from a photo using pure OpenCV. No training data, no GPU.
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Rethinking Temporal Consistency in Video Object-Centric Learning: From Prediction to Correspondence
arXiv:2605.03650v1 Announce Type: cross Abstract: The de facto approach in video object-centric learning maintains temporal consistency through learned dynamics
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Benchmarking ResNet Backbones in RT-DETR: Impact of Depth and Regularization under environmental conditions
arXiv:2605.08136v1 Announce Type: cross Abstract: Visual perception plays a central role in competitive robotics, where environmental variations can directly af
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
HY-Himmel Technical Report: Hierarchical Interleaved Multi-stream Motion Encoding for Long Video Understanding
arXiv:2605.08158v1 Announce Type: cross Abstract: Long-video understanding with multimodal language models suffers from three compounding bottlenecks: heavy dec
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Digital Image Forgery Detection Using Transfer Learning
arXiv:2605.08167v1 Announce Type: cross Abstract: The increasing availability of advanced image editing tools has led to a significant rise in manipulated digit