Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

2,353 lessons
Skills in this topic
CV Basics (beginner): Classify images with a pre-trained CNN
Modern CV Models (intermediate): Run YOLO for real-time object detection
Generative CV (advanced): Build a Stable Diffusion inference pipeline
All Reads (1,208) · Articles (385) · Blog Posts (260) · Tutorials (78) · Research Papers (469) · News (16)
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
TECCI: Tricky Edits of Collected and Curated Images
arXiv:2606.01213v1 Announce Type: cross Abstract: Despite tremendous recent progress, current text-guided image editing methods still struggle with many aspects
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
RPCASSM: Robust PCA State Space Model For Infrared Small Target Detection
arXiv:2606.01689v1 Announce Type: cross Abstract: The detection and segmentation of infrared small targets have important application significance in the fields
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Understanding Identity Continuity in Thermal Video through Scene-Level Consistency
arXiv:2606.01694v1 Announce Type: cross Abstract: Thermal pedestrian MOT remains challenging because weak appearance cues and frequent detection interruptions c
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Train, Test, Re-evaluate: Schedule-Sensitive Evaluation of Generative Data for Hand Detection
arXiv:2606.01896v1 Announce Type: cross Abstract: Generated (or synthetic) image data is increasingly used to augment or replace real training datasets when tar
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Ranking vs. Assignment: The Metric Mismatch in Multi-View Object Association
arXiv:2606.02022v1 Announce Type: cross Abstract: Multi-view object association is an important computer vision problem that underlies many multi-camera percept
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Fast and Lightweight Novel View Synthesis with Differentiable Multiplane Image
arXiv:2606.02068v1 Announce Type: cross Abstract: Recently, novel view synthesis has witnessed remarkable progress, with mainstream methods such as Neural Radia
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Understanding-Enhanced Model Collaboration for Long-Tailed Egocentric Mistake Detection
arXiv:2606.02120v1 Announce Type: cross Abstract: In this report, we address the problem of determining whether a user performs an action incorrectly from egoce
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Towards Resolving Optimization Conflicts Between Image- and Text-Based Person Re-Identification
arXiv:2606.02242v1 Announce Type: cross Abstract: The joint optimization of image-based (I2I) and text-based (T2I) person re-identification (ReID) is hindered b
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Quantitative Movement Testing: Measuring Patient Movements from a Single Smartphone Video
arXiv:2606.02301v1 Announce Type: cross Abstract: Chronic pain diminishes quality of life by decreasing functional ability, yet objectively measuring this funct
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Latent Priors
arXiv:2411.17790v3 Announce Type: replace-cross Abstract: Accurate 3D mapping in endoscopy enables quantitative, holistic lesion characterization within the gas
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
A Lightweight Context-Driven Training-Free Network for Scene Text Segmentation and Recognition
arXiv:2503.15639v2 Announce Type: replace-cross Abstract: Modern scene text recognition systems often depend on large end-to-end architectures that require exte
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
A Survey of 3D Reconstruction with Event Cameras
arXiv:2505.08438v4 Announce Type: replace-cross Abstract: Event cameras are rapidly emerging as powerful vision sensors for 3D reconstruction, uniquely capable
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
arXiv:2601.00664v2 Announce Type: replace-cross Abstract: Talking head generation creates lifelike avatars from static portraits for virtual communication and c
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
arXiv:2602.08236v2 Announce Type: replace-cross Abstract: Despite rapid progress in MLLMs, visual spatial reasoning remains unreliable when correct answers depe
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Building a Real-Time Fire Detection and People Counting System with InceptionV3 and OpenCV
How transfer learning and classical computer vision can work together on edge hardware to save lives
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
What Happens When There’s No Data? Lessons from Building a Real-Time Speed Detection System
I am an Electrical Engineering graduate with a focus on deployable AI systems. Reading about these catastrophic numbers, according to the…
Reddit r/deeplearning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Trained Ultralytics Semantic Segmentation on a Custom Crack Dataset
submitted by /u/Optimal-Length5568
Dev.to · Heiner 👁️ Computer Vision ⚡ AI Lesson 1mo ago
My Friend Had a Cameras-On Problem. I Wrote Him a Solution.
Originally published on my blog. GitHub: ScrumSurvivor. My Friend Had a Cameras-On Problem....
Medium · AI 👁️ Computer Vision ⚡ AI Lesson 1mo ago
How to Migrate From Clarifai to Ximilar: Quick Start Guide
Your drop-in replacement for custom classification, detection, and visual search.
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Build a basic CNN Computer Vision model with Pytorch.
Computer vision is the art of teaching a computer to see.
Dev.to · Marco Rinaldi 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Our event-camera detector lost 6 mAP to a badly chosen accumulation window
TL;DR: We spent three weeks chasing a 6 mAP regression in an event-camera object detector. The model...
Medium · Data Science 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Best Computer Courses in Kochi With Placement Assistance
In today’s digital world, computer skills have become essential for building successful careers across industries. Whether you are a…
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Household Item Annotation Services for AI & Computer Vision
Artificial Intelligence systems that understand indoor environments are becoming increasingly important across industries such as real…
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
ConTrans: Learning Text-enhanced Local-global Temporal Representations for Zero-shot Temporal Action Localization
arXiv:2605.30689v1 Announce Type: cross Abstract: Zero-shot Temporal Action Localization (ZS-TAL) aims to detect and locate previously unseen actions in untrimm
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Redefining Instance Matching: A Unified Framework for Part-Aware Matching in Panoptic Segmentation Evaluation
arXiv:2605.31094v1 Announce Type: cross Abstract: The Panoptic Quality (PQ) metric is the standard for jointly evaluating instance and semantic segmentation. Ho
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
SWIM: Single-Instance Whole-Body Imitation for swiMming
arXiv:2605.31120v1 Announce Type: cross Abstract: We propose a new method for synthesizing physically-based swimming motions. Physically-based character animati
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
FOCUS: Forcing In-Context Object Localization through Visual Support Constraints and Policy Optimization
arXiv:2605.31145v1 Announce Type: cross Abstract: In-context localization (ICL) seeks to localize a target object specified by a small set of support examples i
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Probing Collision Grounding in Vision-Language Models for Safe Human-Robot Collaboration
arXiv:2605.31196v1 Announce Type: cross Abstract: Safe human-robot collaboration requires more than visual description: a monitor must determine whether the ro
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Feature-Optimized Vision for Adaptive 3D Scene Reconstruction
arXiv:2605.31534v1 Announce Type: cross Abstract: Three-dimensional scene reconstruction depends on local image evidence that is both visually discriminative an
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
RayDer: Scalable Self-Supervised Novel View Synthesis from Real-World Video
arXiv:2605.31535v1 Announce Type: cross Abstract: Self-supervised novel view synthesis (NVS) remains challenging to scale, despite the abundance of video data,
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Joint angle based learning to refine kinematic human pose estimation
arXiv:2507.11075v2 Announce Type: replace-cross Abstract: Marker-free human pose estimation (HPE) has found increasing applications in various fields. Current H
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
CaptionFormer: Unified Segmentation, Tracking, and Captioning for Spatio-Temporal Objects
arXiv:2510.14904v3 Announce Type: replace-cross Abstract: Dense Video Object Captioning (DVOC) is the task of jointly detecting, tracking, and captioning object
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Computer Vision
Vision Transformers Pytorch code
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Deepfakes Just Broke Evidence: $893M Gone, 100K Fake Images, First Arrests Land
The evolution of forensic verification in the age of generative noise. For developers working in computer vision (CV) and biometrics, the news of $893M in AI-sca
Dev.to · yubin yang 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Software Rendering Pipeline with Backface Culling
1. Overview: In this project, I implemented a simple software renderer using Python and...
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
What Is OCR and How Is It Used?
It has happened to all of us: we have a photo of a lengthy document, an invoice, or a great quote from a book, and that text…
Medium · Cybersecurity 👁️ Computer Vision ⚡ AI Lesson 1mo ago
The OSI Model: More Than 7 Memorized Layers
Everyone who is new to the world of cybersecurity or networking runs into the OSI model at some point.
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1mo ago
PySIFT: GPU Accelerated SIFT for Modern Era
In computer vision, the Scale-Invariant Feature Transform (SIFT) algorithm remains a classic foundational standard for keypoint detection…
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Python for Data Science & AI · Blog 18 of 20 — CNNs for Image Classification
From filters to feature maps: building networks that actually see.
Medium · Programming 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Graphics Programming from Scratch: An Introduction to Shaders, Vertices, and Fragments
Graphics programming makes it possible to create images through instructions executed by the graphics card. This field is useful not only for…
Reddit r/learnprogramming 👁️ Computer Vision ⚡ AI Lesson 1mo ago
[Question] Need arrow dataset images for shape detection project
Hi everyone, I’m working on a shape detection project where the user draws on a whiteboard/canvas, and the system converts the drawing into a detected shape. Th
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Edge Detection From Scratch
If you would like to follow along, you will need Google Colab or Jupyter Notebook and these libraries:
Dev.to · Eli 👁️ Computer Vision ⚡ AI Lesson 1mo ago
New Framework Adds 3D Awareness to Video Object Tracking
Researchers tackle fundamental gaps in motion detection by grounding segmentation in spatiotemporal coordinates rather than relying on pre-computed 2D approxima
Reddit r/MachineLearning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Query about non-archival workshop at CVPR-2026 [R]
My paper was recently accepted to a workshop at CVPR-2026 as non-archival acceptance. Is it mandatory for me to register to the conference as I won't be able to
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1mo ago
How Computer Vision Is Transforming Industries Around the World
Artificial Intelligence has made remarkable progress over the past decade, but one of the most impactful areas is computer vision.