Foundations

Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

2,357 lessons
Skills in this topic
CV Basics (beginner): Classify images with a pre-trained CNN
Modern CV Models (intermediate): Run YOLO for real-time object detection
Generative CV (advanced): Build a Stable Diffusion inference pipeline
All Reads (1,212) | Articles (388) | Blog Posts (260) | Tutorials (79) | Research Papers (469) | News (16)
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Zero-Shot Test-Time Canonicalization using Out-of-Distribution Scoring
arXiv:2606.24178v1 Announce Type: cross Abstract: Pretrained vision models often misclassify inputs that are rotated, scaled, or sheared, even though these affi
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
FLUX3D: High-Fidelity 3D Gaussian Generation with Diffusion-Aligned Sparse Representation
arXiv:2606.24874v1 Announce Type: cross Abstract: Sparse voxel representation has emerged as a scalable foundation for image-to-3D Gaussian Splatting (3DGS) gen
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 1w ago
Real-Time Gesture Controlled Invisibility Cloak in Python (MediaPipe & OpenCV)
Build a Real-Time Gesture Controlled Invisibility Cloak using Python, MediaPipe, and OpenCV! 🪄💻 In this computer vision tutorial, we are creating an interacti
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1w ago
How I Built a Face Detection Web App in Python in Under 20 Minutes (And You Can Too)
A step-by-step guide to deploying your first Computer Vision AI project using OpenCV and Streamlit.
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1w ago
I Had 54 Underwater Images and Needed a Reliable Corrosion Detector
Here’s What Actually Worked
Dev.to · humnaattique4-sys 👁️ Computer Vision ⚡ AI Lesson 1w ago
What I Learned About 3D Reconstruction — My First Week at PreserveMy.World
I just started my internship at PreserveMy.World, a project focused on digitally preserving cultural...
Dev.to · Saurin Prajapati 👁️ Computer Vision ⚡ AI Lesson 1w ago
Splitting Face Recognition Across the Edge and the Cloud with AWS IoT Greengrass + Lambda
How I built a real-time face recognition pipeline that detects faces at the edge on AWS IoT Greengrass and recognizes them serverlessly with Lambda, glued toget
Dev.to · Elise Moreau 👁️ Computer Vision ⚡ AI Lesson 1w ago
A VLM gate for generated images, with provider failover via Bifrost
TL;DR: At Photoroom we run a vision-language model as the last check before a generated product image...
Dev.to · Alan Scott Encinas 👁️ Computer Vision ⚡ AI Lesson 1w ago
I entered a competition to track objects in light you can't see
The first entry in a live builder's log. I'm competing in the Hyperspectral Object Tracking Challenge 2026: track one object through video shot in colors the hu
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1w ago
Tiled-MRPNN — real-time photorealistic light transport in participating media
Historically, real-time light transport in participating media has mainly been handled with simplified physics simulations and crude…
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Automatic Vehicle Detection using DETR: A Transformer-Based Approach for Navigating Treacherous Roads
arXiv:2502.17843v1 Announce Type: cross Abstract: Automatic Vehicle Detection (AVD) in diverse driving environments presents unique challenges due to varying li
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
GEOPHYS: The Geometry of Physical Plausibility
arXiv:2606.20707v1 Announce Type: cross Abstract: While humans can identify physically implausible events within milliseconds, machine learning approaches addre
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
XmoPipe: A Pipeline for Large-Scale In-the-Wild Human Motion Dataset Construction
arXiv:2606.20731v1 Announce Type: cross Abstract: Large-scale human motion datasets are essential for training robust motion models for analysis, synthesis, and
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
UniSLAD: A Unified Framework for Structural and Logical Industrial Visual Anomaly Detection
arXiv:2606.20768v1 Announce Type: cross Abstract: Visual anomaly detection is a fundamental task in industrial automation. While existing approaches have achiev
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
GroundShot: Visually Consistent Multi-Shot Long Video Generation via Entity-Grounded Shot Scheduling
arXiv:2606.20799v1 Announce Type: cross Abstract: Generating visually consistent multi-shot videos remains an open challenge. As videos span more shots, inconsi
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
MS-rPPG: Multi-spectral State Space Model for Remote Photoplethysmography in Driver Monitoring Systems
arXiv:2606.21115v1 Announce Type: cross Abstract: Remote photoplethysmography (rPPG) is a camera-based technique for measuring physiological signals, particular
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
ACE-GS: Acing the Trade-off with Accurate, Compact and Efficient 3D Gaussian Splatting
arXiv:2606.21244v1 Announce Type: cross Abstract: 3D Gaussian Splatting achieves exceptional real-time rendering, but its substantial computational and storage
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Scene-Level Heterogeneous Physics Simulation with 3D Gaussian Splats
arXiv:2606.21753v1 Announce Type: cross Abstract: 3D Gaussian Splatting (3DGS) has achieved state-of-the-art photorealistic rendering, but the representation ga
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Dual-Stream EEG Decoding for 3D Visual Perception
arXiv:2606.22182v1 Announce Type: cross Abstract: This paper explores a novel brain decoding model for 3D shape perception through a dual pathway architecture m
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
DreamUV: Unwrap Artist-like UV by End-to-End Flow Matching
arXiv:2606.22445v1 Announce Type: cross Abstract: UV parameterization is a fundamental step in 3D content creation, yet producing production-ready UV layouts re
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Training-Free Semantic Correction for Autoregressive Visual Models
arXiv:2606.22550v1 Announce Type: cross Abstract: Autoregressive visual models (AVMs) based on next-scale prediction have emerged as a prominent paradigm for im
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
The Power of Light: Improving Synthetic-to-Real Domain Adaptation through Physically-Based Indirect Illumination
arXiv:2606.22574v1 Announce Type: cross Abstract: While synthetic data generation resolves the manual labeling bottleneck in computer vision, minimizing the syn
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
DBT-Bleed: Dual-Branch Temporal Modeling with Key-Frame Selection for Surgical Bleeding Detection
arXiv:2606.22829v1 Announce Type: cross Abstract: Intraoperative Adverse Events (IAEs) detection is critical for improving surgical safety, with bleeding being
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
OrthoMotion: Disentangling Camera and Subject Motion via Geometry Semantics Orthogonal Attention
arXiv:2606.22835v1 Announce Type: cross Abstract: Controllable video generation demands independent command of the camera and the subject, yet 2D conditioning e
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
MotionHalluc: Diagnosing Kinematic Hallucinations in Fine-Grained Motion Reasoning
arXiv:2606.23061v1 Announce Type: cross Abstract: Motion instruction generation in cross-video comparison aims to produce corrective feedback that describes the
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Rethinking Object-Centric Representations for Video Dynamics Modeling
arXiv:2606.23436v1 Announce Type: cross Abstract: Unsupervised video object tracking aims to decompose dynamic scenes into persistent, object-centric entities w
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Render-FM: Feedforward Model for Real-time Photorealistic Volumetric Rendering
arXiv:2505.17338v2 Announce Type: replace-cross Abstract: Photorealistic volumetric rendering of CT scans greatly benefits clinical workflows, yet neural approa
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
HaineiFRDM: Structure-Preserving Diffusion for Film Restoration under Fast Motion and Diverse Defects
arXiv:2512.24946v2 Announce Type: replace-cross Abstract: Existing film-restoration methods frequently fail under fast motion, producing limb disappearance and
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
CAOA -- Completion-Assisted Object-CAD Alignment
arXiv:2606.18429v2 Announce Type: replace-cross Abstract: Accurately aligning CAD models to their corresponding objects in indoor RGB-D scans is a central chall
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1w ago
SmartDefectAI: Industrial Surface Defect Detection using Vision Transformers and Hybrid…
Computer Vision, CNN, EfficientNet, Vision Transformer (ViT), Deep Learning, Attention Mechanism, Transfer Learning, OpenCV, TensorFlow… Continue reading on Med
Forbes Innovation 👁️ Computer Vision ⚡ AI Lesson 1w ago
How AI-Powered Computer Vision Is Transforming Retail Compliance
In industries where compliance violations can directly impact customer health, this becomes incredibly important.
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1w ago
RF-DETR: A Smaller Model That Beats the Biggest YOLO
How Roboflow’s RF-DETR transformer beats YOLO11 on COCO, skips NMS, and where the benchmark numbers deserve a closer look. Continue reading on Towards Deep Lear
Dev.to · CarCare 👁️ Computer Vision ⚡ AI Lesson 1w ago
Your Car's Paint Has a Cache Invalidation Problem — Here Is What That Means in Jaipur
Cache invalidation is one of the genuinely hard problems in computer science, mostly because the...
Reddit r/MachineLearning 👁️ Computer Vision ⚡ AI Lesson 1w ago
[ECCV 2026] Paper Decision Appeals Discussion [D]
With the release of meta-reviews, ECCV sent out a google form for dissatisfied authors to submit an appeal for the following reasons: Policy errors, e.g., revie
Dev.to · CaraComp 👁️ Computer Vision ⚡ AI Lesson 1w ago
Roblox Promised "No Friction." Parents Got Locked Out — and $6.7B Vanished.
The engineering reality of biometric friction: For developers building in the computer vision and...
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1w ago
What is Remote Sensing?
This blog is inspired by the IIRS distance learning programme run by ISRO and IIRS (Indian Institute of Remote Sensing, Dehradun), which I attended.
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1w ago
Clifford Vortex Filaments: Rendering Chaotic Attractors in 3D
“Clifford Vortex Filaments” is a generative media art piece that dives deep into chaos theory and non-linear dynamics, visualizing a… Continue reading on Medium
Reddit r/programming 👁️ Computer Vision ⚡ AI Lesson 2w ago
I Stored a Website in a Favicon
A small experiment of mine :) Happy to hear your thoughts about this. Submitted by /u/soupgasm
Dev.to · fluidwire 👁️ Computer Vision ⚡ AI Lesson 2w ago
The First Computer Bug Was a Real Moth
In 1947 a moth jammed in Harvard's Mark II became the first computer bug ever logged. Here is the real story and why debugging still defines IoT and embedded wo
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
TeleMorpher: Toward Robust Simultaneous Motion-Location Editing
arXiv:2606.19676v1 Announce Type: cross Abstract: Diffusion models have achieved remarkable success in image and video generation and editing. While recent stud
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
QueryGaussian: Scalable and Training-Free Open-Vocabulary 3D Instance Retrieval
arXiv:2606.19733v1 Announce Type: cross Abstract: Efficiently retrieving specific 3D instances from large-scale scenes via natural language prompts remains a fo
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
ParaScale: Scale-Calibrated Camera-Motion Transfer via a Gauge-Invariant Parallax Number
arXiv:2606.19805v1 Announce Type: cross Abstract: Transferring the camera motion of a reference video to a freshly generated one lets creators reuse cinematic m
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
PSCT-Net: Geometry-Aware Pediatric Skull CT Reconstruction via Differentiable Back-Projection and Attention-Guided Refinement
arXiv:2606.19867v1 Announce Type: cross Abstract: Computed Tomography (CT) is essential for diagnosing pediatric craniofacial abnormalities, yet poses radiation
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Spatial-Aware Reduction Framework: Towards Efficient and Faithful Visual State Space Models
arXiv:2606.19932v1 Announce Type: cross Abstract: Mamba demonstrates strong efficiency in modeling long visual sequences. However, when token reduction is appli
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
See-and-Reach: Precise Vision-Language Navigation for UAVs within the Field of View
arXiv:2606.20045v1 Announce Type: cross Abstract: UAV Vision-Language Navigation (UAV-VLN) is typically formulated as a holistic search-and-reach problem, where
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking
arXiv:2602.23172v2 Announce Type: replace-cross Abstract: Capturing 4D spatiotemporal scene structure is crucial for the safe and reliable operation of robots i