Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

2,353 lessons
Skills in this topic
CV Basics
beginner
Classify images with a pre-trained CNN
Modern CV Models
intermediate
Run YOLO for real-time object detection
Generative CV
advanced
Build a Stable Diffusion inference pipeline
All Reads (1,208) Articles (385) Blog Posts (260) Tutorials (78) Research Papers (469) News (16)
How I Built a Face Detection Web App in Python in Under 20 Minutes (And You Can Too)
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1w ago
A step-by-step guide to deploying your first Computer Vision AI project using OpenCV and Streamlit.
I Had 54 Underwater Images and Needed a Reliable Corrosion Detector
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1w ago
Here’s What Actually Worked
What I Learned About 3D Reconstruction — My First Week at PreserveMy.World
Dev.to · humnaattique4-sys 👁️ Computer Vision ⚡ AI Lesson 1w ago
I just started my internship at PreserveMy.World, a project focused on digitally preserving cultural...
Splitting Face Recognition Across the Edge and the Cloud with AWS IoT Greengrass + Lambda
Dev.to · Saurin Prajapati 👁️ Computer Vision ⚡ AI Lesson 1w ago
How I built a real-time face recognition pipeline that detects faces at the edge on AWS IoT Greengrass and recognizes them serverlessly with Lambda, glued toget
A VLM gate for generated images, with provider failover via Bifrost
Dev.to · Elise Moreau 👁️ Computer Vision ⚡ AI Lesson 1w ago
TL;DR: At Photoroom we run a vision-language model as the last check before a generated product image...
I entered a competition to track objects in light you can't see
Dev.to · Alan Scott Encinas 👁️ Computer Vision ⚡ AI Lesson 1w ago
The first entry in a live builder's log. I'm competing in the Hyperspectral Object Tracking Challenge 2026: track one object through video shot in colors the hu
Tiled-MRPNN — real-time photorealistic light transport in participating media
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1w ago
Historically, real-time light transport in participating media has mainly been handled with simplified physics simulations and crude…
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Automatic Vehicle Detection using DETR: A Transformer-Based Approach for Navigating Treacherous Roads
arXiv:2502.17843v1 Announce Type: cross Abstract: Automatic Vehicle Detection (AVD) in diverse driving environments presents unique challenges due to varying li
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
GEOPHYS: The Geometry of Physical Plausibility
arXiv:2606.20707v1 Announce Type: cross Abstract: While humans can identify physically implausible events within milliseconds, machine learning approaches addre
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
XmoPipe: A Pipeline for Large-Scale In-the-Wild Human Motion Dataset Construction
arXiv:2606.20731v1 Announce Type: cross Abstract: Large-scale human motion datasets are essential for training robust motion models for analysis, synthesis, and
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
UniSLAD: A Unified Framework for Structural and Logical Industrial Visual Anomaly Detection
arXiv:2606.20768v1 Announce Type: cross Abstract: Visual anomaly detection is a fundamental task in industrial automation. While existing approaches have achiev
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
GroundShot: Visually Consistent Multi-Shot Long Video Generation via Entity-Grounded Shot Scheduling
arXiv:2606.20799v1 Announce Type: cross Abstract: Generating visually consistent multi-shot videos remains an open challenge. As videos span more shots, inconsi
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
MS-rPPG: Multi-spectral State Space Model for Remote Photoplethysmography in Driver Monitoring Systems
arXiv:2606.21115v1 Announce Type: cross Abstract: Remote photoplethysmography (rPPG) is a camera-based technique for measuring physiological signals, particular
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
ACE-GS: Acing the Trade-off with Accurate, Compact and Efficient 3D Gaussian Splatting
arXiv:2606.21244v1 Announce Type: cross Abstract: 3D Gaussian Splatting achieves exceptional real-time rendering, but its substantial computational and storage
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Scene-Level Heterogeneous Physics Simulation with 3D Gaussian Splats
arXiv:2606.21753v1 Announce Type: cross Abstract: 3D Gaussian Splatting (3DGS) has achieved state-of-the-art photorealistic rendering, but the representation ga
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Dual-Stream EEG Decoding for 3D Visual Perception
arXiv:2606.22182v1 Announce Type: cross Abstract: This paper explores a novel brain decoding model for 3D shape perception through a dual pathway architecture m
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
DreamUV: Unwrap Artist-like UV by End-to-End Flow Matching
arXiv:2606.22445v1 Announce Type: cross Abstract: UV parameterization is a fundamental step in 3D content creation, yet producing production-ready UV layouts re
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Training-Free Semantic Correction for Autoregressive Visual Models
arXiv:2606.22550v1 Announce Type: cross Abstract: Autoregressive visual models (AVMs) based on next-scale prediction have emerged as a prominent paradigm for im
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
The Power of Light: Improving Synthetic-to-Real Domain Adaptation through Physically-Based Indirect Illumination
arXiv:2606.22574v1 Announce Type: cross Abstract: While synthetic data generation resolves the manual labeling bottleneck in computer vision, minimizing the syn
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
DBT-Bleed: Dual-Branch Temporal Modeling with Key-Frame Selection for Surgical Bleeding Detection
arXiv:2606.22829v1 Announce Type: cross Abstract: Intraoperative Adverse Events (IAEs) detection is critical for improving surgical safety, with bleeding being
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
OrthoMotion: Disentangling Camera and Subject Motion via Geometry Semantics Orthogonal Attention
arXiv:2606.22835v1 Announce Type: cross Abstract: Controllable video generation demands independent command of the camera and the subject, yet 2D conditioning e
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
MotionHalluc: Diagnosing Kinematic Hallucinations in Fine-Grained Motion Reasoning
arXiv:2606.23061v1 Announce Type: cross Abstract: Motion instruction generation in cross-video comparison aims to produce corrective feedback that describes the
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Rethinking Object-Centric Representations for Video Dynamics Modeling
arXiv:2606.23436v1 Announce Type: cross Abstract: Unsupervised video object tracking aims to decompose dynamic scenes into persistent, object-centric entities w
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Render-FM: Feedforward Model for Real-time Photorealistic Volumetric Rendering
arXiv:2505.17338v2 Announce Type: replace-cross Abstract: Photorealistic volumetric rendering of CT scans greatly benefits clinical workflows, yet neural approa
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
HaineiFRDM: Structure-Preserving Diffusion for Film Restoration under Fast Motion and Diverse Defects
arXiv:2512.24946v2 Announce Type: replace-cross Abstract: Existing film-restoration methods frequently fail under fast motion, producing limb disappearance and
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
CAOA -- Completion-Assisted Object-CAD Alignment
arXiv:2606.18429v2 Announce Type: replace-cross Abstract: Accurately aligning CAD models to their corresponding objects in indoor RGB-D scans is a central chall
SmartDefectAI: Industrial Surface Defect Detection using Vision Transformers and Hybrid…
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1w ago
Computer Vision, CNN, EfficientNet, Vision Transformer (ViT), Deep Learning, Attention Mechanism, Transfer Learning, OpenCV, TensorFlow…
How AI-Powered Computer Vision Is Transforming Retail Compliance
Forbes Innovation 👁️ Computer Vision ⚡ AI Lesson 1w ago
In industries where compliance violations can directly impact customer health, this becomes incredibly important.
RF-DETR: A Smaller Model That Beats the Biggest YOLO
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1w ago
How Roboflow’s RF-DETR transformer beats YOLO11 on COCO, skips NMS, and where the benchmark numbers deserve a closer look.
Your Car's Paint Has a Cache Invalidation Problem — Here Is What That Means in Jaipur
Dev.to · CarCare 👁️ Computer Vision ⚡ AI Lesson 1w ago
Cache invalidation is one of the genuinely hard problems in computer science, mostly because the...
Reddit r/MachineLearning 👁️ Computer Vision ⚡ AI Lesson 1w ago
[ECCV 2026] Paper Decision Appeals Discussion [D]
With the release of meta-reviews, ECCV sent out a google form for dissatisfied authors to submit an appeal for the following reasons: Policy errors, e.g., revie
Roblox Promised "No Friction." Parents Got Locked Out — and $6.7B Vanished.
Dev.to · CaraComp 👁️ Computer Vision ⚡ AI Lesson 1w ago
The engineering reality of biometric friction For developers building in the computer vision and...
What is Remote Sensing?
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1w ago
This blog is inspired by the IIRS distance learning programme by ISRO and IIRS (Institute of Remote Sensing, Dehradun), which I attended.
Clifford Vortex Filaments: Rendering Chaotic Attractors in 3D
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1w ago
“Clifford Vortex Filaments” is a generative media art piece that dives deep into chaos theory and non-linear dynamics, visualizing a…
Reddit r/programming 👁️ Computer Vision ⚡ AI Lesson 1w ago
I Stored a Website in a Favicon
A small experiment of mine :) Happy to hear your thoughts about this. Submitted by /u/soupgasm
The First Computer Bug Was a Real Moth
Dev.to · fluidwire 👁️ Computer Vision ⚡ AI Lesson 1w ago
In 1947 a moth jammed in Harvard's Mark II became the first computer bug ever logged. Here is the real story and why debugging still defines IoT and embedded wo
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
TeleMorpher: Toward Robust Simultaneous Motion-Location Editing
arXiv:2606.19676v1 Announce Type: cross Abstract: Diffusion models have achieved remarkable success in image and video generation and editing. While recent stud
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
QueryGaussian: Scalable and Training-Free Open-Vocabulary 3D Instance Retrieval
arXiv:2606.19733v1 Announce Type: cross Abstract: Efficiently retrieving specific 3D instances from large-scale scenes via natural language prompts remains a fo
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
ParaScale: Scale-Calibrated Camera-Motion Transfer via a Gauge-Invariant Parallax Number
arXiv:2606.19805v1 Announce Type: cross Abstract: Transferring the camera motion of a reference video to a freshly generated one lets creators reuse cinematic m
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
PSCT-Net: Geometry-Aware Pediatric Skull CT Reconstruction via Differentiable Back-Projection and Attention-Guided Refinement
arXiv:2606.19867v1 Announce Type: cross Abstract: Computed Tomography (CT) is essential for diagnosing pediatric craniofacial abnormalities, yet poses radiation
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Spatial-Aware Reduction Framework: Towards Efficient and Faithful Visual State Space Models
arXiv:2606.19932v1 Announce Type: cross Abstract: Mamba demonstrates strong efficiency in modeling long visual sequences. However, when token reduction is appli
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
See-and-Reach: Precise Vision-Language Navigation for UAVs within the Field of View
arXiv:2606.20045v1 Announce Type: cross Abstract: UAV Vision-Language Navigation (UAV-VLN) is typically formulated as a holistic search-and-reach problem, where
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1w ago
Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking
arXiv:2602.23172v2 Announce Type: replace-cross Abstract: Capturing 4D spatiotemporal scene structure is crucial for the safe and reliable operation of robots i
Network Fundamentals for Everyone: What Are OSI and TCP/IP?
Medium · Cybersecurity 👁️ Computer Vision ⚡ AI Lesson 1w ago
Throughout the day we send and receive thousands of data packets: we visit a website, message a friend, or watch a video. On the screen…
Microsoft’s .dxdmp Exposes the Refund Risk Behind Pretty Demos
Medium · Programming 👁️ Computer Vision ⚡ AI Lesson 2w ago
Microsoft’s early-June 2026 DirectX Dump Files preview turns GPU crashes into evidence, forcing 3D founders to sell recovery, not pixels.
How to Erase Unwanted Video Objects FREE
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 2w ago
Imagine capturing a near-perfect cinematic shot, only to notice an ugly corporate logo, a distracting background object, or an accidental…
Best Face Tracking SDKs for Real-Time Video Conferencing in 2026
Medium · Startup 👁️ Computer Vision ⚡ AI Lesson 2w ago
AR filters, virtual backgrounds, and beautification for live video calls: open-source and commercial options compared across iOS, Android…