Foundations

Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

1,538 lessons
Skills in this topic
CV Basics
beginner
Classify images with a pre-trained CNN
Modern CV Models
intermediate
Run YOLO for real-time object detection
Generative CV
advanced
Build a Stable Diffusion inference pipeline
All Reads (393) | Articles (216) | Blog Posts (116) | Tutorials (47) | Research Papers (13) | News (1)
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation
arXiv:2606.11670v1 Announce Type: cross Abstract: Subject-preserving video generation is not solved by frontal-face similarity alone: a generated person must re
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning
arXiv:2606.11683v1 Announce Type: cross Abstract: Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrai
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Multi-View In-Cabin Monitoring System for Public Transport Vehicles
arXiv:2606.11739v1 Announce Type: cross Abstract: We introduce a multi-view in-cabin monitoring dataset for public transportation with synchronized RGB and dept
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
AnchorEdit: Maintaining Temporal Consistency in Multi-turn Image Editing via Causal Memory
arXiv:2606.11751v1 Announce Type: cross Abstract: Multi-turn image editing is essential for iterative design, yet current models often struggle with identity dr
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
TextHOI-3D: Text-to-3D Hand-Object Interaction via Discrete Multi-View Generation and Joint Mesh Optimization
arXiv:2606.11805v1 Announce Type: cross Abstract: Text-conditioned 3D generation has progressed rapidly for images and isolated objects, but producing a hand-ob
Reddit r/MachineLearning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P]
Hello everyone. The new dataset is named MONET, is Apache 2.0-licensed, and is available on HF: https://huggingface.co/datasets/jasperai/monet MONET is open, Apache 2.0-lice
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
BlazeEdit: Generalist Image Editing on Mobile Devices with Image-to-Image Diffusion Models
arXiv:2605.28067v1 Announce Type: new Abstract: The remarkable generation quality of modern diffusion models often comes at the cost of massive parameter counts
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
A Camera-Cooperative ISAC Framework for Multimodal Non-Cooperative UAVs Sensing
arXiv:2605.22090v1 Announce Type: new Abstract: The detection of non-cooperative unmanned aerial vehicles (UAVs) presents significant challenges for Integrated
IEEE Spectrum 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Manchester Code Made Bits Behave
In the late 1940s—when computer engineers were grappling with unreliable hardware and noisy transmission environments—a team of engineers inside a modest lab at
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Intelligent CCTV for Urban Design: AI-Based Analysis of Soft Infrastructure at Intersections
arXiv:2605.05402v1 Announce Type: new Abstract: Artificial intelligence (AI) and computer vision are transforming transportation data collection. This study int
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
ReflectCAP: Detailed Image Captioning with Reflective Memory
arXiv:2604.12357v1 Announce Type: new Abstract: Detailed image captioning demands both factual grounding and fine-grained coverage, yet existing methods have st
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
Intelligent ROI-Based Vehicle Counting Framework for Automated Traffic Monitoring
arXiv:2604.12470v1 Announce Type: new Abstract: Accurate vehicle counting through video surveillance is crucial for efficient traffic management. However, achie
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago
ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On
arXiv:2509.25749v2 Announce Type: cross Abstract: Virtual try-on (VITON) aims to generate realistic images of a person wearing a target garment, requiring preci