Foundations

Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

1,538 lessons

Skills in this topic

3 skills

Classify images with a pre-trained CNN

Modern CV Models

Run YOLO for real-time object detection

Build a Stable Diffusion inference pipeline

Videos 1,145 Reads 393

All Reads (393): Articles (216), Blog Posts (116), Tutorials (47), Research Papers (13), News (1)

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago

ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation

arXiv:2606.11670v1 Announce Type: cross Abstract: Subject-preserving video generation is not solved by frontal-face similarity alone: a generated person must re

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago

Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

arXiv:2606.11683v1 Announce Type: cross Abstract: Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrai

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago

Multi-View In-Cabin Monitoring System for Public Transport Vehicles

arXiv:2606.11739v1 Announce Type: cross Abstract: We introduce a multi-view in-cabin monitoring dataset for public transportation with synchronized RGB and dept

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago

AnchorEdit: Maintaining Temporal Consistency in Multi-turn Image Editing via Causal Memory

arXiv:2606.11751v1 Announce Type: cross Abstract: Multi-turn image editing is essential for iterative design, yet current models often struggle with identity dr

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago

TextHOI-3D: Text-to-3D Hand-Object Interaction via Discrete Multi-View Generation and Joint Mesh Optimization

arXiv:2606.11805v1 Announce Type: cross Abstract: Text-conditioned 3D generation has progressed rapidly for images and isolated objects, but producing a hand-ob

Reddit r/MachineLearning 👁️ Computer Vision ⚡ AI Lesson 1mo ago

A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P]

Hello everyone. The new dataset is named MONET, is Apache 2.0 and available on HF: https://huggingface.co/datasets/jasperai/monet MONET is open, Apache 2.0-lice

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago

BlazeEdit: Generalist Image Editing on Mobile Devices with Image-to-Image Diffusion Models

arXiv:2605.28067v1 Announce Type: new Abstract: The remarkable generation quality of modern diffusion models often comes at the cost of massive parameter counts

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago

A Camera-Cooperative ISAC Framework for Multimodal Non-Cooperative UAVs Sensing

arXiv:2605.22090v1 Announce Type: new Abstract: The detection of non-cooperative unmanned aerial vehicles (UAVs) presents significant challenges for Integrated

IEEE Spectrum 👁️ Computer Vision ⚡ AI Lesson 1mo ago

Manchester Code Made Bits Behave

In the late 1940s—when computer engineers were grappling with unreliable hardware and noisy transmission environments—a team of engineers inside a modest lab at

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago

Intelligent CCTV for Urban Design: AI-Based Analysis of Soft Infrastructure at Intersections

arXiv:2605.05402v1 Announce Type: new Abstract: Artificial intelligence (AI) and computer vision are transforming transportation data collection. This study int

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago

ReflectCAP: Detailed Image Captioning with Reflective Memory

arXiv:2604.12357v1 Announce Type: new Abstract: Detailed image captioning demands both factual grounding and fine-grained coverage, yet existing methods have st

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago

Intelligent ROI-Based Vehicle Counting Framework for Automated Traffic Monitoring

arXiv:2604.12470v1 Announce Type: new Abstract: Accurate vehicle counting through video surveillance is crucial for efficient traffic management. However, achie

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago

ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On

arXiv:2509.25749v2 Announce Type: cross Abstract: Virtual try-on (VITON) aims to generate realistic images of a person wearing a target garment, requiring preci