Foundations

Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

1,899 lessons
Skills in this topic
CV Basics
beginner
Classify images with a pre-trained CNN
Modern CV Models
intermediate
Run YOLO for real-time object detection
Generative CV
advanced
Build a Stable Diffusion inference pipeline
All Reads (754) Articles (322) Blog Posts (146) Tutorials (70) Research Papers (212) News (4)
Medium · AI 👁️ Computer Vision ⚡ AI Lesson 1w ago
Google Unleashes “Transformers” in Vision!
For nearly a decade, computer programs designed to recognize images (like identifying a dog in a photo) were built using a specific type…
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 2w ago
Which GPU Should You Choose for Computer Vision Training?
A practical comparison of the H100, A100, L4, T4, and Google TPUs for PyTorch, TensorFlow, and JAX workloads. Continue reading on Towards Dev »
Medium · AI 👁️ Computer Vision ⚡ AI Lesson 2w ago
Transforming Industrial Operations with Vision AI: The Future of Intelligent Automation
Organizations in manufacturing, warehousing, logistics, retail and energy are spending a lot of money on automation to make things more…
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 2w ago
Visual Search System: Complete ML System Design
A visual search system enables users to discover images that are visually similar to a selected image. Platforms such as Pinterest use…
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 2w ago
You Only Look Once: How One Idea Taught Machines to See in Real Time
A friendly tour of YOLO: what it is, how it works, and why it quietly powers everything from self-driving cars to the camera that counts…
Medium · NLP 👁️ Computer Vision ⚡ AI Lesson 2w ago
Computer Vision vs. NLP-Based AI Development: Use Case Fit, Infrastructure, and Cost Compared
When companies start evaluating artificial intelligence development services, one of the first real decisions they face is not about which…
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 2w ago
Rethinking Retail Shelf Intelligence in the Age of Vision AI
A curiosity-driven exploration of a real-world retail problem, what I learned, and where it might go next.
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Geometric Consistency Protocol for Foundation Model Features in Multi-View Satellite Imagery
arXiv:2606.17564v1 Announce Type: cross Abstract: Standardized evaluation protocols are indispensable for robust benchmarking in remote sensing, particularly as
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
SkillMoV: Mixture-of-View Routing with Prototype-Conditioned Gating for Unified Multi-View Proficiency Estimation
arXiv:2606.17615v1 Announce Type: cross Abstract: Estimating human proficiency from video is a key challenge for automated skill assessment, with applications i
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
ReAge3D: Re-Aging 3D Faces with View Consistency
arXiv:2606.18156v1 Announce Type: cross Abstract: We present a novel framework for realistic and controllable 3D face re-aging which produces highly detailed, i
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
A geometric and deep learning reproducible pipeline for monitoring floating anthropogenic debris in urban rivers using in situ cameras
arXiv:2510.23798v2 Announce Type: replace-cross Abstract: The proliferation of floating anthropogenic debris in rivers has emerged as a pressing environmental c
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Co-PLNet: A Collaborative Point-Line Network for Prompt-Guided Wireframe Parsing
arXiv:2601.18252v2 Announce Type: replace-cross Abstract: Wireframe parsing aims to recover line segments and their junctions to form a structured geometric rep
Medium · Data Science 👁️ Computer Vision ⚡ AI Lesson 2w ago
The 1980s Code That Rules the Geometry of Modern Video Games
If you’ve ever climbed a mountain in Minecraft, sailed across a realistic ocean in a video game, or flown over an endless landscape that…
Reddit r/learnprogramming 👁️ Computer Vision ⚡ AI Lesson 2w ago
How do you learn multiple languages?
I'm taking a computer science major but have never done any programming prior to taking it. After getting over the first year courses, I can confidently say tha
Reddit r/learnprogramming 👁️ Computer Vision ⚡ AI Lesson 2w ago
[INFO] New Typeset PDF of SICP (based on previous community works)
For those interested, the LaTeX source of the SICP (Structure and Interpretation of Computer Programs) book was updated for better typesetting and QoL improveme
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 2w ago
Building V-INTELLIGENCE: An AI-Powered Traffic & Vehicle Management System
How I built a full-stack automated vehicle compliance system to track plates, process video feeds, and issue automated warnings.
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 2w ago
Teaching a Computer to Tell a Sneaker from a Sandal: A Beginner’s Guide to CNNs
A hands-on walkthrough of building, training, and evaluating a Convolutional Neural Network (CNN) on the Fashion-MNIST dataset using…
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
VigilFormer: Deformable Attention for Video Anomaly Detection with Causal Risk Inference
arXiv:2606.14724v1 Announce Type: cross Abstract: Video anomaly detection in surveillance settings must balance detection accuracy against real-time throughput,
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Steady-Forcing: Balancing Spatial Persistence and Motion Continuity in Long-Horizon Nature Video Diffusion
arXiv:2606.14732v1 Announce Type: cross Abstract: Autoregressive video diffusion models enable streaming generation but often degrade over long rollouts: static
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Automated 3D Kinematic Monitoring for Circadian Activity and Anomaly Detection in Juvenile Fish
arXiv:2606.14749v1 Announce Type: cross Abstract: Precision aquaculture faces a "phenotyping bottleneck" in tracking high-resolution behavioral traits, as conve
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Beyond Self-Attention: Sub-Quadratic Vision Transformers for Fast Image Captioning
arXiv:2606.14753v1 Announce Type: cross Abstract: Image captioning is a challenging and significant task that aims to generate coherent and semantically meaning
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Sub-Semantic Image Segmentation
arXiv:2606.14754v1 Announce Type: cross Abstract: Images can be segmented based on visual cues (i.e., texture segmentation) or into objects (i.e., semantic segm
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Where Does Texture Evidence Live in SAM? Features, Proposal Masks, and Texture Segmentation
arXiv:2606.14755v1 Announce Type: cross Abstract: Texture segmentation stresses foundation segmentation because meaningful regions are defined by material or re
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
An Empirical Analysis of Optimization Dynamics and Sparsity Boundaries in Large-Scale Pedestrian Attribute Recognition
arXiv:2606.14770v1 Announce Type: cross Abstract: Pedestrian Attribute Recognition (PAR) is critical for video surveillance, enabling forensic search and re-ide
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Double-Helix Vision (DH-V2): A Geometry-Based Visual Sampler for Bandwidth-Constrained Perception
arXiv:2606.14773v1 Announce Type: cross Abstract: We present Double-Helix Vision (DH), a geometry-based visual sampler that compresses 2D images into compact 1D
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Selective Synergistic Learning for Video Object-Centric Learning
arXiv:2606.15527v1 Announce Type: cross Abstract: Typical video object-centric learning (VOCL) approaches employ slot-based frameworks that rely on reconstructi
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
MADAR: An Address-Free Processor
arXiv:2606.15535v1 Announce Type: cross Abstract: In a modern processor, computing is the cheap part. Most of its area and energy go to addressing -- mov
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
PURe: A Plug-and-Play Product-Unit Residual Module for Vision Networks
arXiv:2505.04397v2 Announce Type: replace-cross Abstract: Modern vision networks are dominated by additive local transformations, whereas explicit multiplicativ
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Seeing Roads Through Words: A Language-Guided Framework for RGB-T Driving Scene Segmentation
arXiv:2602.07343v2 Announce Type: replace-cross Abstract: Robust semantic segmentation of road scenes under adverse illumination, lighting, and shadow condition
Reddit r/deeplearning 👁️ Computer Vision ⚡ AI Lesson 2w ago
How to automatically mask real people but ignore paintings/statues/mannequins?
submitted by /u/Beginning_Street_375
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 2w ago
SignTax: A Computer Vision Approach to Advertising Sign Tax Assessment
SignTax is an AI system for inspecting advertising signs and assessing sign tax. Its goal is to help local government agencies calculate the tax on advertising signs whose boundaries…
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Morphology-Aware Sample Assignment: Overcoming IoU Insensitivity for Surface Defect Detection
arXiv:2606.13723v1 Announce Type: cross Abstract: Intersection-over-Union (IoU), as a pivotal metric for evaluating the spatial alignment between candidate prop
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
How do Self-Supervised Remote Sensing Vision Models Transfer to Downstream Tasks?
arXiv:2606.13896v1 Announce Type: cross Abstract: Self-supervised geospatial foundation models (GeoFMs) learn transferable representations from remote sensing d
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
FEMOT: Multi-Object Tracking using Frame and Event Cameras
arXiv:2606.14094v1 Announce Type: cross Abstract: Conventional RGB cameras have been widely used in multi-object tracking due to their ability to capture rich a
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2w ago
Vanishing Depth: Training Generalized Depth Adapters with Sinusoidal Depth Preprocessing for Pretrained RGB Encoders
arXiv:2503.19947v2 Announce Type: replace-cross Abstract: Generalized metric depth understanding is critical for precise vision-guided robotics, which current s
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 2w ago
Your Face Got Mapped by Apple. 6 Million People Are Suing.
The evolving legal standards for biometric data processing. For developers building computer vision (CV) applications, the recent federal class certification in
Medium · Programming 👁️ Computer Vision ⚡ AI Lesson 2w ago
Journaling Cache
Cache is a concept in browsers/applications in which data that is frequently used is fetched from cache memory (RAM) instead of being…
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 2w ago
Cops Lost His Kids Over an 85% Guess — Your Face Could Be Next
Why reliance on similarity scores is a developer's nightmare. For computer vision engineers and developers working with biometrics, the news of another wrongful
Dev.to AI 👁️ Computer Vision ⚡ AI Lesson 2w ago
You Verified Your Kid's Age. A Stranger Now Has Your Face.
The technical risk of third-party identity pipelines. For developers working in computer vision and biometrics, the recent shift by major platforms like PlayStat
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 2w ago
Deep Model for Vision
Neural Networks for CV and NLP: EP03
Medium · Programming 👁️ Computer Vision ⚡ AI Lesson 2w ago
Someone Split Your Computer In Half: And Somehow It Still Works
Why are the CPU and RAM two different chips? Genuinely. Who decided that? At some point someone drew a line down the middle of a computer…
Medium · AI 👁️ Computer Vision ⚡ AI Lesson 2w ago
Can AI Change an Entire Outfit in a Video at Once?
Paper: OmniTryOn: Video Try-On Anything at Once!
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 2w ago
Understanding Modern CNN Design: Comparing CNN, ResNet, DenseNet, and DLA
CNNs have been the strongest pillars of Computer Vision since its inception. Not only were they a profound concept when they were…
Reddit r/deeplearning 👁️ Computer Vision ⚡ AI Lesson 2w ago
Join us for 1 day virtual session on fundamentals of computer vision
Hello everyone, I'm going to conduct a one-day virtual session on the fundamentals of Computer Vision, where I'll primarily discuss concepts directly from the o
Medium · Programming 👁️ Computer Vision ⚡ AI Lesson 2w ago
CPU Basics for Hackers & Developers: Understanding Memory, Stack, Heap & Registers Like a Pro
A beginner-friendly but deep guide to how programs actually live inside memory, with storytelling, visuals, and hands-on labs.