Foundations

Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

2,353 lessons
Skills in this topic
CV Basics · beginner · Classify images with a pre-trained CNN
Modern CV Models · intermediate · Run YOLO for real-time object detection
Generative CV · advanced · Build a Stable Diffusion inference pipeline
All Reads (1,208) · Articles (385) · Blog Posts (260) · Tutorials (78) · Research Papers (469) · News (16)
Dev.to · BMBrick 👁️ Computer Vision ⚡ AI Lesson 1mo ago
How I Built a Perceptual Color Quantization Engine for LEGO Mosaics
The Problem Converting a photo into a LEGO mosaic sounds simple: resize the image, find...
Medium · LLM 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Unified Video Action (UVA) Model
Seminar #5 (Paper review)
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
CNN vs Vision Transformer on CIFAR-10: A Beginner-Friendly Experiment
Why I wrote this experiment
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
From Pixels to Predictions: How CNNs Actually Work
Understanding how Convolutional Neural Networks transform raw pixel data into intelligent predictions.
Dev.to · 𝗔𝗷𝗮𝘆 𝗦𝗼𝗻𝗶 👁️ Computer Vision ⚡ AI Lesson 1mo ago
VXN-RAMNet (VisionX Routine Adaptive Memory Network)
What if navigation systems could remember routes visually instead of depending entirely on...
Medium · AI 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Computer Vision Is Rebuilding the Fitting Room
The models, the stack, the ROI: no fluff
Medium · Data Science 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Why Most Tools Fail at Table Extraction (And How I Built a Vision-First Solution)
Conquering the nightmare of Borderless, Scanned, and Merged-Cell Tables with a Hybrid AI Pipeline
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Intelligent CCTV for Urban Design: AI-Based Analysis of Soft Infrastructure at Intersections
arXiv:2605.05402v1 Announce Type: new Abstract: Artificial intelligence (AI) and computer vision are transforming transportation data collection. This study int
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
MolRecBench-Wild: A Real-World Benchmark for Optical Chemical Structure Recognition
arXiv:2605.05832v1 Announce Type: new Abstract: Optical Chemical Structure Recognition (OCSR) aims to translate molecular diagrams in scientific literature into
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Tamaththul3D: High-Fidelity 3D Saudi Sign Language Avatars from Monocular Video
arXiv:2605.05367v1 Announce Type: cross Abstract: Arabic Sign Language (ArSL) and its dialects serve approximately 400 million Arabic speakers worldwide, yet th
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
CFE-PPAR: Compression-friendly encryption for privacy-preserving action recognition leveraging video transformers
arXiv:2605.05692v1 Announce Type: cross Abstract: Privacy-preserving action recognition (PPAR) enables machines to understand human activities in videos without
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
The autoPET3 Challenge -- Automated Lesion Segmentation in Whole-Body PET/CT - Multitracer Multicenter Generalization
arXiv:2605.05775v1 Announce Type: cross Abstract: We report the design and results of the third autoPET challenge (MICCAI 2024), which benchmarked automated les
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
VideoRouter: Query-Adaptive Dual Routing for Efficient Long-Video Understanding
arXiv:2605.05848v1 Announce Type: cross Abstract: Video large multimodal models increasingly face a scalability bottleneck: long videos produce excessively long
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
iPhoneBlur: A Difficulty-Stratified Benchmark for Consumer Device Motion Deblurring
arXiv:2605.05990v1 Announce Type: cross Abstract: Motion blur restoration on consumer mobile devices is typically evaluated using aggregate metrics that obscure
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models
arXiv:2605.06010v1 Announce Type: cross Abstract: Purely RGB-based vision models often fail to provide reliable cues in challenging scenarios such as nighttime
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Dynamic Pondering Sparsity-aware Mixture-of-Experts Transformer for Event Stream based Visual Object Tracking
arXiv:2605.06112v1 Announce Type: cross Abstract: Despite significant progress, RGB-based trackers remain vulnerable to challenging imaging conditions, such as
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Autoregressive Visual Generation Needs a Prologue
arXiv:2605.06137v1 Announce Type: cross Abstract: In this work, we propose Prologue, an approach to bridging the reconstruction-generation gap in autoregressive
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation
arXiv:2605.06667v1 Announce Type: cross Abstract: For artistic applications, video generation requires fine-grained control over both performance and cinematogr
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Multi-Scale Spectral Attention Module-based Hyperspectral Segmentation in Autonomous Driving Scenarios
arXiv:2506.18682v2 Announce Type: replace-cross Abstract: Recent advances in autonomous driving (AD) have highlighted the potential of hyperspectral imaging (HS
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
SoccerMaster: A Vision Foundation Model for Soccer Understanding
arXiv:2512.11016v2 Announce Type: replace-cross Abstract: Soccer understanding has recently garnered growing research interest due to its domain-specific comple
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval
arXiv:2601.03728v2 Announce Type: replace-cross Abstract: Composed Image Retrieval (CIR) enables users to search for target images using both a reference image
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
DC-DiT: Adaptive Compute and Elastic Inference for Visual Generation via Dynamic Chunking
arXiv:2603.06351v2 Announce Type: replace-cross Abstract: Diffusion Transformers rely on static patchify tokenization, assigning the same token budget to smooth
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
ChArtist: Generating Pictorial Charts with Unified Spatial and Subject Control
arXiv:2603.14209v2 Announce Type: replace-cross Abstract: A pictorial chart is an effective medium for visual storytelling, seamlessly integrating visual elemen
Medium · Programming 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Number systems conversion for dummies
There are four widely used number systems: decimal (10), binary (2), octal (8), and hexadecimal (16). As humans, we use the decimal system.
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1mo ago
A Practical Guide to Optimizing Digital Image Lighting with Python
Why Is Lighting Crucial? Have you ever taken a photo in low light and found the result so dark that…
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Efficiency vs. Precision: A Python Deep Dive into Faster R-CNN and SSD PyTorch
In the rapidly evolving landscape of artificial intelligence, selecting the optimal architecture for computer vision is rarely a simple…
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Computer Vision Fundamentals: CNN Architectures
The landmark designs that shaped modern computer vision, from LeNet to EfficientNet.
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1mo ago
What If You Could Find Films by How They Feel Visually?
There’s a scene in Life of Pi where the ocean at night fills with bioluminescent green light. The whole frame glows. It’s one of the most…
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Building a Motorcycle Rider Helmet Detection System Using YOLOv8
Traffic safety is an important issue, especially for motorcycle riders.
Medium · Machine Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Part 1:
From Fish Classification to Vision Transformers: How Machines Learned to See
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Exploring Edge Detection in Digital Images Using Python
Introduction
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1mo ago
From Pixels to AI: How Computers Understand an Image
“A simple exploration of how digital images are turned into information that Artificial Intelligence can understand.”…
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Teaching a Random Forest to Tell Walking from Running: A Computer Vision Pipeline with Hand-Built...
How a 56-feature baseline became a 240-feature classifier at 86% accuracy, with per-class SHAP guiding every feature engineering decision.
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Vision Transformers Under Extreme Latency: Particle Tracking at the LHC
Particle physics has always been a data problem disguised as a physics problem and the LHC is now pushing us to rethink tracking as a…
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
How Your Phone Unlocks in the Dark With Your Face
Thirty thousand invisible dots, a neural engine, and some surprisingly elegant geometry, all in the time it takes you to glance at your…
Medium · Data Science 👁️ Computer Vision ⚡ AI Lesson 1mo ago
An Easy Way to Detect Image Edges Using the Sobel Algorithm in Python
In the world of computer vision, edge detection is one of the fundamental techniques used to identify…
Medium · Deep Learning 👁️ Computer Vision ⚡ AI Lesson 1mo ago
Implementing YOLO26 for Oil Palm Health Detection from Digital Images
Indonesia is one of the world's largest palm oil producers. According to the report Analisis Kinerja Perdagangan Kelapa Sawit…
Medium · Python 👁️ Computer Vision ⚡ AI Lesson 1mo ago
A Closer Look at Canny Edge Detection in Digital Image Processing with Python and OpenCV
In digital image processing, detecting the boundaries of an object is very important.
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Modeling Subjective Urban Perception with Human Gaze
arXiv:2605.00764v1 Announce Type: cross Abstract: Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experie
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
StableI2I: Spotting Unintended Changes in Image-to-Image Transition
arXiv:2605.04453v1 Announce Type: cross Abstract: In most real-world image-to-image (I2I) scenarios, existing evaluations primarily focus on instruction followi
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Example-Based Object Detection
arXiv:2605.04501v1 Announce Type: cross Abstract: In recent years, object detection has achieved significant progress, especially in the field of open-vocabular
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Efficient Geometry-Controlled High-Resolution Satellite Image Synthesis
arXiv:2605.04557v1 Announce Type: cross Abstract: High-resolution satellite images are often scarce and costly, especially for remote areas or infrequent events
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
From Diffusion to Rectified Flow: Rethinking Text-Based Segmentation
arXiv:2605.04590v1 Announce Type: cross Abstract: Text-based image segmentation aims to delineate object boundaries within an image from text prompts, offering
ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1mo ago
Reference-based Category Discovery: Unsupervised Object Detection with Category Awareness
arXiv:2605.04606v1 Announce Type: cross Abstract: Traditional one-shot detection methods have addressed the closed-set problem in object detection, but the high