Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

1,539 lessons

Skills in this topic

Classify images with a pre-trained CNN

Modern CV Models

Run YOLO for real-time object detection

Build a Stable Diffusion inference pipeline

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 3mo ago

LoD-Loc v3: Generalized Aerial Localization in Dense Cities using Instance Silhouette Alignment

arXiv:2603.19609v1 Announce Type: cross Abstract: We present LoD-Loc v3, a novel method for generalized aerial visual localization in dense urban environments.

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 3mo ago

Toward High-Fidelity Visual Reconstruction: From EEG-Based Conditioned Generation to Joint-Modal Guided Rebuilding

arXiv:2603.19667v1 Announce Type: cross Abstract: Human visual reconstruction aims to reconstruct fine-grained visual stimuli based on subject-provided descript...

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 3mo ago

RAM: Recover Any 3D Human Motion in-the-Wild

arXiv:2603.19929v1 Announce Type: cross Abstract: RAM incorporates a motion-aware semantic tracker with adaptive Kalman filtering to achieve robust identity ass...

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 3mo ago

From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering

arXiv:2603.20193v1 Announce Type: cross Abstract: Existing tampering detection benchmarks largely rely on object masks, which severely misalign with the true ed...

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 3mo ago

Adaptive Relative Pose Estimation Framework with Dual Noise Tuning for Safe Approaching Maneuvers

arXiv:2507.16214v3 Announce Type: replace-cross Abstract: Accurate and robust relative pose estimation is crucial for enabling challenging Active Debris Removal...

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 3mo ago

3D-Consistent Multi-View Editing by Correspondence Guidance

arXiv:2511.22228v2 Announce Type: replace-cross Abstract: Recent advancements in diffusion and flow models have greatly improved text-based image editing, yet m...

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 3mo ago

Points-to-3D: Structure-Aware 3D Generation with Point Cloud Priors

arXiv:2603.18782v2 Announce Type: replace-cross Abstract: Recent progress in 3D generation has been driven largely by models conditioned on images or text, whil...

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 3mo ago

CustomTex: High-fidelity Indoor Scene Texturing via Multi-Reference Customization

arXiv:2603.19121v2 Announce Type: replace-cross Abstract: The creation of high-fidelity, customizable 3D indoor scene textures remains a significant challenge.

Is Apple's New MacBook Pro Right For You?

Forbes Innovation 👁️ Computer Vision ⚡ AI Lesson 3mo ago

Apple's new MacBook Pro M5 Pro and M5 Max models are the latest iteration of the macOS laptops. But are they stopgaps until the real revolution in 2027?

ZDNet AI 👁️ Computer Vision ⚡ AI Lesson 3mo ago

Yes, 8GB of RAM really is enough for a MacBook in 2026 - here's why

If you're worried the Neo, or any other modern-day MacBook with 8GB of RAM, doesn't have enough memory, maybe you're not looking in the right place.

Forget Blender Skills: This AI Generates Complete 3D Objects for You

Hackernoon 👁️ Computer Vision ⚡ AI Lesson 3mo ago

GET3D is an AI system that generates complete 3D models—geometry and textures—from simple 2D images. Unlike older methods, it produces ready-to-use assets compa...

ZDNet AI 👁️ Computer Vision ⚡ AI Lesson 3mo ago

What is MoCA 2.5? How this low-cost networking can replace Wi-Fi and fix dead zones

MoCA 2.5 leverages old coaxial cables to enable high-speed internet. I break down the technology and why it's a viable alternative to Wi-Fi.

ZDNet AI 👁️ Computer Vision ⚡ AI Lesson 3mo ago

I wore the Whoop 5.0 for a month - it combines the best of the Oura Ring and Apple Watch

The Whoop 5.0 has several medical-minded tools, like ECG and blood pressure monitoring. Here's how I actually used them.

Blurring a Name Doesn't Anonymise a Face: What GDPR Actually Says

Dev.to · CaraComp 👁️ Computer Vision 3mo ago

Think your facial datasets are anonymized? Think again. For developers building computer vision (CV)...

More Than Meets the Eye: NVIDIA RTX-Accelerated Computers Now Connect Directly to Apple Vision Pro

NVIDIA AI Blog 👁️ Computer Vision ⚡ AI Lesson 3mo ago

NVIDIA and Apple’s collaboration brings native integration of NVIDIA CloudXR 6.0 to visionOS, securely delivering NVIDIA RTX-powered simulators and professional...

Standard RAG Is Blind — Building Multimodal RAG in .NET to Fix It

Dev.to · Argha Sarkar 👁️ Computer Vision 3mo ago

The Scenario: A developer builds a RAG system. A user uploads a 60-page service manual...

Multimodal Biometrics: Why Face + Fingerprint + Voice Defeats Deepfakes

Dev.to · CaraComp 👁️ Computer Vision 3mo ago

How multimodal fusion rewrites the rules of biometric probability. As developers building identity...

Introducing a Simple, High-Performance 3D Visualization Tool in Python for Robotics, SLAM, and Computer Vision Applications

Dev.to · Roman Dubrovin 👁️ Computer Vision 3mo ago

Introduction: The 3D Visualization Gap in Python. In the world of robotics, SLAM, and...

AI Facial Recognition Sent an Innocent Grandmother to Jail

Dev.to · CaraComp 👁️ Computer Vision 3mo ago

The technical failure points of automated identification. For developers working in computer vision...

The Face Recognition Error That's Wrecking Investigations

Dev.to · CaraComp 👁️ Computer Vision 3mo ago

The mathematical gap between open-world search and closed-set verification. The accuracy ceiling for...

Towards Data Science 👁️ Computer Vision ⚡ AI Lesson 3mo ago

The Current Status of The Quantum Software Stack

How do we program quantum computers today?

Law Enforcement Isn't Abandoning Face Tech — It's Regulating It

Dev.to · CaraComp 👁️ Computer Vision 3mo ago

The hidden shift in biometric regulation. For developers building in the computer vision and...

When 99% Accurate Still Means Thousands of Wrong Arrests

Dev.to · CaraComp 👁️ Computer Vision 3mo ago

Biometric Accuracy vs. Investigative Reality. For developers working in computer vision (CV) and...

Tomorrow: March 12 - MCP, Agents and Skills Meetup

Dev.to · Jimmy Guerrero 👁️ Computer Vision 3mo ago

Join us tomorrow on March 12 at 9 AM Pacific for a special edition of the AI, ML and Computer Vision...

Hunting Einstein Rings: Achieving 0.994 mAP in Deep-Space Detection with RT-DETR

Dev.to · jinghao-ai 👁️ Computer Vision 3mo ago

1. Introduction: The Needle in a Haystack. Detecting Strong Gravitational Lensing (e.g., Einstein...

Building a Real-Time Posture Monitoring System in the Browser (MediaPipe + PiP)

Dev.to · Manan Verma 👁️ Computer Vision 3mo ago

Browsers Kill Background Tabs. Here’s How I Kept My Computer Vision Engine Alive. Most...

YOLO vs Cloud API for Object Detection — Which One Should You Actually Use?

Dev.to · AI Engine 👁️ Computer Vision 3mo ago

You need object detection in your app. You have two paths: run YOLO on your own GPU, or call a cloud...

Stop Losing Your Medical Records: Build a Multimodal Health RAG with LlamaIndex & Qdrant 🩺

Dev.to · wellallyTech 👁️ Computer Vision 3mo ago

We’ve all been there: staring at a pile of blood test results, crumpled physical therapy notes, and...

AI-Based Green Light Optimization using Computer Vision

Dev.to · Naitik Verma 👁️ Computer Vision 3mo ago

Urban traffic systems still rely largely on fixed-timer traffic lights. These timers do not adapt to...

March 19 - Women in AI Meetup

Dev.to · Jimmy Guerrero 👁️ Computer Vision 3mo ago

Hear talks from experts on cutting-edge topics in AI, ML, and computer vision at the Women in AI...

How to Build AI iOS Apps: Complete CoreML Guide

Dev.to · Iniyarajan 👁️ Computer Vision 3mo ago

Learn how to build AI iOS apps with CoreML, Vision, and Natural Language frameworks. Complete Swift code examples for image recognition and text analysis.

March 12 - MCP, Skills and Agents AI Meetup

Dev.to · Jimmy Guerrero 👁️ Computer Vision 3mo ago

Join us on March 12 for a special edition of the AI, ML and Computer Vision Meetup where we will...

March 5 - AI, ML and Computer Vision Meetup

Dev.to · Jimmy Guerrero 👁️ Computer Vision 4mo ago

Join us on March 5 for the virtual AI, ML and Computer Vision Meetup. Register for the...

I Built a Real JARVIS in Python with Knowledge Graphs, BERT Emotion Detection, Face Recognition and NASA API

Dev.to · Konstantinos 👁️ Computer Vision 4mo ago

Ever watched Iron Man and thought — could I actually build that? I did, and after months of work,...

OpenCV Blog 👁️ Computer Vision ⚡ AI Lesson 4mo ago

Preview The Embedded Vision Summit 2026 Conference On OpenCV Live

Join the organizers of the Embedded Vision Summit on this preview webinar for an insider look at the premier conference on practical computer vision and edge AI

Gandalf Vision

Dev.to · Andrey 👁️ Computer Vision 4mo ago

Hey! So I spent yesterday diving into that Gandalf Vision library you mentioned—the computer vision...

Building a Custom Augmented Reality Marker Detector with OpenCV

Dev.to · 💻 Arpad Kish 💻 👁️ Computer Vision 4mo ago

Augmented Reality (AR) bridges the gap between the physical and digital worlds. A foundational step...

How to use OpenCV in Python, Make Your Hand Invisible Using OpenCV Magic Effect

Dev.to · Shafqat Awan 👁️ Computer Vision 4mo ago

As we move into 2026, the demand for real-time computer vision manipulation has shifted from simple filters to seamless augmented reality integrations...

Nano Banana 2: Combining Pro capabilities with lightning-fast speed

DeepMind Blog 👁️ Computer Vision ⚡ AI Lesson 4mo ago

Our latest image generation model offers advanced world knowledge, production-ready specs, subject consistency and more, all at Flash speed.

Nano Banana 2: Combining Pro capabilities with lightning-fast speed

Google AI Blog 👁️ Computer Vision ⚡ AI Lesson 4mo ago

Our latest image generation model offers advanced world knowledge, production-ready specs, subject consistency and more, all at Flash speed.

Build with Nano Banana 2, our best image generation and editing model

Google AI Blog 👁️ Computer Vision ⚡ AI Lesson 4mo ago

Nano Banana 2 (Gemini 3.1 Flash Image) delivers Pro-level intelligence and fidelity for all image applications.

From Metrics to Action: Turning Embedding Analysis into Sprint Tickets

Dev.to · Itay Eylath 👁️ Computer Vision 4mo ago

In an agile Computer Vision startup, global accuracy is a vanity metric. It tells you the model is...

Media Authenticity Methods in Practice: Capabilities, Limitations, and Directions

Microsoft Research 👁️ Computer Vision ⚡ AI Lesson 4mo ago

As synthetic media grows, verifying what’s real, and the origin of content, matters more than ever. Our latest report explores media integrity and authenticatio...

Food Image Recognition: How AI Identifies What's on Your Plate

Dev.to · albert nahas 👁️ Computer Vision 4mo ago

Discover how AI-powered food image recognition accurately identifies and estimates your meals. Explore tech insights and boost your app’s accuracy today!

Building FridgeChef: What I Learned Training a Custom Computer Vision Model with Roboflow

Dev.to · Jacob Nastaskin 👁️ Computer Vision 4mo ago

I spend too much time staring at my fridge trying to figure out what to make for dinner. So I built...

Recraft V4: image generation with design taste

Replicate Blog 👁️ Computer Vision ⚡ AI Lesson 4mo ago

Recraft V4 generates art-directed images — and actual editable SVGs — with strong composition, accurate text rendering, and what the Recraft team calls "design...

Generating SEM Images from Segmentation Masks

Dev.to · Shira S 👁️ Computer Vision 4mo ago

Acknowledgements: We would like to thank our mentors, Asaf Nisani and Yoav Lebendiker, for...

Computer Vision for Web Developers: Build an Image Recognition App with TensorFlow.js

Dev.to · Paul Robertson 👁️ Computer Vision 4mo ago

Learn to build a complete image recognition web app using TensorFlow.js with real-time webcam classification and object detection. Includes practical code examp...