Computer Vision

Object detection, segmentation, YOLO, CLIP, and vision-language models

1,539 lessons
Skills in this topic
CV Basics (beginner): Classify images with a pre-trained CNN
Modern CV Models (intermediate): Run YOLO for real-time object detection
Generative CV (advanced): Build a Stable Diffusion inference pipeline
All Reads (394) · Articles (216) · Blog Posts (117) · Tutorials (47) · Research Papers (13) · News (1)
Teaching AI to see the world more like we do
DeepMind Blog 👁️ Computer Vision ⚡ AI Lesson 7mo ago
Our new paper analyzes the important ways AI systems organize the visual world differently from humans.
From Text to Live Video: How We Built a Serverless Multimodal Logistics AI on Google Cloud Run
Dev.to · Michael G. Inso 👁️ Computer Vision 7mo ago
The logistics industry runs on information. From tracking numbers on a crumpled label to complex...
Avoid the N+1 problem in Laravel validations
Dev.to · Andres Daza 👁️ Computer Vision 7mo ago
The hidden problem behind bulk validations. When we validate arrays of data in...
Real-Time Face Recognition Attendance — QR Access & Google Sheets Integration
Dev.to · Akshaya Reddy Annareddy 👁️ Computer Vision 7mo ago
🚀 This project automates classroom attendance using Face Recognition (MTCNN + FaceNet) integrated...
⚡ I'd like to recommend the 'Audio Segmentation Toolkit' (AS
Dev.to · Dr. Carlos Ruiz Viquez 👁️ Computer Vision 8mo ago
⚡ I'd like to recommend the 'Audio Segmentation Toolkit' (AST), an open-source library that shines in...
Daft vs Ray Data: A Comprehensive Comparison for Multimodal Data Processing
Dev.to · YK Sugi 👁️ Computer Vision 8mo ago
Multimodal AI workloads break traditional data engines. They need to embed documents, classify...
The Hidden Pitfall of Multimodal Fusion: Avoid Over-weighting a Single Modality
Dev.to · Dr. Carlos Ruiz Viquez 👁️ Computer Vision 8mo ago
When working with...
Image editing in Gemini just got a major upgrade
DeepMind Blog 👁️ Computer Vision ⚡ AI Lesson 8mo ago
Transform images in amazing new ways with updated native image editing in the Gemini app.
The Dark Side of Computer Vision: How Adversarial Examples Can Fool Even the Most Advanced...
Dev.to · Dr. Carlos Ruiz Viquez 👁️ Computer Vision 8mo ago
Beats as Objects: A Computer Vision Hack for Music Analysis by Arvind Sundararajan
Dev.to · Arvind SundaraRajan 👁️ Computer Vision 8mo ago
Struggling to accurately...
Hello Guys Anyone working live video feed analysis using Computer vision i need help in terms of technical part looking to talk further and discuss.
Dev.to · anujpatel2899 👁️ Computer Vision 8mo ago
A post by anujpatel2899
What is a Model Serving Framework? A Simple Guide
Dev.to · Sohan Lal 👁️ Computer Vision 8mo ago
Have you ever wondered how artificial intelligence (AI) apps work? When you use a face recognition...
How to Solve it With Code course now available
Fast.ai Blog 👁️ Computer Vision ⚡ AI Lesson 8mo ago
tl/dr: This is a copy of a one-off email I sent to all fast.ai forum users, with a long-overdue update. I had planned to send this email a year ago to let you k
Face Detection in Python Using OpenCV HAAR CASCADE Method
Dev.to · Debajyati Dey 👁️ Computer Vision 8mo ago
Let's learn about face detection in Python using the OpenCV library. Introduction OpenCV...
Quantum Weaving: Supercharging Multimodal AI Without the Exponential Overhead
Dev.to · Arvind SundaraRajan 👁️ Computer Vision 8mo ago
Imagine...
Quantum Weaving: A New Era of Multimodal AI by Arvind Sundararajan
Dev.to · Arvind SundaraRajan 👁️ Computer Vision 8mo ago
Imagine trying to understand a movie scene...
😮‍💨 I created my own face recognition system
Dev.to · techtech 👁️ Computer Vision 9mo ago
Building a Privacy-First Face Recognition System That Actually Works 🔍 ...
Persian OCR with YOLO + CRNN: Building a Custom Text Recognition Pipeline
Dev.to · mahmoudabbasi 👁️ Computer Vision 9mo ago
Running OCR for Persian text is tricky. Unlike English, Persian (and Arabic) scripts are...
Which image editing model should I use?
Replicate Blog 👁️ Computer Vision ⚡ AI Lesson 9mo ago
Here is the ultimate comparison post on all the latest image editing models.
OpenAI News 👁️ Computer Vision ⚡ AI Lesson 9mo ago
Outbound coordinated vulnerability disclosure policy
GitHub Engineering 👁️ Computer Vision ⚡ AI Lesson 9mo ago
Post-quantum security for SSH access on GitHub
GitHub is introducing post-quantum secure key exchange methods for SSH access to better protect Git data in transit.
Illuminating the Dark: Next-Gen Object Detection from Raw Sensor Data by Arvind Sundararajan
Dev.to · Arvind Sundara Rajan 👁️ Computer Vision 9mo ago
Imagine a...
Building an iOS ID Scanner with Face, Document, OCR and MRZ Detection
Dev.to · Xiao Ling 👁️ Computer Vision 9mo ago
Apple's Vision framework provides APIs for performing computer vision tasks such as face detection,...
[Boost]
Dev.to · Ravinthiran Partheepan 👁️ Computer Vision 10mo ago
Furniture Image Classification Using TypeScript + BilberryDB...
Building an Edge AI Sound Classifier (Part 2): Feature Extraction & Training
Dev.to · Ertugrul 👁️ Computer Vision 10mo ago
In Part 1, we prepared a balanced dataset of short audio snippets. In Part 2, we’ll turn those...
Build a Fruit Detection AI with Azure Custom Vision: A Step-by-Step Guide
Dev.to · Olalekan Oladiran 👁️ Computer Vision 11mo ago
Introduction The Azure AI Custom Vision service enables you to create computer vision...
Extract Text Like Magic: Build an OCR App with Azure AI Vision in Python
Dev.to · Olalekan Oladiran 👁️ Computer Vision 11mo ago
Introduction Optical character recognition (OCR) is a subset of computer vision that deals...
The Possibility of Training a Multimodal AI for Cryptocurrency Auto-Trading Decisions
Dev.to · Muhammed Shafin P 👁️ Computer Vision 11mo ago
By Muhammed Shafin P (hejhdiss) In the evolving landscape of financial technology, cryptocurrency...
How I Built an AI-Powered Face Recognition App from Scratch
Dev.to · F.SAHFEERUL WASIHF 👁️ Computer Vision 11mo ago
🚀 Introduction: Inspired by how streaming platforms measure actor screen time, I built a...
Generate consistent characters
Replicate Blog 👁️ Computer Vision ⚡ AI Lesson 11mo ago
We compare the best image models for generating consistent characters from a single reference image.
A Deep Dive into Clustering for Customer Segmentation
Dev.to · Gruhesh Sri Sai Karthik Kurra 👁️ Computer Vision 11mo ago
Explore K-Means, Hierarchical, DBSCAN, and GMM clustering to segment customers in this hands-on Python guide.
Extract Invoice Data Automatically Using LangChain
Dev.to · Mohamed Radwan 👁️ Computer Vision 11mo ago
In this article, I’m sharing an app I built to automate invoice processing using image recognition...
Stop Gluing Data Infrastructure Tools: Build Multimodal AI Workloads and Application with One Declarative Python SDK
Dev.to · Pierre Brunelle 👁️ Computer Vision 11mo ago
Introducing Pixeltable, open-source data infrastructure that unifies your data store, transformation,...
Integrating OpenCV (C++) with Visual Studio 2019 - the proper way
Dev.to · aposb 👁️ Computer Vision 12mo ago
In this post, I will set up OpenCV v4.10.0 on Windows 10 and create a demo C++ project to demonstrate...
BAIR Blog 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 12mo ago
Whole-Body Conditioned Egocentric Video Prediction
OpenAI News 👁️ Computer Vision ⚡ AI Lesson 1y ago
Introducing our latest image generation model in the API
Our latest image generation model is now available in the API via ‘gpt-image-1’—enabling developers and businesses to build professional-grade, customizable vis
OpenAI News 👁️ Computer Vision ⚡ AI Lesson 1y ago
Thinking with images
OpenAI o3 and o4-mini represent a significant breakthrough in visual perception by reasoning with images in their chain of thought.
Flask-Powered Object Detection for Real-Time Analysis
Dev.to · Doyin Elugbadebo 👁️ Computer Vision 1y ago
Computer vision is revolutionizing industries, from autonomous driving to real-time surveillance and...
Replicate Blog 👁️ Computer Vision ⚡ AI Lesson 2y ago
Replicate Intelligence #2
Faster image generation, AI-powered world simulator, insights on AI dataset complexity
Weaviate Blog 👁️ Computer Vision ⚡ AI Lesson 2y ago
Using Weaviate to Find Waldo
Dive into using Weaviate for image recognition to find the "needle in a haystack"!
Building an Image Recognition Website with SvelteKit and TensorFlow.js
Dev.to · Estelle-K 👁️ Computer Vision 2y ago
Introduction In this article, I'll show you how to build a simple website that allows...
CartoonSpace with Complete Python Flask-OpenCV Dev Environment
Dev.to · Atharva Shirdhankar 👁️ Computer Vision 3y ago
What I built: GitHub Codespaces and GitHub Actions have become some of my favourite tools...
Hugging Face Blog 👁️ Computer Vision ⚡ AI Lesson 3y ago
A Dive into Text-to-Video Models
Hugging Face Blog 👁️ Computer Vision ⚡ AI Lesson 3y ago
Universal Image Segmentation with Mask2Former and OneFormer
Weaviate Blog 👁️ Computer Vision ⚡ AI Lesson 3y ago
How to build an Image Search Application with Weaviate
Learn how to build an image search application using the Img2vec-neural module in Weaviate.
Hugging Face Blog 👁️ Computer Vision ⚡ AI Lesson 3y ago
Image Classification with AutoTrain
Replicate Blog 👁️ Computer Vision ⚡ AI Lesson 3y ago
Automating image collection
Using CLIP and LAION5B to collect thousands of captioned images.
Creating a Colour Picker App using Flask & Azure Computer Vision Service
Dev.to · Pikachu⚡ 👁️ Computer Vision 4y ago
I recently did a talk on how you can analyse images with the Azure Computer Vision Service. While...