Computer Vision
Object detection, segmentation, YOLO, CLIP, and vision-language models
DeepMind Blog
👁️ Computer Vision
⚡ AI Lesson
7mo ago
Teaching AI to see the world more like we do
Our new paper analyzes the important ways AI systems organize the visual world differently from humans.

Dev.to · Michael G. Inso
👁️ Computer Vision
7mo ago
From Text to Live Video: How We Built a Serverless Multimodal Logistics AI on Google Cloud Run
The logistics industry runs on information. From tracking numbers on a crumpled label to complex...

Dev.to · Andres Daza
👁️ Computer Vision
7mo ago
Avoid the N+1 problem in Laravel validations
The hidden problem behind bulk validations. When we validate arrays of data in...

Dev.to · Akshaya Reddy Annareddy
👁️ Computer Vision
7mo ago
Real-Time Face Recognition Attendance — QR Access & Google Sheets Integration
🚀 This project automates classroom attendance using Face Recognition (MTCNN + FaceNet) integrated...

Dev.to · Dr. Carlos Ruiz Viquez
👁️ Computer Vision
8mo ago
⚡ I'd like to recommend the 'Audio Segmentation Toolkit' (AST)...
⚡ I'd like to recommend the 'Audio Segmentation Toolkit' (AST), an open-source library that shines in...

Dev.to · YK Sugi
👁️ Computer Vision
8mo ago
Daft vs Ray Data: A Comprehensive Comparison for Multimodal Data Processing
Multimodal AI workloads break traditional data engines. They need to embed documents, classify...

Dev.to · Dr. Carlos Ruiz Viquez
👁️ Computer Vision
8mo ago
The Hidden Pitfall of Multimodal Fusion: Avoid Over-weighting a Single Modality
When working with...
DeepMind Blog
👁️ Computer Vision
⚡ AI Lesson
8mo ago
Image editing in Gemini just got a major upgrade
Transform images in amazing new ways with updated native image editing in the Gemini app.

Dev.to · Dr. Carlos Ruiz Viquez
👁️ Computer Vision
8mo ago
The Dark Side of Computer Vision: How Adversarial Examples Can Fool Even the Most Advanced...

Dev.to · Arvind SundaraRajan
👁️ Computer Vision
8mo ago
Beats as Objects: A Computer Vision Hack for Music Analysis by Arvind Sundararajan
Struggling to accurately...

Dev.to · anujpatel2899
👁️ Computer Vision
8mo ago
Hello everyone. Is anyone working on live video feed analysis using computer vision? I need help with the technical side and would like to talk further and discuss.
A post by anujpatel2899

Dev.to · Sohan Lal
👁️ Computer Vision
8mo ago
What is a Model Serving Framework? A Simple Guide
Have you ever wondered how artificial intelligence (AI) apps work? When you use a face recognition...
Fast.ai Blog
👁️ Computer Vision
⚡ AI Lesson
8mo ago
How to Solve it With Code course now available
tl/dr: This is a copy of a one-off email I sent to all fast.ai forum users, with a long-overdue update. I had planned to send this email a year ago to let you k

Dev.to · Debajyati Dey
👁️ Computer Vision
8mo ago
Face Detection in Python Using OpenCV HAAR CASCADE Method
Let's learn about face detection in Python using the OpenCV library. OpenCV...

Dev.to · Arvind SundaraRajan
👁️ Computer Vision
8mo ago
Quantum Weaving: Supercharging Multimodal AI Without the Exponential Overhead
Imagine...

Dev.to · Arvind SundaraRajan
👁️ Computer Vision
8mo ago
Quantum Weaving: A New Era of Multimodal AI by Arvind Sundararajan
Imagine trying to understand a movie scene...

Dev.to · techtech
👁️ Computer Vision
9mo ago
😮‍💨 I created my own face recognition system
Building a Privacy-First Face Recognition System That Actually Works 🔍 ...

Dev.to · mahmoudabbasi
👁️ Computer Vision
9mo ago
Persian OCR with YOLO + CRNN: Building a Custom Text Recognition Pipeline
Running OCR for Persian text is tricky. Unlike English, Persian (and Arabic) scripts are...

Replicate Blog
👁️ Computer Vision
⚡ AI Lesson
9mo ago
Which image editing model should I use?
Here is the ultimate comparison post on all the latest image editing models.
OpenAI News
👁️ Computer Vision
⚡ AI Lesson
9mo ago
Outbound coordinated vulnerability disclosure policy
GitHub Engineering
👁️ Computer Vision
⚡ AI Lesson
9mo ago
Post-quantum security for SSH access on GitHub
GitHub is introducing post-quantum secure key exchange methods for SSH access to better protect Git data in transit.

Dev.to · Arvind Sundara Rajan
👁️ Computer Vision
9mo ago
Illuminating the Dark: Next-Gen Object Detection from Raw Sensor Data by Arvind Sundararajan
Imagine a...

Dev.to · Xiao Ling
👁️ Computer Vision
9mo ago
Building an iOS ID Scanner with Face, Document, OCR and MRZ Detection
Apple's Vision framework provides APIs for performing computer vision tasks such as face detection,...
Dev.to · Ravinthiran Partheepan
👁️ Computer Vision
10mo ago
[Boost]
Furniture Image Classification Using TypeScript + BilberryDB...

Dev.to · Ertugrul
👁️ Computer Vision
10mo ago
Building an Edge AI Sound Classifier (Part 2): Feature Extraction & Training
In Part 1, we prepared a balanced dataset of short audio snippets. In Part 2, we’ll turn those...

Dev.to · Olalekan Oladiran
👁️ Computer Vision
11mo ago
Build a Fruit Detection AI with Azure Custom Vision: A Step-by-Step Guide
The Azure AI Custom Vision service enables you to create computer vision...

Dev.to · Olalekan Oladiran
👁️ Computer Vision
11mo ago
Extract Text Like Magic: Build an OCR App with Azure AI Vision in Python
Optical character recognition (OCR) is a subset of computer vision that deals...

Dev.to · Muhammed Shafin P
👁️ Computer Vision
11mo ago
The Possibility of Training a Multimodal AI for Cryptocurrency Auto-Trading Decisions
By Muhammed Shafin P (hejhdiss). In the evolving landscape of financial technology, cryptocurrency...

Dev.to · F.SAHFEERUL WASIHF
👁️ Computer Vision
11mo ago
How I Built an AI-Powered Face Recognition App from Scratch
🚀 Introduction: Inspired by how streaming platforms measure actor screen time, I built a...

Replicate Blog
👁️ Computer Vision
⚡ AI Lesson
11mo ago
Generate consistent characters
We compare the best image models for generating consistent characters from a single reference image.

Dev.to · Gruhesh Sri Sai Karthik Kurra
👁️ Computer Vision
11mo ago
A Deep Dive into Clustering for Customer Segmentation
Explore K-Means, Hierarchical, DBSCAN, and GMM clustering to segment customers in this hands-on Python guide.

Dev.to · Mohamed Radwan
👁️ Computer Vision
11mo ago
Extract Invoice Data Automatically Using LangChain
In this article, I’m sharing an app I built to automate invoice processing using image recognition...

Dev.to · Pierre Brunelle
👁️ Computer Vision
11mo ago
Stop Gluing Data Infrastructure Tools: Build Multimodal AI Workloads and Application with One Declarative Python SDK
Introducing Pixeltable, open-source data infrastructure that unifies your data store, transformation,...

Dev.to · aposb
👁️ Computer Vision
12mo ago
Integrating OpenCV (C++) with Visual Studio 2019 - the proper way
In this post, I will set up OpenCV v4.10.0 on Windows 10 and create a demo C++ project to demonstrate...
BAIR Blog
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
12mo ago
Whole-Body Conditioned Egocentric Video Prediction
OpenAI News
👁️ Computer Vision
⚡ AI Lesson
1y ago
Introducing our latest image generation model in the API
Our latest image generation model is now available in the API via ‘gpt-image-1’—enabling developers and businesses to build professional-grade, customizable vis
OpenAI News
👁️ Computer Vision
⚡ AI Lesson
1y ago
Thinking with images
OpenAI o3 and o4-mini represent a significant breakthrough in visual perception by reasoning with images in their chain of thought.

Dev.to · Doyin Elugbadebo
👁️ Computer Vision
1y ago
Flask-Powered Object Detection for Real-Time Analysis
Computer vision is revolutionizing industries, from autonomous driving to real-time surveillance and...
Replicate Blog
👁️ Computer Vision
⚡ AI Lesson
2y ago
Replicate Intelligence #2
Faster image generation, AI-powered world simulator, insights on AI dataset complexity
Weaviate Blog
👁️ Computer Vision
⚡ AI Lesson
2y ago
Using Weaviate to Find Waldo
Dive into using Weaviate for image recognition to find the "needle in a haystack"!

Dev.to · Estelle-K
👁️ Computer Vision
2y ago
Building an Image Recognition Website with SvelteKit and TensorFlow.js
In this article, I'll show you how to build a simple website that allows...

Dev.to · Atharva Shirdhankar
👁️ Computer Vision
3y ago
CartoonSpace with Complete Python Flask-OpenCV Dev Environment
GitHub Codespaces and GitHub Actions have become some of my favourite tools...
Hugging Face Blog
👁️ Computer Vision
⚡ AI Lesson
3y ago
A Dive into Text-to-Video Models
Hugging Face Blog
👁️ Computer Vision
⚡ AI Lesson
3y ago
Universal Image Segmentation with Mask2Former and OneFormer
Weaviate Blog
👁️ Computer Vision
⚡ AI Lesson
3y ago
How to build an Image Search Application with Weaviate
Learn how to build an image search application using the Img2vec-neural module in Weaviate.
Hugging Face Blog
👁️ Computer Vision
⚡ AI Lesson
3y ago
Image Classification with AutoTrain
Replicate Blog
👁️ Computer Vision
⚡ AI Lesson
3y ago
Automating image collection
Using CLIP and LAION-5B to collect thousands of captioned images.

Dev.to · Pikachu⚡
👁️ Computer Vision
4y ago
Creating a Colour Picker App using Flask & Azure Computer Vision Service
I recently did a talk on how you can analyse images with the Azure Computer Vision Service. While...
DeepCamp AI