Exploring The COCO Dataset

Roboflow · Intermediate · Computer Vision · 5y ago

Skills: CV Basics 90% · Modern CV Models 80% · Generative CV 60%

Key Takeaways

The video explores the Microsoft Common Objects in Context (COCO) dataset for object detection, using tools such as EfficientDet, DetectoRS, and Roboflow, and discussing concepts like the average precision metric and class balance.
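The point about the average precision metric being a mean across class labels can be illustrated with a small sketch. The per-class AP values below are invented for illustration and are not COCO results, and the helper name `mean_average_precision` is hypothetical. The sketch shows only the final averaging step, not how each per-class AP is computed.

```python
from statistics import mean

# Hypothetical per-class AP values, invented purely for illustration.
per_class_ap = {"person": 0.80, "book": 0.40, "cup": 0.50, "toaster": 0.10}


def mean_average_precision(ap_by_class):
    """Unweighted mean of per-class AP: every class counts equally,
    no matter how many examples of it exist in the dataset."""
    return mean(ap_by_class.values())


map_all = mean_average_precision(per_class_ap)
map_without_rare = mean_average_precision(
    {name: ap for name, ap in per_class_ap.items() if name != "toaster"}
)

print(f"mAP over all classes:     {map_all:.3f}")
print(f"mAP ignoring the toaster: {map_without_rare:.3f}")
```

Because the mean is unweighted, one poorly detected rare class lowers the score as much as one poorly detected common class would.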

Full Transcript

Greetings, this is Jacob from Roboflow, here today to talk about the Common Objects in Context dataset, or the COCO dataset. Before we dive into deeper dataset introspection, we're going to see an example of exactly what the COCO dataset allows us to do, and what it allows us to train models to see and understand.

Here you can see that I have an object detector running, based on the COCO dataset, detecting a person. The model is detecting the bounding box in which it sees me as a person. Let's see what else it can detect. I'm going to bring in a new object. There we go: it's detecting a book, and it's doing a decent job of bounding a box around where it sees the book. Let's try a different object and see if it can identify my cell phone. Yeah, it's doing a pretty good job there. Now let's see if we can stump it. Here I have another object. This one it's missing. There we go, it thought it was a sandwich there for a second. Sports ball. Oh wow, frisbee. OK, so you can see it's not doing so well at detecting a yellow cup.

Basically, the motivation of the COCO dataset is to see how well we can get these models to understand common objects in context, like books, cell phones, people, and cups. How well models do on the COCO dataset is what's defining the state of the art in computer vision.

Now, jumping in and taking a look at the state of the art, here we have the leaderboard for the COCO dataset. This is really what all the buzz in computer vision is these days: a lot of object detection, and a lot of how well these models are doing on the COCO dataset. On the leaderboard we can see the box AP, which is a measure of how well these models have been doing over time, and you can see that it is just a rising tide: the
models are getting better and better. They're getting bigger and more efficient, the convolutional networks that form the backbone of these detectors are improving, the neck of the object detector, where these features are being pulled, is getting researched and improved, and the way training techniques are deployed is also really pushing the state of the art in object detection on the COCO dataset.

Most recently, we can see that the number one model on the COCO dataset is EfficientDet-D7x. This is a recent release of EfficientDet: you can see that it used to be down here, but now the D7x is the highest one, beating out DetectoRS, which is really exciting news. They're measuring all of these based on box AP. Below we have a link with more details on what mAP is, but for now you can just think of it as a measure of performance. The goal of this video is to understand what goes into the dataset that forms this average precision metric: what exactly we are measuring when we look at this metric, what the dataset is, and how we can maybe improve performance on it just by knowing more about it.

Moving along, here you can see the maximum box AP, the very best that any model can do on the COCO dataset. In reality, you're more often going to see graphs like this one, which shows how well a model does on the COCO dataset relative to how long it takes to do inference. The faster the inference, probably the smaller the model, the faster it is to train, and the more tractable it is, so smaller models are definitely better and we want to prefer them. You'll see the COCO AP val on the y-axis: that is how well the model is doing on the COCO validation set. This is done for a lot of preliminary research
before people actually test on the test set, which is held back by the Microsoft COCO dataset providers. Here we have a couple of models compared: the EfficientDet lineage, which is the smaller EfficientDet models, compared to YOLOv5, which is a state-of-the-art object detector for fast and performant inference.

Another important thing to know about the average precision metric, even though there are more details in the mAP video below, is that it's averaged across all class labels in the COCO dataset. As we dive deeper into what the dataset is, it's going to be important to remember that this object detector is measured as a mean across all class labels. No matter how well populated the class, this metric measures across all of them. So if you really want to beat this metric, you're going to have to be able to recognize all class labels, including yellow cups, or rather just cups generally.

Now, taking a look at what all is in the COCO dataset. Here I have one of the original Microsoft COCO papers, and here they segment out some of the goals of what they're trying to annotate in the dataset. Let's say you have an image. There are a variety of ways you can extract data out of this image and train models to do so. The first way is classification, which is just saying what is in the image. In this upper-left image you see that there are people, there are sheep, and there's a dog, so you could keep it at that very base classification level. You could also go further and try to localize objects with boxes. Computers are good at making floating-point estimates, and therefore the box is a convenient way to do object localization. You localize the object and then attach a class label to it, as we saw with the detector as it was
recognizing that I was a person and that this was a book. You can go further than plain box detection and do semantic segmentation, which is defining the outline of objects. They also went even further: rather than just doing semantic segmentation around objects, they segmented those out into distinct objects.

Now, diving a little deeper into what all is in the dataset, this is the COCO dataset explorer. You can see that there are over a hundred thousand images, which is a very large dataset. It takes a long time to train on and it is a very powerful dataset, with over 800,000 instances of objects. You can reach the COCO explorer at cocodataset.org/#explore and get a feel for what the objects in the dataset are. In this map we can see that there's a variety of class labels, broken out into natural category groups. We have food, we have electronics, and these are everyday objects.

With the explorer we can click on an object and see examples of it. A fun one might be elephants. Let's see what kind of elephants are in the dataset. OK, there are 2,000 results of images that have elephants, and there we go: you can see the masks over the elephants, so you can see where the semantic segmentation of the elephants lies.

Another cool thing about this is that you can see instances where objects appear in tandem. Let's say I want to see all examples of elephants and laptops. It's actually zero results, so there are no images where elephants appear in the context of laptops. That's not too surprising. What about TVs? It's taking a little while to run, which means there might be some examples. OK, there are five results of elephants and TVs, and it looks like
maybe the elephant's on screen there, or the elephant's a figurine, and here we have what looks like an elephant in an airport bay. Anyway, this is a good example of what this dataset is: all kinds of random objects in context. They've tried to choose maybe the 80 most prevalent objects to bin into classes, and then to annotate from there.

Going a little deeper, we're going to take a look at the Microsoft COCO dataset within Roboflow, using the Roboflow Dataset Health Check to see how deeply we can analyze a dataset from that point of view. I've loaded the Microsoft COCO dataset into Roboflow, which can indeed handle datasets of this size, and it's a very powerful tool for looking at your data.

On the dataset front page there are a few things we can do as we create dataset versions. We can add preprocessing steps, so we can morph the dataset and do things like grayscaling, tiling, and resizing before it goes into a model. We can modify the classes, so if I decided there was a class I didn't want in the COCO dataset, I could remove it or rename it. And finally there's the ability to add augmentations. Here you can vastly increase the size of your dataset by doing things like rotation, adding noise, adding blur, or using mosaic, which is a very interesting example where you put objects in different corners of the image. This helps make your dataset a lot larger. With the COCO dataset this is a little less important because you already have a lot of examples, but in a sparser dataset, augmentation is going to be very important for bringing your detector up to the performance you need without having to gather and label more data.

A very powerful tool once your data is loaded into Roboflow is the Dataset Health Check, where we can really determine what is going on inside the dataset. So here we have the
COCO dataset, the validation set, in the Dataset Health Check. First of all, we can see there are 5,000 images in the validation set for the COCO 2017 object detection dataset, and 36,000 annotations. We can also see the class balance. Of those 36,000 annotations, 10,000 are people, which means that the COCO dataset is really predominantly a people dataset. Dropping off from there, we can see the objects dropping off in prevalence.

Here we have underrepresented classes labeled in red, which indicates that this could be a dangerous area, because it's very hard for an object detector to learn a class it hasn't seen many examples of. It's going to be very hard for our object detector to learn toaster, for example, because there are only nine examples in the validation set. If the training set is 20 times that size, there are probably only going to be about 180 toasters to learn from, in the midst of, say, 200,000 people. So according to the loss function, it's definitely going to be optimizing toward guessing people. That's going to be the default assumption, and it's going to be very hard to generalize all the way down to the underrepresented toaster oven, or stop sign, or bear, or snowboard. These are going to be hard class labels. As we were saying before, the mAP metric averages across all classes, so it's going to be very difficult for the model to get good scores across the board on all these different classes, especially the very underrepresented ones. That's a modeling challenge: getting through a sparsely labeled dataset like this.

Another thing the health check allows you to do is look at the dimension insights of the dataset. Here we can see the different sizes of the images in the dataset. The big takeaway here is that the COCO
dataset is not a uniform size. The images are all over the place, and this is important to remember as you're resizing images and keeping track of your annotations. These are all things that the Roboflow platform does for you.

The last feature on the Roboflow Dataset Health Check is the annotation heat map. This lets you start to look at the localization of your annotations. Notably, at a dataset-wide level, it makes sense that most objects occur in the middle of the image, and you can see that fewer occur at the corners and the edges. You can then filter by class label. One class label of interest is umbrella. We can see that most umbrellas occur in the top half of the COCO dataset images, and that might be something you want. You might say that umbrellas normally appear in the upper half of the image, so this is a safe assumption and our model might as well start to learn this localization. But you also might think, maybe that's something I didn't want to happen, and I don't want to miss umbrellas that are, say, lying on the ground. Then you would want to go back and collect more images to help even out the distribution of where your annotations are localized.

That's all for the Dataset Health Check, and that was a pretty comprehensive tour of the COCO dataset. The last thing I want to talk about is using the COCO dataset for pre-training and starting from pre-trained checkpoints based on the COCO dataset. One of the most powerful things about the COCO dataset is that it allows researchers to train very large models like EfficientDet-D7x. We can then take those pre-trained checkpoints from the COCO dataset, crystallized in model weights, and start on a new task. This is called transfer learning, and it's a very powerful way to
utilize the COCO dataset. A lot of the object detection models you're going to look at will ship you this pre-trained checkpoint, and it's important to know what it is. Basically, a model has been trained all the way through on the COCO dataset, the weights have been saved, and now you're using it to go on to your next task. The model has already learned how to identify different features and has learned a general sense of what an object is. It's learned a vast array of objects, so the object you're detecting might be something very similar. For example, a COCO-trained model can identify people, so it's going to be able to identify me. If I'm training a detector to tell whether I have a mask on or not, it might be very good to start with the COCO pre-trained checkpoint, because it's already used to identifying people and has a sense of what a face is, and then we can build our model from there.

Lastly, I'm going to show an example of using the COCO dataset as a pre-trained checkpoint. If we go back over to mass.ai, you can see we were at backslash coco before, and now we're going to go over to backslash mask. It will take a second for the model to load, but you can see that even with just a few epochs on a very small dataset of 100 images, I've already started to get the model to understand what it looks like to be wearing or not wearing a mask. Here you can see it's identifying me and saying "no mask" in red, which is a bad sign given these times.

This is an example of using the COCO dataset to launch into detection of any object in the world. This is really the way to leverage the COCO dataset to move to any object past the 80 classes they have given you: take the pre-trained checkpoints, use them as a starting point, and get a custom dataset. Then you'll be able to move much faster than if you had just started
from scratch. Thanks for listening today. That was a deep dive on the COCO dataset, and happy detecting!
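The Explorer query from the transcript (images that contain both elephants and TVs, or both elephants and laptops) can be sketched against a COCO-style annotation structure. This is a minimal sketch, not the Explorer's implementation: the tiny dataset, its image IDs, and its category IDs are made up, and `images_with_all` is a hypothetical helper. Only the field names `categories`, `annotations`, `image_id`, and `category_id` follow the COCO annotation format.

```python
from collections import defaultdict

# Tiny made-up dataset in COCO annotation style. IDs are illustrative,
# not the real COCO category or image IDs.
coco = {
    "categories": [
        {"id": 1, "name": "elephant"},
        {"id": 2, "name": "tv"},
        {"id": 3, "name": "laptop"},
    ],
    "annotations": [
        {"image_id": 10, "category_id": 1},
        {"image_id": 10, "category_id": 2},
        {"image_id": 11, "category_id": 1},
        {"image_id": 12, "category_id": 3},
        {"image_id": 12, "category_id": 2},
    ],
}


def images_with_all(dataset, class_names):
    """Return the set of image IDs containing every class in class_names.

    Assumes class_names is non-empty and every name exists in the dataset.
    """
    name_to_id = {c["name"]: c["id"] for c in dataset["categories"]}
    images_by_category = defaultdict(set)
    for ann in dataset["annotations"]:
        images_by_category[ann["category_id"]].add(ann["image_id"])
    per_class_sets = [images_by_category[name_to_id[n]] for n in class_names]
    return set.intersection(*per_class_sets)


print(images_with_all(coco, ["elephant", "tv"]))
print(images_with_all(coco, ["elephant", "laptop"]))
```

In this toy data one image holds both an elephant and a TV, and no image holds both an elephant and a laptop, mirroring the shape of the results described in the video.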

Original Description

In this video, we take a deep dive into the Microsoft Common Objects in Context Dataset (COCO). We show a COCO object detector live, COCO benchmark results, COCO example images, COCO class distribution, and more!

Documentation on mAP: https://blog.roboflow.ai/what-is-mean-average-precision-object-detection/
COCO Leaderboard: https://paperswithcode.com/sota/object-detection-on-coco
COCO Explorer: https://cocodataset.org/#explore
Roboflow Dataset Health Check: https://blog.roboflow.ai/resize-images-with-dimension-insights/

Subscribe: https://bit.ly/rf-yt-sub
Follow us on Twitter: https://twitter.com/roboflowAI



The COCO dataset is a benchmark for object detection and understanding common objects in context, and can be used as a starting point for custom object detection tasks. The video explores the dataset, its class distribution, and how to use pre-trained checkpoints for object detection.

Key Takeaways

Explore the COCO dataset and its class distribution
Use Roboflow's dataset health check to analyze the dataset
Fine-tune a pre-trained model for custom object detection tasks
Use the COCO dataset as a starting point for object detection tasks
Evaluate the performance of the model using the average precision metric
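The annotation heat map from the dataset health check can be approximated by binning bounding-box centers into a coarse grid. This is a rough sketch under stated assumptions, not Roboflow's implementation: it counts box centers only, the image sizes and boxes are invented, and `annotation_heatmap` is a hypothetical helper. The `bbox` layout `[x_min, y_min, width, height]` in pixels follows the COCO annotation format, and normalizing by each image's own size handles the fact that COCO images are not a uniform size.

```python
def annotation_heatmap(images, annotations, grid=2):
    """Count box centers per grid cell, using coordinates normalized
    by each image's own width and height."""
    sizes = {img["id"]: (img["width"], img["height"]) for img in images}
    heat = [[0] * grid for _ in range(grid)]
    for ann in annotations:
        img_w, img_h = sizes[ann["image_id"]]
        x, y, box_w, box_h = ann["bbox"]  # COCO: [x_min, y_min, width, height]
        cx = (x + box_w / 2) / img_w      # normalized center, 0..1
        cy = (y + box_h / 2) / img_h
        col = min(int(cx * grid), grid - 1)  # clamp centers on the far edge
        row = min(int(cy * grid), grid - 1)
        heat[row][col] += 1
    return heat


# Made-up images of different sizes and made-up "umbrella" boxes.
images = [
    {"id": 1, "width": 640, "height": 480},
    {"id": 2, "width": 500, "height": 375},
]
umbrella_boxes = [
    {"image_id": 1, "bbox": [100, 20, 200, 100]},
    {"image_id": 2, "bbox": [300, 10, 100, 80]},
    {"image_id": 1, "bbox": [400, 300, 100, 120]},
]

heat = annotation_heatmap(images, umbrella_boxes, grid=2)
top_half = sum(heat[0])
bottom_half = sum(heat[1])
print(heat, top_half, bottom_half)
```

Comparing the top-row and bottom-row totals for a single class is one simple way to check for the kind of positional skew the video describes for umbrellas.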

💡 The COCO dataset has a class balance issue: the Roboflow Dataset Health Check flags its underrepresented classes in red. Starting from pre-trained checkpoints can be beneficial for custom object detection tasks.
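A class balance check like the one described can be sketched by counting annotations per class. This is a minimal sketch with made-up counts: the helpers `class_counts` and `underrepresented` are hypothetical, and the 10% threshold is an arbitrary assumption, not the rule Roboflow uses. The last lines repeat the transcript's rough toaster estimate (nine validation examples, a training set assumed to be 20 times larger).

```python
from collections import Counter


def class_counts(dataset):
    """Count annotations per class name in a COCO-style dict."""
    id_to_name = {c["id"]: c["name"] for c in dataset["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in dataset["annotations"])


def underrepresented(counts, threshold_fraction=0.1):
    """Classes with fewer annotations than threshold_fraction times the
    count of the most common class. The threshold is an assumption."""
    top = max(counts.values())
    return sorted(name for name, n in counts.items()
                  if n < threshold_fraction * top)


# Made-up annotation counts, for illustration only.
coco = {
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 2, "name": "book"},
        {"id": 3, "name": "toaster"},
    ],
    "annotations": (
        [{"category_id": 1}] * 12
        + [{"category_id": 2}] * 3
        + [{"category_id": 3}] * 1
    ),
}

counts = class_counts(coco)
print(counts.most_common())
print(underrepresented(counts))

# The transcript's back-of-the-envelope estimate for toasters.
val_toasters = 9
assumed_train_to_val_ratio = 20  # assumption stated in the video
estimated_train_toasters = val_toasters * assumed_train_to_val_ratio
print(estimated_train_toasters)
```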
