Exploring The COCO Dataset
Key Takeaways
The video explores the Microsoft Common Objects in Context (COCO) dataset for object detection, touching on models and tools such as EfficientDet, DetectoRS, and Roboflow, and on concepts such as the average precision metric and class balance.
Full Transcript
Greetings, this is Jacob from Roboflow, here today to talk about the Common Objects in Context dataset, or the COCO dataset. Before we dive into deeper dataset introspection, we're going to see an example of exactly what the COCO dataset allows us to do, and what it allows us to train models to see and understand.

Here you can see that I have an object detector running, based on the COCO dataset, detecting a person. The model is detecting the bounding box in which it sees me as a person. Let's see what else it can detect. I'm going to bring in a new object. There we go: it's detecting a book, and it's doing a decent job of drawing a bounding box around where it sees the book. Let's try a different object. I'm going to see if it can identify my cell phone. Yeah, it's doing a pretty good job there. Now let's see if we can stump it. Here I have another object. This one it's missing. There we go, I thought it was a sandwich there for a second. Sports ball. Oh wow, frisbee. Okay, so you can see it's not doing so well at detecting a yellow cup.

The motivation of the COCO dataset is to see how well we can get these models to understand common objects in context, like books, cell phones, people, and cups. How well models do on the COCO dataset is what defines the state of the art in computer vision.

Now, jumping in and taking a look at the state of the art, here we have the leaderboard for the COCO dataset. This is really what all the buzz in computer vision is about these days: a lot of object detection, and a lot of how well these models are doing on the COCO dataset. On the leaderboard we can see the box AP, which is a measure of how well these models have been doing over time, and you can see that it is just a rising tide. The
models are getting better and better. They're getting bigger and more efficient. The convolutional networks that form the backbone of these detectors are improving, the neck of the object detector, where these features are being pulled, is being researched and improved, and the way training techniques are deployed is also really pushing the state of the art in object detection on the COCO dataset.

Most recently, we can see that the number one model on the COCO dataset is EfficientDet-D7x. This is a recent release of EfficientDet: you can see that it used to be down here, but now the D7x is the highest one, beating out DetectoRS, which is really exciting news. You can see that they're measuring all of these based on box AP. Below we have a link with more details on what mAP is, but for now you can just think of it as a measure of performance. The goal of this video is really to understand what goes into the dataset behind this average precision metric: what exactly are we measuring when we look at this metric, what is the dataset, and how can we maybe improve performance on it just by knowing more about it?

Moving along, you can see the maximum box AP. This is just the very best that any model can do on the COCO dataset. In reality, you're more likely to see graphs like this one, which show how well a model does on the COCO dataset relative to how long it takes to do inference. The faster the inference, probably the smaller the model and the faster it is to train, so it's a little more tractable. Smaller models are definitely better on that front, and we want to prefer them. You'll see the COCO AP val on the y-axis. That is how well the model does on the COCO validation set, and this is done for a lot of preliminary research
before people actually test on the test set, which is held back by the Microsoft COCO dataset providers. Here we have a couple of models compared: the EfficientDet lineage, meaning the smaller EfficientDet models, compared to YOLOv5, which is a state-of-the-art object detector for fast and performant inference.

Another important thing to know about the average precision metric, even though there are more details in the mAP video below, is that it's averaged across all class labels in the COCO dataset. As we dive deeper into what the dataset is, it's going to be important to remember that the object detector is measured as a mean across all class labels. No matter how well populated the class is, this metric measures across all of them. So if you really want to beat this metric, you have to be able to recognize all class labels, including yellow cups, or rather just cups in general.

Now let's take a look at what all is in the COCO dataset. Here I have one of the original Microsoft COCO papers, and here they lay out some of the goals of what they're trying to annotate in the dataset. Say you have an image. There are a variety of ways you can extract data from it, and train models to extract that data. The first is classification, which just says what is in the image. In this upper-left image, you can see there are people, sheep, and a dog, so you could keep it at that very basic classification level. You could also go further and localize objects with boxes. Computers are good at making floating-point estimates, so the box is a convenient way to do object localization. You localize the object and then attach a class label to it, as we saw with the detector as it was
recognizing that I was a person and that this was a book. You can go further than plain box detection and do semantic segmentation, which is defining the outline of objects. They also went even further: rather than just doing semantic segmentation around objects, they segmented those out into distinct objects.

Now let's dive a little deeper into what is in the dataset. This is the COCO dataset explorer. You can see that there are over a hundred thousand images, which makes it a very large dataset. It takes a long time to train on, and it is a very powerful dataset, with over 800,000 object instances. You can reach the COCO explorer at cocodataset.org/#explore and get a feel for what objects are in the dataset. In this map we can see a variety of class labels, broken out into natural category groups. We have food, we have electronics; these are everyday objects. With the explorer we can click on an object and see examples of it. A fun one might be elephants. Let's see what kind of elephants are in the dataset. Okay, there are 2,000 results, images that actually have elephants. There we go: you can see the masks over the elephants, which show where the semantic segmentation of the elephants lies.

Another cool thing is that you can see instances where objects appear in tandem. Let's say I want to see all examples of elephants and laptops. Let's see what we get back. Oh, it's actually zero results, so there are no images where elephants appear in the context of laptops. That's not too surprising. What about TVs? Let's see. It's taking a little while to run, which means there might be some examples. Okay, there are five results of elephants and TVs. Oh, and it looks like
maybe the elephant's on screen there, or the elephant's a figurine. And here we go, we have what looks like an elephant in an airport bay. Anyway, this is a good example of what this dataset is: all kinds of random objects in context. They've tried to choose maybe the 80 most prevalent objects to bin into classes, and then to annotate from there.

Going a little deeper, we're going to take a look at the Microsoft COCO dataset within Roboflow and use the Roboflow dataset health check to see how deeply we can analyze a dataset from that point of view. I've loaded the Microsoft COCO dataset into Roboflow. Roboflow can indeed handle datasets of this size, and it's a very powerful tool for looking at your data. On the dataset front page there are a few things we can do. As we create dataset versions, we can add preprocessing steps, so we can morph the dataset with things like grayscaling, tiling, and resizing before it goes into a model. We can modify the classes, so if I decided there was a class I didn't want in the COCO dataset, I could remove it or rename it. Finally, there's the ability to add augmentation. Here you can vastly increase the size of your dataset by doing things like rotation, adding noise, adding blur, or using mosaic, which is a very interesting example where you put objects in different corners of the image. This helps make your dataset a lot larger. With the COCO dataset this is a little less important, because you already have a lot of examples, but in a sparser dataset augmentation is going to be very important for bringing your detector up to the performance you need without having to gather and label more data.

A very powerful tool once your data is loaded into Roboflow is the dataset health check, where we can really determine what is going on inside the dataset. Here we have the
COCO dataset validation set in the dataset health check. First of all, we can see there are 5,000 images in the validation set for the COCO 2017 object detection dataset. We can see that there are 36,000 annotations, and we can see the class balance. Of those 36,000 annotations, 10,000 are people, which means the COCO dataset is really predominantly a people dataset. From there, we can see the objects dropping off in prevalence. The underrepresented classes are labeled in red, indicating that this could be a dangerous area, because it's very hard for an object detector to learn a class it hasn't seen many examples of. It's going to be very hard for our object detector to learn toaster, for example, because there are only nine examples in the validation set. If the training set is 20 times that size, there are probably only going to be about 180 toasters to learn from, in the midst of, say, 200,000 people. So, according to the loss function, the model is definitely going to optimize toward guessing people. That's going to be the default assumption, and it's going to be very hard to generalize all the way down to the underrepresented toaster oven, stop sign, bear, or snowboard. These are going to be hard class labels. And as we were saying before, the mAP metric averages across all classes, so it's going to be very difficult for the model to get good scores across the board on all these different classes, especially the very underrepresented ones. That's a modeling challenge: getting through a sparsely labeled dataset like this.

Another thing the health check allows you to do is look at the dimension insights of the dataset. Here we can see the different sizes of the images in the dataset. The big takeaway here is that the COCO
dataset is not a uniform size. The images are all over the place, and this is important to remember as you're resizing images and keeping track of your annotations. These are all things that the Roboflow platform does for you.

The last feature of the Roboflow dataset health check is the annotation heat map. This lets you start to look at the localization of your annotations. Notably, at a dataset-wide level, it makes sense that most objects occur in the middle of the image, and you can see that fewer occur in the corners and at the edges. You can then filter by class label. One class label of interest is umbrella. We can see that most umbrellas occur in the top half of the COCO dataset images. That might be something you want: you might say that umbrellas normally appear in the upper half of the image, so this is a safe assumption, and our model might as well learn this localization. But you might also think to yourself that this is something you didn't want to happen, and that you don't want to miss umbrellas that are, say, lying on the ground. Then you would want to go back and collect more images to help even out the distribution of where your annotations occur.

That's all for the dataset health check, and that was a pretty comprehensive tour of the COCO dataset. The last thing I want to talk about is using the COCO dataset for pre-training, and starting from pre-trained checkpoints based on the COCO dataset. One of the most powerful things about the COCO dataset is that it allows researchers to train very large models like EfficientDet-D7x. We can then take those pre-trained checkpoints from the COCO dataset, crystallized in model weights, and start on a new task. This is called transfer learning, and it's a very powerful way to
utilize the COCO dataset. A lot of the object detection models you're going to look at will ship with this pre-trained checkpoint, and it's important to know what it is. Basically, a model has been trained all the way through the COCO dataset, the weights have been saved, and now you're using it to go on to your next task. The model has already learned how to identify different features and has learned a general sense of what an object is. It has learned a vast array of objects, so the object you're detecting might be something very similar. For example, a model trained on the COCO dataset can identify people, which means it's going to be able to identify me. So if I'm training a detector to tell whether I have a mask on or not, it might be very good to start with the COCO pre-trained checkpoint, because it's already used to identifying people and has a sense of what a face is, and then we can build our model from there.

Lastly, I'm going to show an example of using the COCO dataset as a pre-trained checkpoint. If we go back over to mass.ai, you can see we were at backslash coco before, and now we're going to go over to backslash mask. It will take a second for the model to load, but you can see that even with just a few epochs on a very small dataset of 100 images, I've already started to get the model to understand what it looks like to be wearing a mask or not wearing one. Here you can see it's identifying me and saying "no mask" in red, and that is a bad sign given these times.

That's an example of using the COCO dataset to launch into the detection of any object in the world. This is really the way to leverage the COCO dataset to move on to any object beyond the 80 classes given in the dataset: take the pre-trained checkpoints, use them as a starting point, and get a custom dataset. Then you'll be able to move much faster than if you had just started
from scratch. Thanks for listening today. That was a deep dive on the COCO dataset. Happy detecting!
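The class balance discussion in the transcript can be sketched in a few lines of Python. This is a minimal illustration, not the Roboflow health check itself: it assumes annotations in the standard COCO JSON layout (a `categories` list and an `annotations` list keyed by `category_id`), and the `toy` data, the function names, the 10% threshold for "underrepresented", and the per-class AP values are all hypothetical. It also shows why a metric averaged over classes gives a rare class the same weight as a common one.

```python
from collections import Counter


def class_balance(coco):
    """Count annotations per class name in a COCO-format dict."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco["annotations"])


def underrepresented(counts, fraction=0.1):
    """Return classes with fewer than `fraction` of the top class's count.

    The threshold is an illustrative assumption, not Roboflow's actual rule.
    """
    top = max(counts.values())
    return sorted(name for name, n in counts.items() if n < fraction * top)


def mean_ap(per_class_ap):
    """Unweighted mean of per-class AP: every class counts equally."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Hypothetical toy dataset: 20 "person" boxes and a single "toaster" box.
toy = {
    "categories": [{"id": 1, "name": "person"}, {"id": 80, "name": "toaster"}],
    "annotations": (
        [{"image_id": i, "category_id": 1, "bbox": [0, 0, 10, 10]} for i in range(20)]
        + [{"image_id": 0, "category_id": 80, "bbox": [5, 5, 4, 4]}]
    ),
}

counts = class_balance(toy)
rare = underrepresented(counts)

# Made-up per-class AP values: the rare class drags the mean down
# just as much as the common class pulls it up.
score = mean_ap({"person": 0.8, "toaster": 0.2})

print(counts.most_common())
print("underrepresented:", rare)
print("mean AP:", score)
```

On a real dataset, the same `class_balance` function could be applied to the dict loaded from a COCO annotations file with `json.load`, assuming the file follows the layout described above.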
Original Description
In this video, we take a deep dive into the Microsoft Common Objects in Context Dataset (COCO).
We show a COCO object detector live, COCO benchmark results, COCO example images, COCO class distribution, and more!
Documentation on mAP: https://blog.roboflow.ai/what-is-mean-average-precision-object-detection/
COCO Leaderboard: https://paperswithcode.com/sota/object-detection-on-coco
COCO Explorer: https://cocodataset.org/#explore
Roboflow Dataset Health Check: https://blog.roboflow.ai/resize-images-with-dimension-insights/
✅ Subscribe: https://bit.ly/rf-yt-sub
Follow us on Twitter: https://twitter.com/roboflowAI