How to Use the Roboflow Dataset Health Check

Roboflow · Intermediate · Computer Vision · 5y ago

Skills: CV Basics 80% · Modern CV Models 60%

Key Takeaways

The video demonstrates how to use Roboflow's Dataset Health Check to improve the quality of computer vision datasets, including analyzing annotation quality, class balance, and image size.

Full Transcript

Hey there, it's Joseph from Roboflow. I'm going to show you how you can use Roboflow's Dataset Health Check to get the most out of your computer vision datasets. For today's example, we're going to be walking through the Hard Hat Workers dataset. If you want to follow along, you can find this dataset on public.roboflow.ai. If you go to public.roboflow.ai, you're presented with all these free public image datasets, and there's the Hard Hat Workers dataset. I already have this dataset in my account, and I've pulled up the Hard Hat Workers Dataset Health Check. Let's dive on in.

First things first: I have 7,035 images in my dataset. I don't have any missing annotations, meaning all of my image files have a matching annotation file. I also don't have any null examples: zero null examples. A null example is an image that doesn't contain any of the objects I want to detect. So, for example, I don't have an image of a construction site that doesn't contain a worker, whether that's a person or someone who doesn't have a hard hat on. I have 27,039 annotations across my roughly 7,000 images, or approximately 3.8 per image. That's pretty good feature richness across my three classes.

I also generally have pretty small images: 0.17 megapixels. My smallest one is 0.03 megapixels and my largest is 0.67. I can click on that if I want to see this itty-bitty small image and its dimensions inside my dataset, and here I see this image is 167 by 154, which is helpful.

Now, my class balance. Having balanced classes is important in computer vision because we want our model to learn evenly across the different objects we want to teach it to recognize. In this dataset, I might have some problems. I have a pretty good representation of helmets: 19,747 helmet examples. I have about 6,600 examples of heads, people without helmets. But I only have 615 examples of persons, meaning people that don't have a helmet, or just their face instead.

Now I can zoom in and say, "Show me, Roboflow, every single image in my dataset that contains a person." Remember, there are 615 annotated people, but that doesn't mean there are 615 images of people. Why? Because there could be multiple people in a single image, and in fact that's what we see here. There are 209 images, but a lot of these images have multiple people annotated, like this one. Actually, this is a really good example here, where I have all these people annotated, and the hard hats on those people as well.

Okay, that's my health check. I also have the size information here. Image size matters because it helps inform the resize decision I want to make. If I resize my images to square, as most models require, my image is going to be stretched down or stretched up, and it kind of depends. In this dataset, it looks like my median size is 500 by 333, so I might not want to go much bigger than 333. Most of the time we resize to anywhere between 300 by 300 and 640 by 640, sometimes bigger, sometimes smaller. It all depends, of course, on the context of your problem. But I wouldn't want to go much bigger than 333 pixels here for the height, because that would stretch out my pixels more than I want. The width is generally 500 pixels, so maybe a good resize decision would be 300 by 300. It depends, again, on the context of your problem.

I also see the aspect ratio here: a lot of my images are wider than they are tall. In fact, you see I have this line here that goes directly across: if an image is just as tall as it is wide, it is a perfect square. If it's wider than it is tall, it's a wide image, and if it's taller than it is wide, it's a tall image. I could have images that are very tall or very wide, meaning if I stretch things to be square, it might mess up their aspect ratio. Conversely, if I preserve the aspect ratio, it might create a lot of white or black padding in my resize decision. Roboflow shows me those previews in my preprocessing steps if I want to have a look. So this is all useful information to inform my resize decision in particular.

Now, I also have the annotation heat map, to understand where in my images my objects generally appear. The helmets are generally across the top of my image, the heads are also generally across the top, and persons are actually emanating from the bottom of my images. That kind of makes sense, right? So this is like a gut check: are my objects generally where I expect them to be across my images? And if I were to make a resize decision, or crop, or change the size of my objects, would they still be in the frame of my image? This gives you a very visual quick check, in a quick one-stop way, without manually combing through all these individual annotations.

So that's an overview of the things that are available to you in your health check. Be sure to like and subscribe to the Roboflow channel to learn more computer vision tips. We cannot wait to see what you build using Roboflow to incorporate computer vision into your problems. Thanks so much for watching.
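The summary numbers Joseph reads off (image count, null examples, annotations per image, per-class counts, and images per class) can be approximated outside the tool. The following is a minimal sketch, not Roboflow's implementation: it assumes annotations are already loaded as `(image_id, class_name)` pairs, and the `summarize` helper and the toy data are hypothetical names made up for illustration.

```python
from collections import Counter


def summarize(image_ids, annotations):
    """Compute health-check style statistics.

    image_ids: iterable of every image id in the dataset (assumed input shape).
    annotations: list of (image_id, class_name) pairs, one per bounding box.
    """
    image_ids = list(image_ids)
    class_counts = Counter(cls for _, cls in annotations)
    annotated_images = {img for img, _ in annotations}
    # Number of distinct images containing each class. This is usually
    # smaller than the annotation count, because one image can hold
    # several objects of the same class.
    images_per_class = {
        cls: len({img for img, c in annotations if c == cls})
        for cls in class_counts
    }
    return {
        "images": len(image_ids),
        "annotations": len(annotations),
        # Images with no annotated objects at all.
        "null_examples": sum(1 for img in image_ids if img not in annotated_images),
        "annotations_per_image": len(annotations) / len(image_ids) if image_ids else 0.0,
        "class_counts": dict(class_counts),
        "images_per_class": images_per_class,
    }


# Toy data (hypothetical): three images, one of them a null example.
toy_images = ["a.jpg", "b.jpg", "c.jpg"]
toy_annotations = [
    ("a.jpg", "helmet"),
    ("a.jpg", "helmet"),
    ("a.jpg", "person"),
    ("a.jpg", "person"),
    ("b.jpg", "head"),
]
stats = summarize(toy_images, toy_annotations)
print(stats)

# The per-image figure quoted in the video follows from the two totals.
print(round(27039 / 7035, 1))  # 3.8
```

In the toy data, `person` has two annotations but appears in only one image, which mirrors the point in the video that 615 person annotations live in just 209 images.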

Original Description

Roboflow enables developers to use computer vision, and computer vision engineers to get the most out of their data. In this walkthrough, we show you how to use the Dataset Health Check to improve the quality of your annotations, inform resize and augmentation decisions, and ensure your data is the best it can be.

The video teaches how to use Roboflow's Dataset Health Check to analyze and improve computer vision datasets, covering topics such as annotation quality, class balance, and image size. This is important because high-quality datasets are essential for training accurate computer vision models. By following the steps outlined in the video, viewers can learn how to use the Dataset Health Check to identify and address issues in their own datasets.

Key Takeaways

Upload a dataset to Roboflow
Run the Dataset Health Check
Analyze annotation quality and class balance
Evaluate image size and aspect ratio
Use the annotation heat map to understand object placement
Make informed resize decisions based on the analysis
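The resize trade-off behind the last takeaway can be worked through numerically. This is a rough sketch under stated assumptions: the function names are hypothetical, the target is a square of `target` pixels, and "letterbox" here means scaling to fit while preserving the aspect ratio and padding the remainder. It uses the 500 by 333 median size and the 300 by 300 target mentioned in the video.

```python
def classify_aspect(width, height):
    """Label an image as square, wide, or tall."""
    if width == height:
        return "square"
    return "wide" if width > height else "tall"


def stretch_factors(width, height, target):
    """Per-axis scale when stretching to target x target.

    A factor above 1 means pixels along that axis are being upsampled.
    """
    return target / width, target / height


def letterbox(width, height, target):
    """Fit inside target x target while preserving the aspect ratio.

    Returns (new_w, new_h, pad_w, pad_h, pad_fraction), where pad_w and
    pad_h are the total padding in pixels along each axis.
    """
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_w, pad_h = target - new_w, target - new_h
    pad_fraction = 1 - (new_w * new_h) / (target * target)
    return new_w, new_h, pad_w, pad_h, pad_fraction


kind = classify_aspect(500, 333)
sx, sy = stretch_factors(500, 333, 300)
lb = letterbox(500, 333, 300)
print(kind)    # wide
print(sx, sy)  # width shrinks more than height, so the image is distorted
print(lb)      # 300 x 200 content with 100 px of vertical padding

# Stretching to 640 x 640 instead would upsample the height almost 2x.
print(stretch_factors(500, 333, 640))
```

Stretching to 300 by 300 shrinks both axes but by different amounts, while letterboxing keeps proportions at the cost of about a third of the frame being padding.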

💡 The Dataset Health Check provides a comprehensive overview of a computer vision dataset, allowing users to identify and address issues that could impact model performance.
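One part of that overview, the annotation heat map, can be approximated with a coarse grid. The sketch below is an illustration only and makes several assumptions: boxes are given as normalized `(x_min, y_min, x_max, y_max)` tuples in the range 0 to 1 with the origin at the top left, and the `heatmap` function and toy boxes are hypothetical, not how Roboflow computes its view.

```python
def heatmap(boxes, grid=4):
    """Count how many boxes cover each cell of a grid x grid map.

    A cell counts as covered when its center lies inside the box.
    Row 0 is the top of the image.
    """
    counts = [[0] * grid for _ in range(grid)]
    for x0, y0, x1, y1 in boxes:
        for row in range(grid):
            cy = (row + 0.5) / grid
            for col in range(grid):
                cx = (col + 0.5) / grid
                if x0 <= cx <= x1 and y0 <= cy <= y1:
                    counts[row][col] += 1
    return counts


# Toy boxes (hypothetical): two objects near the top of the frame.
toy_boxes = [
    (0.10, 0.00, 0.40, 0.30),
    (0.50, 0.05, 0.90, 0.30),
]
hm = heatmap(toy_boxes, grid=4)
for row in hm:
    print(row)
```

Running this once per class gives a quick textual version of the gut check from the video: if a class that should sit near the top of the frame lights up only the bottom rows, the annotations or the crop deserve a closer look.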
