Mosaic Data Augmentation - Deep Dive

Roboflow · Intermediate ·🛠️ AI Tools & Apps ·5y ago


Key Takeaways

The video discusses mosaic data augmentation, an augmentation technique used by state-of-the-art object detection models such as YOLOv4 and YOLOv5 when training on the COCO dataset. It covers the predecessor augmentations Cutout and CutMix, how mosaic is implemented in the YOLOv5 repo, when mosaic helps or hurts, and how it relates to object localization and model robustness.

Full Transcript

Hey guys, this is Jacob from Roboflow, here today to talk about mosaic data augmentation, which is a new and very exciting augmentation technique in computer vision.

A little bit of motivation for what we're going to be talking about today: mosaic data augmentation is a new augmentation that is pushing the state of the art for object detection models. Here we have two graphs. We have YOLOv4 on the left and YOLOv5 on the right, and both of them are employing mosaic data augmentation to push their performance on state-of-the-art object detection tests. As you can see, these models are getting better and better as they improve in average precision, which is the y-axis of these graphs. The way they're doing this is by employing new model techniques, but they're also employing new data augmentation techniques, and oftentimes the data augmentation techniques outweigh the importance of the modeling techniques. Mosaic data augmentation is one of those cutting-edge augmentations that is really pushing the state of the art of these models. But it's very important to know that the mosaic augmentation is applied on the COCO dataset, which is a general dataset, so it's important to know the augmentation for yourself before you jump into your own object detection modeling task.

Now here's a little bit of a road map of the video that we're going to be going through today. First, we're going to be talking about mosaic augmentation's predecessors, which are cutout and CutMix. After that, we're going to talk a little more about the nuts and bolts of what exactly mosaic data augmentation is. Then we'll talk about exactly how one would go through and implement mosaic augmentation, all the way from the theory through the actual implementation and code. Then we're going to talk about when mosaic might work best and when it might not work well. After that, we're going to talk about how we could possibly
improve mosaic going forward: as researchers and as individuals in computer vision, what might be the next thing for mosaic data augmentation? And then after that, I'm going to do a little bit of a live demo where we're actually going to implement mosaic on a live dataset and see mosaic actually transform our images.

All right, moving along. The first thing we're going to cover is the predecessor augmentations to mosaic. This is important as we think about what motivated researchers to come up with mosaic data augmentation and what ultimately led to its birth. These techniques are image augmentation techniques that are regularization techniques. A regularization technique is designed to keep your model from overfitting to the training data it's seeing, so that it's able to adapt robustly as it's thrown into new data environments.

Here we can see a couple of these augmentation techniques. We have cutout, which is where you cut out a chunk of the image, and that is supposed to block the model from learning features and parts of the image that it might rely on too heavily. For example, in order to identify a Saint Bernard, a model might rely very heavily on the dog's head, so you might want to cut out this piece so it actually starts to learn to identify dogs through the back part of the animal. That's one way you might try to teach your model robustness and the ability to adapt to new environments.

The other one is CutMix, where instead of just cutting out a black square, you actually take a piece of a different image and move that piece over to where the cutout was originally. Here, for Saint Bernard and poodle, we can see we've actually mixed those two images together. And if you're looking at the CAM for each of these, there's more activation for the
Saint Bernard right on that back half of the dog, and for poodle there's actually more on the right half. That's showing that the model is starting to learn robustness, to be able to identify these things in different places, by cutting and mixing different pieces of the images.

Now, moving along, this is mosaic data augmentation. This is coming out of that idea of mixing images together to make the model perform better in different scenarios and teach it to localize objects in different places. Mosaic data augmentation takes four images, tiles them into the four corners of one image, and combines all the annotations in that one place. Here you can see I have a mask-wearing dataset. All these people are wearing masks and they're tagged in different places, but you can see the different images that have been brought together. The model is going to have to localize these objects in different places, and it's going to have to learn the slightly different contexts around them, so it won't start to rely too heavily on what it originally sees in the training set, because the training set is going to be multiplied in all these different ways and the model has to learn around that. This is a very, very powerful augmentation, and certainly the academic results are there to back that up.

So this is mosaic, and it's a very important augmentation. Now we're going to go through how you might actually implement it on your own dataset. I'm going to go through how mosaic is implemented in the YOLOv5 repo. This is not necessarily the only way to do it; in fact, we will point out some downsides and how those could be addressed. Here I'm just going to walk through the steps that happen in YOLOv5. I've overestimated the differences here, but you'll bring four
images together. Those will be picked randomly from the dataset, and you'll want to combine them into the mosaic pattern. The first thing the YOLOv5 repo does is resize each of those images to be the same size, and then it brings all four of them together in the same grid. Here we have a grid where, let's say, everything was resized to 416x416, and then all those images are pasted together. The other thing you need to keep in mind is that each one of those images will have bounding boxes where the objects are localized, and all of those need to be brought in at the same time. So now you have all four images resized, all their bounding boxes resized, and they're all brought into a 2x2 square of the mosaic map. Finally, after you have all these images resized and stitched together, you take a random crop from the center, which is what this red line is representing, and that will be your final image. You'll be doing that randomly as you bring in different images for each epoch, so the way all these mosaics map keeps changing. Then you go ahead and cut that out, and there you have your mosaic tiled image. That's how one would actually go about implementing mosaic in practice.

Now let's talk about some scenarios in which mosaic works best and some scenarios where it might not work so well. This is really a big, heavy-hitting augmentation, so you need to be careful when you use it and know when not to use it. Some examples that are really good for mosaic are things like aerial imagery, where objects could appear at any place on the ground and you might want to move those objects around into different places and different contexts to teach the model a little more robustness. It's good a lot of times for real-world objects, like if you're detecting fish or animals or various things that
are out in scenes, because these things can usually be moved around, like you're seeing with the dogs in cutout and CutMix. Those are good examples where mosaic is probably really going to improve the performance of your model by leveraging the power of your training set. Another good area to use mosaic in general is if you have low object distribution and you know that you're going to want to move your objects into different parts of the image.

Some examples of when not to use mosaic: if you have a dataset of written documents with bounding boxes around text or numbers, this is probably going to be a bad place to use mosaic, because it's going to become all jumbled and very unrealistic. Another bad place to use mosaic is if you have large, prominent, upfront objects, because these will be getting chopped down and randomly cropped, so you'll just get big pieces of things, which is not so easy to use. Another example where one should be a little careful in using mosaic is with fixed-location objects. This has to do with any dataset where you know that you want those objects to be in the same place. For example, if you're scanning a tray and you know the tray is going to be set on the table in the exact same spot every time the camera goes over and takes the image, you don't want to use mosaic, because that's going to shift it around into different parts of the image, which would be rather unrealistic and would actually end up hurting your model when you go to deploy it.

Now here are some of my thoughts on improving mosaic. I'd be curious to hear in the comments if you have some yourself. One thing about the current implementation that we talked about is that, since it's randomly cropping from the center, it's actually undersampling the middle
of the image, because it's more likely for the image to get included either in the top or the bottom than it is to go all the way down to the middle or all the way up to the middle. That's one of the things we've addressed in our implementation of mosaic at Roboflow, and we've seen some promising results to that end.

Another idea I had: I originally thought that mosaic was just tiling the images equally in a 2x2 square, so I thought it was actually zooming out. I thought this would be a good way to address the small object problem, which models often face: larger, upfront objects are more commonly labeled, but things that are further away aren't labeled so well, so object detectors often don't see small objects very well. So I thought maybe you could be zooming in and out of the space that is being cropped in the image, and that could start to change the size distribution of objects, which right now stays completely flat with the way the current crop works.

Another thing is to add a probabilistic mosaic, where it doesn't necessarily happen every time. That means you could keep some of the ground truth training set without having to do mosaic every time.

Now we're going to move on to a live demo, where we're going to see mosaic in real life on the Roboflow platform. This is really exciting; I hope you guys are as excited as I am. Here we have a mask-wearing dataset. This is what your dataset will look like once you've loaded it into Roboflow, which is a very easy process: you just drag and drop your data in. Once it's in, your bounding boxes are automatically matched and you have your dataset in Roboflow, so you can start using advanced augmentations like mosaic with just the click of a button. So
here's our mask-wearing dataset. It's got images of people wearing masks or, importantly, not wearing masks. Here's an image where all these faces are tagged with masks, but you can see this guy actually doesn't have a mask, so that will be labeled as no mask.

Real quick, looking at our dataset health check: this is a very powerful feature in Roboflow where you can start to introspect your dataset and get a feel for the way things are looking. Here we can see we've got 806 masks and 148 no masks, so ideally maybe we would throw a few more of those in there to have more balanced classes. But importantly, here we have our annotation heat map, which shows that the annotations are mostly occurring up in this top part of the image. So if we were worried about the bottom parts of the image, like right here where my hands are, then we might want to use mosaic to get more masked and unmasked people in the corners of the image.

So let's go ahead and do that. Here you would click modify dataset. You can see we've got our splits here, and we can choose preprocessing steps, so maybe we want to make sure that all our images are 416 by 416.
Say we know that's going to be the inference size we want, to be able to run inference at the speed that we want. Then another thing we'll say here is that we want to do mosaic tiling. Here we have a little jellyfish depiction of what's going to happen as we go through mosaic. We can hit apply there, and we'll choose three augmentations, so this will multiply our training set by three. Now, an important thing to know is that your validation set and your test set will not get augmented. They will get preprocessed, but they won't get augmented, because you want to be using those as ground truth measures of the way your model is improving in performance.

Once those are all set, we can go ahead and click generate. I'll just call this a mosaic version. So you can see what we're doing here: we're experimenting, trying to think about what sort of augmentations and preprocessing steps are going to lead to the best results for our dataset. We're generating these datasets, and then we'll be going over to modeling to see which augmentations have really improved our dataset the most. Here we can choose among all kinds of formats to download this, but I'll skip that for today and we can just look at the images. Let's go ahead and view all images, and here we can see all the mosaic augmentations have occurred and our dataset is now split out into these 2x2 tiles. Hopefully that will take us to the next level of our modeling.

This was an introductory video on mosaic data augmentation. I hope it brings you better model performance for many, many days to come. Stay tuned for the next latest and greatest augmentations in computer vision, and if you would be so kind as to subscribe below and like this video, I would be very much obliged. We'll talk soon.
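The two predecessor augmentations described in the transcript, cutout and CutMix, can be illustrated with a minimal NumPy sketch. This is not code from the video or from any particular library: the function names `cutout` and `cutmix`, the `(x1, y1, x2, y2)` box format, and the assumption that both images share the same shape are all hypothetical choices made for illustration. The mixing ratio returned by `cutmix` reflects how CutMix is typically used for classification labels.

```python
import numpy as np

def cutout(image, box):
    """Return a copy of `image` with the pixel region `box` blacked out.

    `box` is (x1, y1, x2, y2) in pixel coordinates (an assumed format).
    """
    x1, y1, x2, y2 = box
    out = image.copy()
    out[y1:y2, x1:x2] = 0  # hide this patch so the model cannot rely on it
    return out

def cutmix(image_a, image_b, box):
    """Paste the `box` region of `image_b` into a copy of `image_a`.

    Assumes both images have the same shape. Also returns the fraction of
    pixels that still come from `image_a`, usable as a label-mixing weight.
    """
    x1, y1, x2, y2 = box
    out = image_a.copy()
    out[y1:y2, x1:x2] = image_b[y1:y2, x1:x2]
    h, w = image_a.shape[:2]
    keep_ratio = 1.0 - (x2 - x1) * (y2 - y1) / (h * w)
    return out, keep_ratio
```

In the Saint Bernard and poodle example from the transcript, `image_a` would be the Saint Bernard photo, `image_b` the poodle photo, and `box` the region where the poodle patch is pasted.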

Original Description

We review the new state of the art mosaic data augmentation. We discuss the following roadmap in this video:
* Cutout and Cutmix predecessor augmentations
* Mosaic augmentation high level introduction
* Mosaic augmentation implementation
* When to use (or not use) mosaic augmentation
* How to improve mosaic augmentation
* Mosaic Live Demo!

Use mosaic on your dataset with Roboflow (no code): https://blog.roboflow.ai/advanced-augmentations/

✅ Subscribe: https://bit.ly/rf-yt-sub



This video teaches the concept of Mosaic Data Augmentation, its implementation, and its applications in object detection models. It covers the tools and techniques used in Mosaic Data Augmentation and provides practical steps for its implementation.

Key Takeaways

Resize images to the same size
Bring four images together in a grid
Combine all annotations in one place
Take a random crop from the center
Review the generated dataset, which now consists of 2x2 mosaic tiles
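
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the procedure as described in the transcript (resize, paste into a 2x2 grid, carry the boxes along, take a random crop), not the actual YOLOv5 code. The names `resize_nearest` and `mosaic`, the `[x1, y1, x2, y2]` pixel box format, and the nearest-neighbour resize are assumptions made to keep the example self-contained.

```python
import numpy as np

def resize_nearest(image, size):
    """Nearest-neighbour resize to (size, size) using only NumPy indexing."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows][:, cols]

def mosaic(images, boxes_per_image, size=416, rng=None):
    """Combine four images and their boxes into one mosaic crop.

    images: list of four HxWx3 arrays (sizes may differ).
    boxes_per_image: list of four (n, 4) arrays of [x1, y1, x2, y2] pixels.
    Returns the (size, size, 3) crop and the boxes that survive the crop.
    """
    if rng is None:
        rng = np.random.default_rng()
    canvas = np.zeros((2 * size, 2 * size, 3), dtype=images[0].dtype)
    offsets = [(0, 0), (size, 0), (0, size), (size, size)]  # (x, y) per tile
    merged = []
    for img, boxes, (ox, oy) in zip(images, boxes_per_image, offsets):
        h, w = img.shape[:2]
        canvas[oy:oy + size, ox:ox + size] = resize_nearest(img, size)
        b = np.asarray(boxes, dtype=float).reshape(-1, 4)
        scale = np.array([size / w, size / h, size / w, size / h])
        # Resize the boxes with the image, then shift them into the tile.
        merged.append(b * scale + np.array([ox, oy, ox, oy]))
    merged = np.concatenate(merged)

    # Random size x size window inside the 2x2 grid.
    x0 = int(rng.integers(0, size + 1))
    y0 = int(rng.integers(0, size + 1))
    crop = canvas[y0:y0 + size, x0:x0 + size]

    # Move boxes into crop coordinates, clip, and drop empty ones.
    clipped = np.clip(merged - np.array([x0, y0, x0, y0]), 0, size)
    keep = (clipped[:, 2] > clipped[:, 0]) & (clipped[:, 3] > clipped[:, 1])
    return crop, clipped[keep]
```

Calling `mosaic` again with a different random generator state or a different choice of four images produces a different tile, which is how the augmentation multiplies the training set from epoch to epoch.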

💡 Mosaic Data Augmentation is a powerful technique that requires careful use and knowledge of when to apply it, and it can be used to improve model performance in object detection tasks.

