Image Data Augmentation for Deep Learning
Key Takeaways
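The video surveys data augmentation techniques for deep learning in computer vision: basic image manipulations (geometric and color space transformations, kernel filters, mixing images, random erasing), feature space augmentation, adversarial training, GAN-based augmentation, neural style transfer, and meta-learning approaches such as AutoAugment, with historical examples from LeNet-5 and AlexNet.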
Full Transcript
[Music] This video will present a paper I recently published on image data augmentation for deep learning. The motivation behind data augmentation is to prevent overfitting. Overfitting refers to a phenomenon where deep learning models with high capacity model the training data so exactly that they don't generalize to the testing data. This is shown in this image, where, as the training error decreases, the testing error actually increases rather than continuing to decrease with the training error.

Data augmentation, shown right here in these rotated images in the video on the left, is a way of doing manipulations on data such that you have new data points. It serves as a regularizing effect, such that you can hard-code these translational invariances into your model. You would want your image recognition model to be able to recognize the panda in the normal image and in the horizontally reflected image.

Data augmentation has been applied a lot in the history of computer vision. In LeNet-5, on the MNIST recognition task, this shows how they did data warping to get more out of the MNIST dataset. AlexNet also uses data augmentation, and their data augmentation increases the ImageNet dataset by a factor of 2048. They do this by randomly cropping 224 by 224 patches, flipping them horizontally, and then changing the intensity of the RGB channels. In the AlexNet paper they directly attribute a one percent error rate reduction to augmentation.

This is the taxonomy of data augmentations covered in this paper. We'll talk about basic image manipulations like color space transformations, geometric transformations, random erasing, mixing images, and kernel filters. Then we'll talk about deep learning approaches like generative adversarial networks, neural style transfer, and adversarial training. Then we'll see how these can be controlled with a meta-learning controller to get even better performance with data augmentation.

Image manipulations are the most commonly used data augmentation in computer vision. This includes things like flipping, color space transformations, cropping, rotation, translation, and noise injection. The image in the top right shows an example of some color augmentations. You'd want the convolutional neural network to be invariant to these color transformations and still be able to recognize objects despite lighting differences.

One other thing to consider with image manipulations is non-label-preserving transformations. For example, if you horizontally flip the MNIST dataset, then when you flip a 9 horizontally it's no longer really a 9. All of these image manipulations have an associated magnitude parameter, and there's always some level of distortion that is going to corrupt the label as well.

This is a comparison of augmentations by image manipulation in one study. It's definitely interesting to search over the augmentation space of classic image manipulations and see how the accuracy of your model changes. In this case you see that they get a much better performance result with cropping than with the other augmentations.

Another interesting idea is kernel filters. This is the PatchShuffle regularization technique, where they randomly shuffle around pixels in a four by four sliding window.

Mixing images is another really surprisingly successful data augmentation. What they do is extract patches and just average together randomly chosen patches pixel by pixel, and they train a network on the result. This may be successful because of the increased dataset size or some kind of regularization effect. It's really unclear why exactly this works, but it does work well, which is surprising. They also experiment with nonlinear mixing and all these interesting ways of mixing images to form new samples.

Another very interesting technique is random erasing, or Cutout. This is used really frequently in state-of-the-art recognition models. What this does is it's like dropout but in the input space: you have this rectangle that is placed on the image, and inside it, instead of the original image, it's all zeros, all ones, or static noise. These are the results of applying Cutout in the Cutout regularization paper, and you see that they get a greater than one percent error rate reduction in almost all of the trials with it.

Another interesting idea is feature space augmentation. The way convolutional networks work is that they sequentially transform an image into a series of rank 3 tensors, where the dimensions are the number of feature maps, the height, and the width of the feature map. In this study they augment the image representations in these intermediate tensors and then decode them back into the image space.

Another interesting way of doing this is adversarial training. In addition to the phenomenon of adversarial examples, we could have an adversarial agent which is constrained to a set of image manipulations, like rotations and translations, and is trying to select a geometric transformation that will result in a misclassification. It is beneficial to use these adversarial agents to direct the search process over augmentations.

One really interesting idea is to use data from a generative adversarial network to augment datasets. Generative adversarial networks, as shown on the slide here, take random noise and learn to generate new data based on the discriminator's loss function of real or fake, so eventually the generator is able to produce novel data samples. It's really interesting to see, although this hasn't really been shown to work successfully on datasets like ImageNet. But on this liver lesion classification dataset they use this technique of generating data and then just appending it to the dataset, and they go from 78.6 to 85.7 and from 88.4 to 92.4. Other than this medical image domain, this hasn't really worked well on things like ImageNet or even CIFAR-10.

Another really interesting data augmentation that hasn't been explored too much is neural style transfer: using these styles to augment images in interesting ways. This goes beyond just color transformations; it really augments images in a semantic way. Style transfer has been really successful in robotic applications, like this study from UC Berkeley where they randomized the colors in the simulation such that when the robot goes to the real world it generalizes, because it just sees the real-world colors as another color set in the diverse dataset it's been trained on. On the opposite end of that, rather than going for diversity, the SimGAN model aims for realism. They take data from a graphics engine, like Unity, and then they use the GAN to align the generated data from the graphics engine with the original training set.

This brings up the idea of meta-learning: using some kind of controlling algorithm to search through this space of augmentations. AutoAugment is the most interesting way of applying reinforcement learning to basic image manipulations. What they do is search for a policy that selects an image manipulation, like a rotation or a translation, then finds a magnitude for applying the operation, like rotating by 45 degrees or 70 degrees, and then finds a probability of applying the operation.

Some other things to consider with data augmentation: one is test-time augmentation. This isn't good for applications that need fast inference and fast predictions, but if you want to increase the accuracy of your model, it would be useful to take the image, augment it several times, and then aggregate the predictions across the different augmentations. In AlexNet, what they do is crop five 224 by 224 patches, horizontally flip them all, and then aggregate the predictions across these ten patches.

Another thing is curriculum learning. In a recent paper presented at ICML 2019, Population Based Augmentation, they progressively increase the magnitude parameter of the image manipulations. When they first start training the model they might rotate the image by 10 degrees or minus 10 degrees, and then as the training progresses the rotations would get larger, like 60 degrees or minus 60 degrees.

Another interesting thing is the impact of image resolution on model performance. It's not really quite similar to the other things presented, but frequently images are downsized to fit as input and to save computation, and this study shows that if you preserve the high resolution you'll tend to get better classification performance.

One other consideration is online versus offline data augmentation. Online data augmentation means that as the image goes into the batch it's augmented with some probability parameter. Offline would be where you augment it in preparation for the training and then write it to disk, which has a storage cost.

There are some other regularization tools out there, like dropout, batch normalization, transfer learning, pretraining, and one-shot and zero-shot learning. These are all techniques that aim to overcome the problem of overfitting, especially in the case of limited-data domains like medical image analysis.

Again, this is the taxonomy of image data augmentations surveyed in this paper: basic image manipulations, deep learning approaches, and then meta-learning approaches that use controllers to search for the augmentation parameters. Thanks for watching this video. Please subscribe to Henry AI Labs for more deep learning videos, and please check out this paper, "A survey on Image Data Augmentation for Deep Learning", published in the Springer Journal of Big Data.
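The cropping and flipping described for AlexNet in the transcript can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the AlexNet code: images are nested lists of pixel values, the function names are made up for this example, and the 32 x 32 offset count assumes 256 x 256 source images (a detail from the AlexNet paper that the video does not state).

```python
import random

def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def random_crop(image, size, rng=random):
    """Cut a size x size patch at a uniformly random offset."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]

def augment(image, size, rng=random):
    """Random crop, then a horizontal flip with probability 0.5."""
    patch = random_crop(image, size, rng)
    if rng.random() < 0.5:
        patch = horizontal_flip(patch)
    return patch

# Toy 6x6 "image" whose pixel values encode their position.
image = [[10 * r + c for c in range(6)] for r in range(6)]
patch = augment(image, size=4, rng=random.Random(0))

# Assumption: 32 x 32 crop offsets times 2 flips per image,
# which reproduces the factor of 2048 quoted in the video.
alexnet_factor = 32 * 32 * 2
```

Because the patch is sampled each time it is requested, this is the "online" style of augmentation mentioned near the end of the video; nothing is written to disk.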
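As a rough illustration of random erasing as described in the transcript (a rectangle filled with zeros, ones, or noise), here is a small sketch. The function name, the fixed rectangle size, and the 0 to 255 pixel range are assumptions made for this example; the published methods also sample the rectangle's size and aspect ratio.

```python
import random

def random_erase(image, erase_h, erase_w, fill=0, rng=random):
    """Return a copy of image with one erase_h x erase_w rectangle overwritten.

    fill is a constant pixel value, or the string "noise" for
    uniformly random values in [0, 255].
    """
    height, width = len(image), len(image[0])
    top = rng.randrange(height - erase_h + 1)
    left = rng.randrange(width - erase_w + 1)
    erased = [row[:] for row in image]  # leave the input untouched
    for r in range(top, top + erase_h):
        for c in range(left, left + erase_w):
            erased[r][c] = rng.randint(0, 255) if fill == "noise" else fill
    return erased

# Toy 8x8 grey image with a 3x4 region set to zero.
image = [[128] * 8 for _ in range(8)]
erased = random_erase(image, 3, 4, fill=0, rng=random.Random(1))
```

The analogy with dropout in the video is visible here: a contiguous block of inputs is masked, so the network cannot rely on any single image region.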
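The policy structure described for AutoAugment (an operation, a magnitude, and a probability of applying it) could be represented along these lines. This sketch only shows how a found policy might be applied; the reinforcement learning search is not shown, the two operations are simple stand-ins chosen to avoid dependencies, and the example policy values are hypothetical.

```python
import random

def translate_x(image, pixels):
    """Shift every row right by `pixels`, padding the left edge with zeros."""
    if pixels <= 0:
        return [row[:] for row in image]
    width = len(image[0])
    return [([0] * pixels + row)[:width] for row in image]

def brighten(image, amount):
    """Add `amount` to every pixel, clipped to [0, 255]."""
    return [[max(0, min(255, p + amount)) for p in row] for row in image]

OPERATIONS = {"translate_x": translate_x, "brighten": brighten}

def apply_policy(image, policy, rng=random):
    """Apply each (operation, probability, magnitude) triple in order."""
    for name, probability, magnitude in policy:
        if rng.random() < probability:
            image = OPERATIONS[name](image, magnitude)
    return image

# Hypothetical policy; a real one would come from the search procedure.
policy = [("translate_x", 0.5, 2), ("brighten", 1.0, 40)]
image = [[100, 150, 200, 250] for _ in range(2)]
augmented = apply_policy(image, policy, rng=random.Random(0))
```

A curriculum like the one described for Population Based Augmentation could reuse the same structure by increasing the magnitude values as training progresses.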
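Test-time augmentation as described in the transcript (five patches, their horizontal flips, predictions aggregated over the ten views) might look like the sketch below. It uses the four corners and the center as the five patches, which is how the AlexNet paper describes it; `toy_model` is a hypothetical stand-in for a trained classifier, and averaging is one possible way to aggregate.

```python
def five_crops(image, size):
    """Four corner patches and the center patch, each size x size."""
    height, width = len(image), len(image[0])
    offsets = [
        (0, 0),
        (0, width - size),
        (height - size, 0),
        (height - size, width - size),
        ((height - size) // 2, (width - size) // 2),
    ]
    return [[row[left:left + size] for row in image[top:top + size]]
            for top, left in offsets]

def ten_views(image, size):
    """The five crops plus their horizontal reflections."""
    crops = five_crops(image, size)
    return crops + [[row[::-1] for row in crop] for crop in crops]

def predict_with_tta(model, image, size):
    """Average the model's class probabilities over the ten views."""
    predictions = [model(view) for view in ten_views(image, size)]
    n_classes = len(predictions[0])
    return [sum(p[k] for p in predictions) / len(predictions)
            for k in range(n_classes)]

def toy_model(patch):
    """Hypothetical classifier: 'bright' vs 'dark' from mean intensity."""
    pixels = [p for row in patch for p in row]
    bright = sum(pixels) / (255 * len(pixels))
    return [bright, 1.0 - bright]

image = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
averaged = predict_with_tta(toy_model, image, size=6)
```

The cost is visible in the code: the model runs ten times per image, which is why the video notes that this does not suit applications that need fast inference.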
Original Description
Paper Link: https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0197-0
Data Augmentation is used in all modern Computer Vision models. This video explains some of the different ways Data Augmentation is implemented and some design considerations for your own problems. Please Subscribe for more Deep Learning videos!
Uploaded by Connor Shorten (video 38 of 60 in the channel's uploads)