15. Batch Size and Learning Rate in CNNs
Key Takeaways
This video covers the fundamentals of batch size and learning rate in convolutional neural networks (CNNs). It shows how tuning these hyperparameters can improve model performance and training efficiency, using a small Keras CNN trained with the Adam optimizer on a GPU on the CIFAR-10 dataset.
Full Transcript
All right, so you've got your model training and you want to make it a little bit better. This video is kind of a sequel to the video on convolutional neural networks, and I want to talk about learning rate and batch size and a couple of ways that you can often make your models better. Unlike a lot of other courses on deep learning, I don't start with learning rate and batch size, because I think beginners tend to spend way too much time optimizing and tweaking these parameters when they don't really matter that much. But they can matter, and they can be really effective for making your models train faster or for getting the last bit of performance out of your model.
As a quick refresher, the learning rate is like the size of the step that the model takes as it looks for the best possible weights. A really low learning rate means your model might take a really long time to find the best set of parameters, and a really high learning rate might mean it jumps over the best possible place, or even jumps into regions where you get numerical instability.
Batch size, you might remember, is the number of examples that the model looks at when it decides which direction to move all the weights. A really small batch size means that at each step your model is optimizing over only one or two examples, which can add a lot of noise to training. A really big batch size sometimes doesn't have enough noise, and it can be hard for the model to find the best place. So on one hand, smaller batch sizes can, ironically, help your model train better by adding a little bit of noise to that search. But maybe a bigger effect is that a bigger batch size can help your model train faster, especially on GPUs, because on a GPU you can compute the gradients for all the examples in a batch at once. So as long as you can fit the batch and the model into memory, a bigger batch size might not slow down each step by much at all; that computation happens in parallel and can take about the same time regardless of batch size. If your batch size is too big, it just won't fit into memory and your program will crash, so at that point you'll know you need to reduce it. But one thing I see with a lot of people starting out is that they set their batch sizes too small and waste more time in training than they really need to.
So let's jump into the code and run some experiments. I'm going to do these experiments on the CIFAR-10 dataset, which we used in a previous class on data augmentation that you should take a look at if you haven't already. We do the standard imports and then pull in the CIFAR-10 dataset. These things tend to behave differently on different datasets, but I wanted a dataset that's small enough that we can run a lot of experiments quickly, but not MNIST, which is so small that you might get unusual results. As usual, I normalize the data and convert the labels into one-hot encoded vectors. I'm going to start with a very, very small convolutional net, probably as small as possible, with just one convolution layer and one pooling layer. You could certainly make this network bigger and get better performance, but first I want to see what happens as we modify the learning rate and the batch size.
The first thing I want to show you is what happens if you set the learning rate to something very small. By default, Adam sets the learning rate to 0.001, so let's see what happens if we set it to 0.0001. Here the blue line is the accuracy of the CNN with the lower learning rate, and the orange line is the accuracy of the CNN with the default learning rate. You can see that with the lower learning rate nothing really bad happens, but it learns more slowly, and that can be really annoying when you're training lots of models. The blue line is consistently below the orange line, although I think it'll probably catch up over time.
Now suppose you look at this and say, you know what, a learning rate of 0.0001 doesn't work as well as the default 0.001, so let's raise it by a couple of orders of magnitude and make it 0.1. This would be considered a pretty high learning rate, but who knows, maybe the model will learn faster. So we call model.fit, and look at that: we see an accuracy of about 10%, and it's not even getting better. Another real telltale sign is that the loss is 14. Remember that the loss is in log space (it's log loss, also called cross-entropy), so if you see a loss above 4 or 5, you should think of that as a massively huge loss. I think what's actually happening here is that the model is returning NaNs and infinities and just crazy things, so this model is never going to get better. It just sits at a loss of 14 and an accuracy of 10%, and remember we have 10 classes, so an accuracy of 10% means the model is really just guessing. So this is what happens when you set the learning rate too high: you end up in a really, really bad place. Lower learning rates tend to be safer than higher learning rates.
What about batch size? Our default batch size is 32. What happens if we set it to, say, 128? That's a 4x increase in batch size. If we call model.fit, the first thing you'll notice is that it takes a little bit longer to start, because it's loading more into memory, but then it runs a lot faster, which is pretty nice. But if it runs faster and we sacrifice accuracy, that's not very interesting. And remember, it's running faster because each step it takes is over a bigger chunk: it's looking at the same number of training examples, but in larger chunks.
All right, cool. So here is a reason people often don't use larger batch sizes: with a batch size of 128, each step of the CNN is faster, but it's actually performing worse than the CNN with the smaller batch size. But there's a subtle point here that's important to know, which is that as you increase the batch size, you should also increase the learning rate. If you multiply the batch size by four, you should also, generally speaking, multiply the learning rate by four. With bigger batches, because you're averaging over a larger number of samples, you can get away with higher learning rates than you otherwise could. So let's try the 128 batch size again, but with 4x the learning rate, versus the baseline batch size of 32. What you see here is that when we multiply the learning rate by 4 along with the larger batch size, we get the best of both worlds: a CNN that trains faster and reaches about the same accuracy and validation accuracy as our baseline CNN.
One more fancy thing you can do, and this is especially effective when you train for a large number of epochs, is to reduce the learning rate on plateau. The idea is that once your model gets stuck and can't find a better optimum, you lower the learning rate so it can fine-tune in the area that it's in. This is such a common technique that there's a Keras callback that does it for us. It has lots of options, but as usual the defaults are pretty sensible, so we just add a ReduceLROnPlateau callback to our callbacks. To see this effect, our learning actually has to plateau, so let's set the epochs to, say, 300, and I'll time-lapse a little bit so this isn't a really long video.
This model has been training for a while, and you can see that at about the 35th epoch it stopped improving, so our reduce-learning-rate callback automatically reduced the learning rate. You can see that the accuracy and the validation accuracy popped up quickly and then petered out. With a lot of models you actually see that effect happen a few times, each one probably a little bit smaller than the time before, but it can really add some accuracy at the end of your model training.
So again, I don't think the learning rate is always the first thing to mess with, unless your model is really not training or you're having numerical issues. Generally the Keras defaults are pretty good. Most people go through a phase, I think, where they spend too much time changing the optimizer and things like momentum. But it definitely makes sense to understand what learning rate and batch size are and to spend a little time tweaking them, and I think reducing the learning rate on plateau is generally best practice for long-running training jobs. Thanks.
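To make the step-size intuition concrete, here is a minimal sketch in plain Python rather than the Keras/Adam setup from the video: ordinary gradient descent on the toy loss f(w) = (w - 3)^2, run with a too-small, a reasonable, and a too-large learning rate. The function name `final_loss` and the specific learning-rate values are illustrative assumptions; the point is only that a tiny step barely moves while an oversized step overshoots and diverges.

```python
def final_loss(lr, steps=50, w0=0.0, target=3.0):
    """Run plain gradient descent on f(w) = (w - target)**2 and return the final loss."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - target)  # derivative of (w - target)**2
        w -= lr * grad             # the learning rate scales the size of each step
    return (w - target) ** 2


if __name__ == "__main__":
    start = final_loss(0.0, steps=0)  # loss before any training: (0 - 3)**2
    for lr in (0.0001, 0.1, 1.1):
        print(f"lr={lr}: loss after 50 steps = {final_loss(lr):.3g} (started at {start})")
```

On this particular quadratic, each step multiplies the distance to the minimum by (1 - 2 * lr), so any learning rate above 1.0 lands farther away than it started and the loss blows up. A real network's threshold depends on its loss surface, but the failure mode is the same kind of overshooting.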
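The noise argument about batch size can be illustrated without a network. This sketch, in plain Python with made-up per-example gradients (an assumption, not data from the video), estimates a gradient from random mini-batches and measures how much that estimate varies between batches. The spread shrinks roughly as 1/sqrt(batch size), which is why tiny batches give noisy steps and large batches give smooth ones. It does not model the GPU speed effect. The function name `minibatch_gradient_spread` is hypothetical.

```python
import random
import statistics


def minibatch_gradient_spread(batch_size, n_examples=10_000, trials=500, seed=0):
    """Standard deviation of mini-batch gradient estimates across random batches."""
    rng = random.Random(seed)
    # Hypothetical per-example gradients for one weight: true mean 1.0, unit noise.
    per_example_grads = [1.0 + rng.gauss(0.0, 1.0) for _ in range(n_examples)]
    estimates = []
    for _ in range(trials):
        batch = rng.sample(per_example_grads, batch_size)  # one random mini-batch
        estimates.append(sum(batch) / batch_size)           # its averaged gradient
    return statistics.stdev(estimates)


if __name__ == "__main__":
    for bs in (1, 32, 128):
        print(f"batch_size={bs:4d}  gradient spread={minibatch_gradient_spread(bs):.3f}")
```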
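The claim that a loss above 4 or 5 is huge can be checked with a little arithmetic. With 10 classes, a model that guesses uniformly has a cross-entropy of ln(10), about 2.3, so a loss near 14 is far worse than guessing. One plausible explanation, offered here as an assumption rather than something the video confirms, is that the model ends up predicting a single class with full confidence and the loss clips predicted probabilities to Keras's default epsilon of 1e-7, so each wrong example costs -ln(1e-7), about 16.1. With 9 in 10 examples wrong, that averages out to roughly 14.5.

```python
import math


def cross_entropy(p_true_class):
    """Cross-entropy (log loss) for one example, given the probability of its true class."""
    return -math.log(p_true_class)


NUM_CLASSES = 10
EPSILON = 1e-7  # assumption: predicted probabilities are clipped to [EPSILON, 1 - EPSILON]

# Uniform guessing: every class gets probability 1/10.
uniform_guess = cross_entropy(1 / NUM_CLASSES)

# Assumed failure mode: always predict one class with (clipped) certainty.
# About 1 in 10 examples is right (near-zero loss); the other 9 in 10 are confidently wrong.
always_one_class = (
    (1 / NUM_CLASSES) * cross_entropy(1 - EPSILON)
    + (1 - 1 / NUM_CLASSES) * cross_entropy(EPSILON)
)

print(f"uniform guess loss:    {uniform_guess:.2f}")     # 2.30
print(f"always-one-class loss: {always_one_class:.2f}")  # 14.51
```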
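The rule of thumb from the video (multiply the batch size by k, multiply the learning rate by k) is simple arithmetic, sketched below with the numbers from the experiments: Adam's default learning rate of 0.001 at Keras's default batch size of 32, scaled up to a batch size of 128. The sketch also counts optimizer steps per epoch, assuming CIFAR-10's 50,000 training images, to show that the larger batch covers the same data in fewer, bigger steps. The helper names are hypothetical; in Keras you would pass the resulting value as the optimizer's `learning_rate` and the batch size as `model.fit(batch_size=...)`. Treat the linear scaling as a starting point to check against validation accuracy, not a guarantee.

```python
import math


def scaled_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: the learning rate grows in proportion to the batch size."""
    return base_lr * new_batch_size / base_batch_size


def steps_per_epoch(num_examples, batch_size):
    """Number of optimizer updates needed to see every training example once."""
    return math.ceil(num_examples / batch_size)


NUM_TRAIN = 50_000  # CIFAR-10 training set size
BASE_LR, BASE_BATCH = 0.001, 32  # Adam's default learning rate, Keras's default batch size

for batch in (32, 128):
    lr = scaled_learning_rate(BASE_LR, BASE_BATCH, batch)
    print(f"batch={batch:4d}  lr={lr:.4f}  steps/epoch={steps_per_epoch(NUM_TRAIN, batch)}")
```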
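In Keras, reducing the learning rate on plateau is the `keras.callbacks.ReduceLROnPlateau` callback, added with `model.fit(..., callbacks=[ReduceLROnPlateau()])`; by default it watches `val_loss` and multiplies the learning rate by 0.1 after 10 epochs without improvement. The sketch below is not the Keras implementation. It is a simplified plain-Python version of the same idea (no cooldown or mode options) that makes it easy to see exactly when the learning rate drops. The class name `PlateauScheduler` and the example loss values are made up for illustration.

```python
class PlateauScheduler:
    """Simplified reduce-on-plateau logic: multiply the learning rate by `factor`
    once the monitored loss has failed to improve by more than `min_delta`
    for `patience` consecutive epochs."""

    def __init__(self, lr, factor=0.1, patience=10, min_delta=1e-4, min_lr=0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_delta = min_delta
        self.min_lr = min_lr
        self.best = float("inf")
        self.wait = 0  # epochs since the last meaningful improvement

    def step(self, val_loss):
        """Record one epoch's validation loss and return the learning rate to use next."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


if __name__ == "__main__":
    # Made-up validation losses: steady improvement, then a plateau.
    losses = [1.00, 0.80, 0.70, 0.70, 0.70, 0.70]
    scheduler = PlateauScheduler(lr=0.001, patience=3)
    for epoch, loss in enumerate(losses, start=1):
        print(f"epoch {epoch}: val_loss={loss:.2f} -> lr={scheduler.step(loss):g}")
```

Resetting the wait counter after each reduction is what allows the repeated, smaller jumps described in the video: the model gets a fresh `patience` window at the new learning rate before the next cut.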
Original Description
Dive into ways to tune your batch size and learning rate to improve model performance and efficiency.
This video is a sequel to the previous CNN video:
Convolutional Neural Networks Part 1: https://www.youtube.com/watch?v=wzy8jI-duEQ&list=PUBp3w4DCEC64FZr4k9ROxig&index=7
GitHub repo: https://github.com/lukas/ml-class
See all classes: http://wandb.com/classes
Weights & Biases: http://wandb.com
Uploaded by Weights & Biases (22 of 60 in the channel's uploads playlist).