15. Batch Size and Learning Rate in CNNs

Weights & Biases · Beginner · 🧬 Deep Learning · 7y ago

Key Takeaways

This video covers the fundamentals of batch size and learning rate in convolutional neural networks (CNNs). It shows how tuning these two hyperparameters affects model performance and training speed, using a small CNN trained with the Adam optimizer on a GPU on the CIFAR-10 dataset.

Full Transcript

All right, so you've got your model training and you want to make it a little bit better. This video is a sequel to the video on convolutional neural networks, and I want to talk about learning rate and batch size and a couple of ways you can often make your models better. Unlike a lot of other courses on deep learning, I don't start with learning rate and batch size, because I think beginners tend to spend way too much time optimizing and tweaking these parameters where they don't really matter that much. But they can matter, and they can be really effective for making your models train faster or for getting the last bit of performance out of your model.

As a quick refresher, the learning rate is like the size of the step the model takes as it looks for the best possible weights. A really low learning rate means your model might take a really long time to find the best set of parameters, and a really high learning rate might mean it jumps over the best possible place, or even jumps into regions where you get numerical instability.

Batch size, you might remember, is the number of examples the model looks at when it decides which direction to move all the weights. A really small batch size means that at each step your model is optimizing over only, say, one or two examples, which can add a lot of noise. A really big batch size sometimes doesn't have enough noise, and it can be hard for the model to find the best place. So, somewhat ironically, smaller batch sizes can help your model train better by adding a little noise into the search. But another important effect is that a bigger batch size can help your model train faster, especially on GPUs, because on a GPU you can compute the gradients for all the examples in a batch at once. So as long as you can fit the batch and the associated model into memory, a bigger batch size might not slow down each step by much at all; that computation can take roughly the same time regardless of your batch size. If your batch size is too big, it just won't fit into memory and your program will crash, so at that point you'll know you need to reduce it. But one thing I see with a lot of people starting out is that they tend to set their batch sizes too small and waste more time in training than they really need to.

So let's jump into the code and run some experiments. I'm going to run these experiments on the CIFAR-10 dataset, which we used in a previous class on data augmentation that you should take a look at if you haven't already. We do the standard imports and then pull in the CIFAR-10 dataset. These things tend to differ across datasets, but I wanted a dataset that's small enough that we can run a lot of experiments quickly, but not MNIST, which is so small that you might get unusual results. As usual, I normalize the data and convert the labels into one-hot encoded vectors. I'm going to start with a very, very small convolutional net, probably as small as possible: just one convolution layer and one pooling layer. You could certainly make this network bigger and get better performance, but first I want to see what happens as we modify the learning rate and the batch size.

The first thing I want to show you is what happens if you set the learning rate to something very small. By default, Adam sets the learning rate to 0.001, so let's see what happens if we set it to 0.0001. Here the blue line is the accuracy of the CNN with the lower learning rate, and the orange line is the accuracy of the CNN with the default learning rate. You can see that with the lower learning rate nothing really bad happens, but it learns more slowly, and that can be really annoying when you're training lots of models. The blue line is consistently below the orange line, although I think it will probably catch up over time.

Now suppose you look at this and say: a learning rate of 0.0001 doesn't work as well as the default 0.001, so let's raise it up quite a bit, say to 0.1. This would be considered a pretty high learning rate, but who knows, maybe the model will learn faster. So we call model.fit, and look at that: we see an accuracy of about 10%, and it's not even getting better. Another real telltale sign is that the loss is 14. Remember that the loss here is in log space (it's a log loss), so if you see a loss above 4 or 5, you should think of that as a massively huge loss. I think what's actually happening is that the model is returning NaNs and infinities and just crazy things, so this model is never going to get better. It just sits at a loss of 14 and an accuracy of 10%, and since we have 10 classes, an accuracy of 10% means the model is really just guessing. This is what happens when you set the learning rate too high: you end up in a really bad place. So lower learning rates tend to be safer than higher ones.

What about batch size? Our default batch size is 32, so what happens if we set it to, say, 128? That's a 4x increase in batch size. If we call model.fit, the first thing you'll notice is that it takes a little longer to start because it's loading more into memory, but then it runs a lot faster, which is pretty nice. But if it runs faster and we sacrifice accuracy, that's not very interesting. And remember, it's running faster because each step it takes is over a bigger chunk: it's looking at the same number of training examples, but in larger chunks.

All right, cool. So here the CNN with batch size 128 is faster per step, but it's actually performing worse than the CNN with the smaller batch size, and this is a reason people often don't use larger batch sizes. But there's a subtle point here that's important to know: as you increase the batch size, you should also increase the learning rate. So if you multiply the batch size by four, you should generally also multiply the learning rate by four. With bigger batches, because you're averaging over a larger number of samples, you can get away with higher learning rates than you otherwise could. So let's try the 128 batch size again, but with a 4x learning rate, versus the baseline batch size of 32. What you see here is that when we quadruple the learning rate along with the larger batch size, we get the best of both worlds: a CNN that trains faster and reaches about the same accuracy and validation accuracy as our baseline CNN.

One more fancy thing you can do, which is especially effective when you train for a large number of epochs, is to reduce the learning rate on plateau. The idea is that once your model gets stuck and can't find a better optimum, you lower the learning rate so it can fine-tune in the area it's in. This is such a common technique that there's a Keras callback that will do it for us. It has lots of options, but as usual the defaults are pretty sensible, so we just add a ReduceLROnPlateau callback to our callbacks. To see the effect, our training actually has to plateau, so let's set the epochs to, say, 300, and I'll time-lapse this a little so it's not a really long video.

This model has been training for a while, and you can see that at about the 35th epoch it stopped improving, so the callback automatically reduced the learning rate. You can see that the accuracy and the validation accuracy popped up quickly and then petered out. With a lot of models you see that effect happen a few times, each probably a little smaller than the time before, but it can really add some accuracy at the end of your model training.

So again, I don't think the learning rate is usually the first thing to mess with, unless your model is really not training or you're having numerical issues; generally the Keras defaults are pretty good. Most people go through a phase, I think, where they spend too much time changing the optimizer and things like momentum. But it definitely makes sense to understand what learning rate and batch size are and to spend a little time tweaking them, and I think reducing the learning rate on plateau is generally best practice for long training runs. Thanks!
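The refresher at the start of the transcript (a tiny learning rate crawls, a large one overshoots or becomes numerically unstable) can be seen on a toy problem. The following is a minimal sketch, not the video's code: it assumes plain gradient descent instead of Adam and a one-parameter quadratic loss instead of a CNN, and the function `gradient_descent` and its parameters are hypothetical. Each update multiplies w by (1 - lr * curvature), so w shrinks slowly when lr is tiny, bounces back and forth around the minimum when lr * curvature is between 1 and 2, and blows up once it exceeds 2.

```python
# Hypothetical toy example (not from the video): plain gradient descent on the
# 1-D quadratic loss f(w) = 0.5 * curvature * w**2, whose minimum is at w = 0.


def gradient_descent(lr, curvature=1.0, w0=1.0, steps=20):
    """Run `steps` updates of w <- w - lr * f'(w) and return the final |w|."""
    w = w0
    for _ in range(steps):
        grad = curvature * w  # f'(w) for the quadratic above
        w = w - lr * grad     # one step of size lr against the gradient
    return abs(w)


# Tiny lr: slow progress. Moderate lr: fast. lr near 2: overshoots and
# oscillates. lr above 2 (for curvature 1): each step moves further away.
for lr in (0.01, 0.5, 1.9, 2.1):
    print(f"lr={lr:<5} distance from minimum after 20 steps: {gradient_descent(lr):.3g}")
```

On a real network the effective curvature differs across directions in weight space and changes during training, so the safe range cannot be computed this way in advance, which is one reason the video finds it empirically.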

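The loss numbers in the transcript can be sanity-checked with a little arithmetic. For 10 classes, a model that spreads its probability uniformly has a cross-entropy of ln(10), about 2.3, so a loss of 14 is far worse than guessing. One plausible reading, assuming Keras's categorical cross-entropy clips predicted probabilities to [1e-7, 1 - 1e-7] (its default epsilon), is a model whose softmax has saturated on a single class: it is confidently wrong on about 90% of examples and right on the rest, which gives a loss near 0.9 × -ln(1e-7), about 14.5, and an accuracy near 10%. This is a back-of-the-envelope check under that assumption, not a diagnosis of the video's run.

```python
import math

NUM_CLASSES = 10
EPSILON = 1e-7  # assumed: Keras's default backend epsilon, used to clip probabilities

# Cross-entropy of a model that assigns probability 1/10 to every class.
uniform_loss = -math.log(1.0 / NUM_CLASSES)

# Worst per-example cross-entropy once probabilities are clipped to
# [EPSILON, 1 - EPSILON]: the true class received probability EPSILON.
max_clipped_loss = -math.log(EPSILON)

# Hypothetical saturated model: always outputs a one-hot prediction for one
# class, so it is right on ~10% of examples (loss ~ -log(1 - EPSILON), near 0)
# and maximally wrong on the other ~90%.
saturated_loss = 0.9 * max_clipped_loss + 0.1 * -math.log(1.0 - EPSILON)

print(f"uniform guessing:      {uniform_loss:.3f}")
print(f"max clipped loss:      {max_clipped_loss:.3f}")
print(f"saturated wrong model: {saturated_loss:.3f}")
```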
Original Description

Dive into ways to tune your batch size and learning rate to improve model performance and efficiency. This video is a sequel to the previous CNN video.

Convolutional Neural Networks Part 1: https://www.youtube.com/watch?v=wzy8jI-duEQ&list=PUBp3w4DCEC64FZr4k9ROxig&index=7
GitHub repo: https://github.com/lukas/ml-class
See all classes: http://wandb.com/classes
Weights & Biases: http://wandb.com

This video teaches how to optimize batch size and learning rate in CNNs to improve model performance and efficiency, covering key concepts like model training, fine-tuning, and hyperparameter tuning.

Key Takeaways
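  1. Run experiments on the CIFAR-10 dataset
  2. Load the CIFAR-10 dataset
  3. Normalize the data and convert the labels into one-hot encoded vectors
  4. Set the learning rate to a very small value
  5. Train a CNN with the lower learning rate and with the default learning rate, and compare them
  6. Set the learning rate to 0.0001 (Adam's default is 0.001): training still works but learns more slowly
  7. Increase the learning rate to 0.1: accuracy stays near 10% (chance level for 10 classes) and the loss stays around 14
  8. Increase the batch size from 32 to 128, and scale the learning rate up by the same factor of 4
  9. Reduce the learning rate on plateau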
💡 Reducing the learning rate on plateau can help fine-tune the model's performance, and Keras's ReduceLROnPlateau callback can lower the learning rate automatically when training stalls.
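The callback's logic is simple enough to sketch without a framework. The class below is a hypothetical, simplified re-implementation for illustration, not Keras's source: it watches a validation loss, and if the loss has not improved by at least `min_delta` for `patience` epochs, it multiplies the learning rate by `factor`. The defaults are chosen to mirror those documented for Keras's `ReduceLROnPlateau` (factor 0.1, patience 10, min_delta 1e-4, min_lr 0); options such as `cooldown` and `mode` are left out. In actual Keras code you would instead pass a `ReduceLROnPlateau()` instance in the `callbacks` list given to `model.fit`.

```python
class PlateauScheduler:
    """Hypothetical, framework-free sketch of reduce-on-plateau logic.

    Defaults mirror Keras's ReduceLROnPlateau (factor=0.1, patience=10,
    min_delta=1e-4, min_lr=0), but cooldown and mode are omitted.
    """

    def __init__(self, lr, factor=0.1, patience=10, min_delta=1e-4, min_lr=0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_delta = min_delta
        self.min_lr = min_lr
        self.best = float("inf")  # lowest validation loss seen so far
        self.wait = 0             # epochs since the last meaningful improvement

    def step(self, val_loss):
        """Call once per epoch with the validation loss; returns the current lr."""
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the plateau counter.
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                # Plateau detected: shrink the lr, but never below min_lr.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


# Simulated validation losses: improving for 5 epochs, then stuck at 0.8.
losses = [1.5, 1.2, 1.0, 0.9, 0.8] + [0.8] * 12
sched = PlateauScheduler(lr=0.001)
lrs = [sched.step(loss) for loss in losses]
print(lrs)
```

With patience 10, the learning rate drops from 0.001 to roughly 0.0001 on the tenth consecutive epoch without improvement; a model that keeps stalling would see further drops by the same factor, consistent with the repeated, shrinking bumps described in the transcript.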

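Takeaway 8 and the batch-size experiment in the transcript rest on two ideas: averaging over more examples makes each gradient estimate less noisy, and the rule of thumb of scaling the learning rate by the same factor as the batch size (32 to 128 means 0.001 to 0.004). The following is a minimal sketch under stated assumptions: per-example "gradients" are hypothetical draws from a normal distribution with mean 1 and standard deviation 1, and the helper names are made up for illustration. The noise falls with the square root of the batch size, so a 4x larger batch halves it rather than quartering it; linear scaling of the learning rate is a practical heuristic, the one the video uses, rather than something this toy derives.

```python
import random
import statistics


def scaled_learning_rate(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule of thumb: grow the lr in proportion to the batch size."""
    return base_lr * new_batch_size / base_batch_size


def minibatch_gradient_std(batch_size, trials=2000, seed=0):
    """Std. dev. of a minibatch-averaged gradient across random minibatches.

    Hypothetical toy model: each per-example gradient is drawn from a normal
    distribution with mean 1.0 and std 1.0, so the true gradient is 1.0 and
    any spread in the minibatch averages is pure sampling noise.
    """
    rng = random.Random(seed)
    estimates = [
        sum(rng.gauss(1.0, 1.0) for _ in range(batch_size)) / batch_size
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)


base_lr, base_batch = 0.001, 32  # Adam's default lr and Keras's default batch size
new_batch = 128
print("scaled lr:", scaled_learning_rate(base_lr, base_batch, new_batch))  # 0.001 * 128 / 32
print(f"gradient noise, batch 32:  {minibatch_gradient_std(32):.3f}")
print(f"gradient noise, batch 128: {minibatch_gradient_std(128):.3f}")
```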