15. Batch Size and Learning Rate in CNNs

Weights & Biases · Beginner ·🧬 Deep Learning ·7y ago

Key Takeaways

This video covers the fundamentals of batch size and learning rate in convolutional neural networks (CNNs). It demonstrates how to tune these hyperparameters to improve model performance and training efficiency, using a small CNN trained with the Adam optimizer on the CIFAR-10 dataset, and explains why larger batches can train faster on a GPU.

Full Transcript

All right, so you've got your model training and you want to make it a little bit better. This video is kind of a sequel to the video on convolutional neural networks, and I want to talk about learning rate and batch size and a couple of ways that you can often make your models better. Unlike a lot of other courses on deep learning, I don't start with learning rate and batch size and things like that, because I think beginners tend to spend way too much time optimizing and tweaking these parameters where they don't really matter that much. But they can matter, and they can be really effective for making your models train faster or for getting the last piece of performance out of your model.

So, just as a quick refresher: the learning rate is kind of like the size of the step that the model takes as it's looking for the best possible weights. A really low learning rate means your model might take a really long time to find the best set of parameters, and a really high learning rate might mean it's jumping over the best possible place, or even jumping into regions where you get numerical instability.

Batch size, you might remember, is the number of examples that a model looks at when it decides which direction to move all the weights. A really small batch size means that at each step your model is optimizing over only, say, one or two examples, which can add a lot of noise. A really big batch size sometimes doesn't have enough noise, and it's hard for the model to actually find the best place. So on one hand, smaller batch sizes can, ironically, help your model train better by adding a little bit of noise into that search. But another important effect is that a bigger batch size can help your model train faster, especially on GPUs, because on a GPU you can compute the derivative for all the examples in a batch at once. So as long as you can fit the batch and the associated model into memory, a bigger batch size might not slow down each step by much at all; in fact, that computation might be sort of atomic and take about the same time regardless of your batch size. If your batch size is too big, it just won't fit into memory and your program will crash, so at that point you'll know that you need to reduce your batch size. But one thing I see with a lot of people starting out is that they tend to set their batch sizes too small and waste more time in training than they really need to.

So let's jump into the code and run some experiments. I'm going to do these experiments on the CIFAR-10 dataset that we used in a previous class on data augmentation, which you should probably take a look at if you haven't already. We do the standard imports and then pull in the CIFAR-10 dataset. These things tend to be different on different datasets, but I wanted to use a dataset that's small enough that we can run a lot of experiments quickly, but not MNIST, which is so small that you might get unusual results. As usual, I normalize the data and convert the labels into one-hot encoded versions of themselves, and I'm going to start with a very, very small convolutional net, probably as small as possible: just one convolution and one pooling layer. Certainly you could make this network bigger and get better performance, but I want to see what happens as we modify the learning rate and the batch size first.

The first thing I want to show you is what happens if you set the learning rate to something very small. By default, Adam sets the learning rate to 0.001, so let's see what happens if we set it to 0.0001. Here the blue line is the accuracy of the CNN with the lower learning rate, and the orange line is the accuracy of the CNN with the default learning rate. You can see that with the lower learning rate nothing really bad happens, but it learns slower, and that can be really annoying when you're training lots of models. The blue line is consistently below the orange line, although I think it'll probably catch up over time.

Now suppose you look at this and you say, you know what, a learning rate of 0.0001 doesn't work as well as the default 0.001, so let's raise it by a couple of orders of magnitude and make it, say, 0.1. This would be considered a pretty high learning rate, but who knows, maybe the model will learn faster. So here we call model.fit, and look at that: we see an accuracy of about 10% and it's not even getting better. Another real telltale sign is that our loss is 14. Remember that our loss is in log space (it's really log loss), so if you see a loss above 4 or 5, you should think of that as a massively huge loss. I think what's actually happening here is that the model is returning NaNs and infinities and just crazy things, and so this model is never going to get better. It just sits at a loss of 14 and an accuracy of 10%, and remember we have 10 classes, so an accuracy of 10% means the model is really just guessing. So this is what happens when you set the learning rate too high: you get into a really bad place. Lower learning rates tend to be safer than higher learning rates.

What about batch size? Our default batch size is 32. What happens if we set it to, say, 128? That's a 4x increase in batch size. If we call model.fit, the first thing you'll notice is that it takes a little bit longer to start, because it's loading more into memory, but then it actually runs a lot faster, and that's pretty nice. But if it runs faster and we sacrifice accuracy, that's not very interesting. And remember, it's running faster because each step it takes is over a bigger chunk: it's looking at the same number of training examples, but it's doing it in larger chunks.

All right, cool. So this is a reason that people often don't use larger batch sizes: with the 128 batch size each step is faster, but the CNN is actually performing worse than the CNN with the smaller batch size. But there's a subtle point here that's important to know, which is that as you increase the batch size you should also increase the learning rate. If you multiply the batch size by four, you should also, generally speaking, multiply the learning rate by four. With bigger batches, because you're averaging over a larger number of samples, you can get away with higher learning rates than you otherwise could. So let's try this 128 batch size again, but with a 4x learning rate, versus the baseline batch size of 32.

What you see here is that when we 4x the learning rate with the larger batch size, we get the best of both worlds: we have a CNN that's training faster, and it's training to about the same level of accuracy and validation accuracy as our baseline CNN.

One more fancy thing you can do, and this is especially effective when you train over a large number of epochs, is to reduce the learning rate on plateau. The idea here is that once your model gets stuck and can't find a better optimum, you lower the learning rate so it can fine-tune in the area that it's in. This is such a common technique that there's a Keras callback that will do it for us. It has lots of options, but the defaults, as usual, are pretty sensible, so we just add a ReduceLROnPlateau callback to our callbacks. In order to see this effect, our learning actually has to plateau, so let's set the epochs to, say, 300, and I'll time-lapse this a little bit to not make a really long video.

This model has been training for a while, and you can see what happened: at about the 35th epoch it stopped improving, so our reduce-learning-rate callback automatically reduced the learning rate, and you can see that the accuracy and the validation accuracy popped up quickly and then petered out. With a lot of models you actually see that effect happen a few times, each probably a little bit smaller than the time before, but it can really add some accuracy at the end of your model training.

So again, I don't think learning rate is always the first thing to mess with unless your model is really not training or you're having numerical issues. Generally the Keras defaults are pretty good, and most people go through a phase, I think, where they spend too much time changing the optimizer and changing momentum and things like that. But it definitely does make sense to understand what learning rate and batch size are and spend a little time tweaking them, and I think this kind of reduce-learning-rate-on-plateau system is generally best practice for long-running training. Thanks.
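The transcript describes the learning rate as the size of the step the model takes toward better weights. A minimal sketch of that idea, using plain gradient descent on the one-dimensional quadratic loss f(w) = 0.5 * a * w**2 rather than a CNN trained with Adam (the curvature `a`, the starting weight, and the step count are arbitrary illustrative choices, and `gradient_descent` is a hypothetical helper): each update multiplies w by (1 - lr * a), so a tiny learning rate shrinks w slowly, a moderate one converges quickly, and anything above 2 / a makes the weight grow without bound, a loose analogue of the blow-up the video sees at a learning rate of 0.1.

```python
def gradient_descent(lr: float, a: float = 1.0, w0: float = 1.0, steps: int = 100) -> float:
    """Minimize the toy loss f(w) = 0.5 * a * w**2 with plain gradient descent; return the final w."""
    w = w0
    for _ in range(steps):
        grad = a * w        # derivative of 0.5 * a * w**2
        w = w - lr * grad   # take a step of size lr against the gradient
    return w


if __name__ == "__main__":
    # Too small, reasonable, and too large for a = 1 (stable only when 0 < lr < 2).
    for lr in (0.01, 0.5, 2.5):
        print(f"lr={lr:<5} final |w| = {abs(gradient_descent(lr)):.3g}")
```

With a = 1 the stable range is 0 < lr < 2: lr = 0.01 is still far from the minimum at w = 0 after 100 steps, lr = 0.5 is essentially there, and lr = 2.5 diverges.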
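The remark that a loss of 14 is "massively huge" follows from how log loss works: with one-hot labels, categorical cross-entropy is the negative log of the probability the model assigns to the true class. A small worked sketch (the function names are illustrative, not from the video's code): a model that guesses uniformly over CIFAR-10's 10 classes scores ln 10 ≈ 2.30, and an average loss of 14 means the geometric-mean probability on the correct class is e^-14, below one in a million.

```python
import math


def cross_entropy(p_true: float) -> float:
    """Log loss for one example, given the probability the model assigns to the true class."""
    return -math.log(p_true)


def chance_level_loss(num_classes: int) -> float:
    """Loss of a model that spreads its probability uniformly over all classes."""
    return cross_entropy(1.0 / num_classes)


def implied_true_class_probability(mean_loss: float) -> float:
    """Geometric-mean probability on the true class implied by an average log loss."""
    return math.exp(-mean_loss)


if __name__ == "__main__":
    print(f"chance-level loss, 10 classes: {chance_level_loss(10):.3f}")
    print(f"probability implied by a loss of 14: {implied_true_class_probability(14):.2e}")
```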
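The point that small batches add noise to the search can be checked numerically. A minimal sketch, assuming a synthetic one-parameter linear-regression problem rather than CIFAR-10 (the data, seeds, and trial count are arbitrary, and all helpers are hypothetical): it measures how much the minibatch estimate of the gradient varies from batch to batch at a fixed weight. The spread shrinks roughly as one over the square root of the batch size, so a batch of 128 points the weights in a much steadier direction than a batch of 1 or 32.

```python
import random
import statistics


def make_data(n: int = 10_000, seed: int = 0) -> list[tuple[float, float]]:
    """Synthetic regression data: y = 3x + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + rng.gauss(0.0, 0.5)))
    return data


def minibatch_gradient(batch: list[tuple[float, float]], w: float) -> float:
    """Mean gradient of the squared error (w*x - y)**2 with respect to w over one minibatch."""
    return sum(2.0 * x * (w * x - y) for x, y in batch) / len(batch)


def gradient_spread(data, batch_size: int, w: float = 0.0, trials: int = 2_000, seed: int = 1) -> float:
    """Standard deviation of the minibatch gradient across many randomly drawn batches."""
    rng = random.Random(seed)
    grads = [minibatch_gradient(rng.sample(data, batch_size), w) for _ in range(trials)]
    return statistics.stdev(grads)


if __name__ == "__main__":
    data = make_data()
    for batch_size in (1, 32, 128):
        print(f"batch size {batch_size:>3}: gradient std = {gradient_spread(data, batch_size):.4f}")
```

This only measures gradient noise; it says nothing on its own about which batch size ends up generalizing better.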

Original Description

Dive into ways to tune your batch size and learning rate to improve model performance and efficiency. This video is a sequel to the previous CNN video: Convolutional Neural Networks Part 1: https://www.youtube.com/watch?v=wzy8jI-duEQ&list=PUBp3w4DCEC64FZr4k9ROxig&index=7 GitHub repo: https://github.com/lukas/ml-class See all classes: http://wandb.com/classes Weights & Biases: http://wandb.com

This video teaches how to optimize batch size and learning rate in CNNs to improve model performance and efficiency, covering key concepts like model training, fine-tuning, and hyperparameter tuning.

Key Takeaways

Load the CIFAR-10 dataset and run experiments on it
Normalize the data and one-hot encode the labels
Lower the learning rate from Adam's default of 0.001 to 0.0001: training still works, but learns more slowly
Raise the learning rate to 0.1: the loss blows up and accuracy stays at chance (about 10%)
Increase the batch size from 32 to 128: each epoch runs faster, but accuracy drops
Scale the learning rate by the same factor as the batch size (4x) to keep the speedup and recover accuracy
Reduce the learning rate on plateau for long training runs
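The video's rule of thumb (scale the learning rate by the same factor as the batch size) and the reason bigger batches finish an epoch faster (fewer, larger steps) both come down to simple arithmetic. A sketch, assuming the full 50,000-image CIFAR-10 training split and that a partial final batch still counts as one step; the helper names are hypothetical:

```python
import math


def scaled_learning_rate(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling rule: change the learning rate by the same factor as the batch size."""
    return base_lr * new_batch_size / base_batch_size


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Optimizer steps needed to see every example once; a final partial batch counts as a step."""
    return math.ceil(num_examples / batch_size)


if __name__ == "__main__":
    num_train = 50_000  # assumed: size of the CIFAR-10 training split
    for batch_size in (32, 128):
        lr = scaled_learning_rate(0.001, 32, batch_size)  # 0.001 is Adam's default in Keras
        print(f"batch {batch_size:>3}: lr = {lr:.4f}, steps per epoch = {steps_per_epoch(num_train, batch_size)}")
```

Under these assumptions, batch size 32 takes 1,563 steps per epoch and batch size 128 takes 391, while the learning rate goes from 0.001 to 0.004. The scaling rule is a heuristic starting point, not a guarantee.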

💡 Reducing the learning rate on plateau can help fine-tune the model's performance, and the Keras ReduceLROnPlateau callback can lower the learning rate automatically when progress stalls.
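In Keras this is `keras.callbacks.ReduceLROnPlateau`, passed to `model.fit` through the `callbacks` list; by default it watches `val_loss`, waits `patience=10` epochs without an improvement larger than `min_delta=1e-4`, and then multiplies the learning rate by `factor=0.1`. To make that logic concrete, here is a simplified plain-Python re-implementation of the same idea. It is a sketch, not the Keras source: `PlateauLRScheduler` is a hypothetical name, it ignores the callback's `cooldown` option, and it only handles a metric that should decrease.

```python
class PlateauLRScheduler:
    """Cut the learning rate by `factor` after `patience` epochs without improvement in a loss."""

    def __init__(self, lr: float, factor: float = 0.1, patience: int = 10,
                 min_delta: float = 1e-4, min_lr: float = 0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_delta = min_delta
        self.min_lr = min_lr
        self.best = float("inf")  # lowest loss seen so far
        self.wait = 0             # epochs since the last meaningful improvement

    def step(self, current_loss: float) -> float:
        """Call once per epoch with the monitored loss; returns the learning rate to use next."""
        if current_loss < self.best - self.min_delta:
            self.best = current_loss  # real improvement: remember it and reset the counter
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)  # plateau: shrink the step size
                self.wait = 0
        return self.lr


if __name__ == "__main__":
    # Hypothetical validation losses: steady improvement, then a long plateau.
    val_losses = [1.0, 0.9, 0.8, 0.7, 0.6] + [0.6] * 10
    scheduler = PlateauLRScheduler(lr=0.001)
    for epoch, loss in enumerate(val_losses, start=1):
        print(f"epoch {epoch:>2}: val_loss={loss:.2f}  lr={scheduler.step(loss):.5f}")
```

Here the learning rate stays at 0.001 while the loss improves and drops to 0.0001 on the tenth epoch of the plateau, which is the same pattern the video shows around epoch 35.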
