Batch Norm At Test Time (C2W3L07)

DeepLearningAI · Beginner ·📐 ML Fundamentals ·8y ago

Key Takeaways

At test time, batch normalization uses exponentially weighted averages of the mean and variance collected during training, so the network can process a single example at a time.

Full Transcript

Batch norm processes your data one mini-batch at a time, but at test time you might need to process the examples one at a time. Let's see how you can adapt your network to do that.

Recall that during training, here are the equations you'd use to implement batch norm. Within a single mini-batch, you sum over that mini-batch of the z(i) values to compute the mean. So here you're just summing over the examples in one mini-batch. I'm using m to denote the number of examples in the mini-batch, not in the whole training set. Then you compute the variance, and then you compute z norm by scaling by the mean and standard deviation, with epsilon added for numerical stability. And then z tilde is taking z norm and rescaling by gamma and beta.

So notice that mu and sigma squared, which you need for this scaling calculation, are computed on the entire mini-batch. But at test time you might not have a mini-batch of 64, 128, or 256 examples to process at the same time, so you need some different way of coming up with mu and sigma squared. And if you have just one example, taking the mean and variance of that one example doesn't make sense.

So what's actually done, in order to apply your neural network at test time, is to come up with some separate estimate of mu and sigma squared. In typical implementations of batch norm, what you do is estimate this using an exponentially weighted average, where the average is across the mini-batches.

So to be very concrete, here's what I mean. Let's pick some layer l, and let's say you're going through mini-batches x1, x2, together with the corresponding values of y, and so on. When training on x1 for that layer l, you get some mu for layer l, and in fact I'm going to write this as mu for the first mini-batch and that layer. Then when you train on the second mini-batch, for that layer and that mini-batch you end up with some second value of mu. And then for the third mini-batch, in this hidden layer you end up with some third value for mu.

So just as we saw how to use an exponentially weighted average to compute the mean of theta 1, theta 2, theta 3 when you were trying to compute an exponentially weighted average of the current temperature, you would do that to keep track of the latest average value of this mean vector you've seen. That exponentially weighted average becomes your estimate for what the mean of the z's is for that hidden layer. Similarly, you'd use an exponentially weighted average to keep track of the values of sigma squared that you see on the first mini-batch in that layer, the sigma squared that you see on the second mini-batch, and so on. So you keep a running average of the mu and the sigma squared that you're seeing for each layer as you train the neural network across different mini-batches.

Then finally, at test time, what you do is, in place of this equation, you would just compute z norm using whatever value your z has, and using your exponentially weighted average of the mu and sigma squared, whatever was the latest value you have, to do the scaling here. And then you would compute z tilde on your one test example using that z norm that we just computed on the left, and using the beta and gamma parameters that you have learned during your neural network training process.

So the takeaway from this is that during training time, mu and sigma squared are computed on an entire mini-batch of, say, 64, 128, or some number of examples. But at test time you might need to process a single example at a time. So the way to do that is to estimate mu and sigma squared from your training set, and there are many ways to do that. You could in theory run your whole training set through your final network to get mu and sigma squared. But in practice, what people usually do is implement an exponentially weighted average, where you just keep track of the mu and sigma squared values you've seen during training, and use an exponentially weighted average, also sometimes called a running average, to just get a rough estimate of mu and sigma squared. Then you use those values of mu and sigma squared at test time to do the scaling you need of the hidden unit values z.

In practice, this process is pretty robust to the exact way you use to estimate mu and sigma squared, so I wouldn't worry too much about exactly how you do this. And if you're using a deep learning framework, they'll usually have some default way to estimate mu and sigma squared that should work reasonably well as well. But in practice, any reasonable way to estimate the mean and variance of your hidden unit values z should work fine at test time.

So that's it for batch norm. Using it, I think you'll be able to train much deeper networks and get your learning algorithm to run much more quickly. Before we wrap up for this video, I want to share with you some thoughts on deep learning frameworks as well. Let's start to talk about that in the next video.
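The equations described in the transcript can be written out roughly as follows. This is a sketch for one layer, not a copy of the slide: the hats on the running estimates and the decay symbol \rho are notation assumed here (the video does not name them), and \rho is used instead of \beta to avoid clashing with the batch norm shift parameter.

```latex
% Training: statistics of one mini-batch of m examples
\begin{align*}
\mu &= \frac{1}{m} \sum_{i=1}^{m} z^{(i)} \\
\sigma^2 &= \frac{1}{m} \sum_{i=1}^{m} \left( z^{(i)} - \mu \right)^2 \\
z_{\text{norm}}^{(i)} &= \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \varepsilon}} \\
\tilde{z}^{(i)} &= \gamma \, z_{\text{norm}}^{(i)} + \beta
\end{align*}

% Running estimates, updated after every mini-batch (\rho is an assumed decay rate)
\begin{align*}
\hat{\mu} &\leftarrow \rho \, \hat{\mu} + (1 - \rho) \, \mu \\
\hat{\sigma}^2 &\leftarrow \rho \, \hat{\sigma}^2 + (1 - \rho) \, \sigma^2
\end{align*}

% Test time: a single example z, using the latest running estimates
\begin{align*}
z_{\text{norm}} &= \frac{z - \hat{\mu}}{\sqrt{\hat{\sigma}^2 + \varepsilon}} \\
\tilde{z} &= \gamma \, z_{\text{norm}} + \beta
\end{align*}
```

The only change between training and test time is where the mean and variance come from; gamma and beta are the learned parameters in both cases.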

Original Description

Take the Deep Learning Specialization: http://bit.ly/2vBGGmD Check out all our courses: https://www.deeplearning.ai Subscribe to The Batch, our weekly newsletter: https://www.deeplearning.ai/thebatch Follow us: Twitter: https://twitter.com/deeplearningai_ Facebook: https://www.facebook.com/deeplearningHQ/ Linkedin: https://www.linkedin.com/company/deeplearningai


Batch normalization normalizes the hidden unit values z of each layer in a neural network. During training, the mean and variance come from the current mini-batch; at test time, they are estimated with an exponentially weighted average of the values seen during training. This makes it possible to process a single example at a time. In this video, we learn how to implement batch normalization at test time using an exponentially weighted average, and how to use deep learning frameworks to make this pro

Key Takeaways
  1. Compute the mean and variance of the hidden unit values for each mini batch during training
  2. Use an exponentially weighted average to estimate the mean and variance of the hidden unit values
  3. Implement batch normalization at test time using the estimated mean and variance
  4. Process single examples at a time using the implemented batch normalization
💡 Using an exponentially weighted average to estimate the mean and variance of the hidden unit values is a robust and efficient way to implement batch normalization at test time
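
The four steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the course's reference implementation: the class name `BatchNormSketch`, the `decay` value of 0.9, the initial running values, and the synthetic data are all hypothetical choices, and gamma and beta are held fixed rather than learned.

```python
import numpy as np


class BatchNormSketch:
    """Batch norm for one layer, keeping running statistics for test time."""

    def __init__(self, n_units, decay=0.9, eps=1e-5):
        self.gamma = np.ones(n_units)   # scale parameter (fixed here, learned in practice)
        self.beta = np.zeros(n_units)   # shift parameter (fixed here, learned in practice)
        self.decay = decay              # assumed weight on the previous running value
        self.eps = eps                  # small constant for numerical stability
        self.running_mu = np.zeros(n_units)
        self.running_var = np.ones(n_units)

    def forward_train(self, Z):
        """Z has shape (m, n_units): one mini-batch of hidden unit values."""
        # Step 1: mean and variance of this mini-batch only.
        mu = Z.mean(axis=0)
        var = Z.var(axis=0)  # divides by m, matching the transcript's variance
        # Step 2: exponentially weighted (running) averages across mini-batches.
        self.running_mu = self.decay * self.running_mu + (1 - self.decay) * mu
        self.running_var = self.decay * self.running_var + (1 - self.decay) * var
        z_norm = (Z - mu) / np.sqrt(var + self.eps)
        return self.gamma * z_norm + self.beta

    def forward_test(self, z):
        """z has shape (n_units,): a single example."""
        # Steps 3 and 4: normalize with the running estimates, not batch statistics.
        z_norm = (z - self.running_mu) / np.sqrt(self.running_var + self.eps)
        return self.gamma * z_norm + self.beta


# Usage on synthetic data with mean 5 and standard deviation 2.
rng = np.random.default_rng(0)
bn = BatchNormSketch(n_units=3)
for _ in range(200):
    Z = rng.normal(loc=5.0, scale=2.0, size=(64, 3))
    train_out = bn.forward_train(Z)

single_example = np.array([5.0, 5.0, 5.0])
out = bn.forward_test(single_example)
```

After enough mini-batches the running estimates settle near the data's mean and variance, so a single test example can be normalized without a batch. As the transcript notes, the result is not very sensitive to the exact estimator, so the decay value here is not critical.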
