Gradient Checking Implementation Notes (C2W1L14)

DeepLearningAI · Beginner ·📐 ML Fundamentals ·8y ago

Key Takeaways

The video discusses practical tips for implementing gradient checking in neural networks, including using backpropagation to compute derivatives, turning off gradient checking during training, and debugging individual components of the gradient approximation. It also covers the importance of including regularization terms and handling dropout layers.

Full Transcript

In the last video you learned about gradient checking. In this video, I want to share with you some practical tips, or some notes, on how to actually go about implementing this for your neural network.

First, don't use grad check in training, only to debug. What I mean is that computing dθ_approx[i] for all the values of i is a very slow computation. So to implement gradient descent, you'd use backprop to compute dθ, and just use backprop to compute the derivative. It's only when you're debugging that you would compute dθ_approx to make sure it's close to dθ. But once you've done that, you would turn off the grad check and not run it during every iteration of gradient descent, because that's just much too slow.

Second, if an algorithm fails grad check, look at the components, look at the individual components, to try to identify the bug. What I mean by that is, if dθ_approx is very far from dθ, what I would do is look at the different values of i to see which values of dθ_approx are really very different from the values of dθ. So, for example, if you find that the values of dθ that are very far off all correspond to db[l] for some layer or for some layers, but the components for dW are quite close (remember, different components of θ correspond to different components of b and W), then maybe the bug is in how you're computing db, the derivative with respect to the parameters b. And similarly, vice versa: if you find that the values of dθ_approx that are very far from dθ all came from dW, or from dW in a certain layer, then that might help you home in on the location of the bug. This doesn't always let you identify the bug right away, but sometimes it gives you some guesses about where to track down the bug.

Next, when doing grad check, remember your regularization term, if you're using regularization.
If your cost function is J(θ) equals 1 over m times the sum of your losses, plus this regularization term, the sum over l of the squared Frobenius norm of W[l], then this is the definition of J. And you should have that dθ is the gradient of J with respect to θ, including the regularization term. So just remember to include that term.

Next, grad check doesn't work with dropout, because in every iteration dropout is randomly eliminating different subsets of the hidden units. There isn't an easy-to-compute cost function J that dropout is doing gradient descent on. It turns out that dropout can be viewed as optimizing some cost function J, but that cost function is defined by summing over all the exponentially many subsets of nodes that could be eliminated in any iteration. So the cost function J is very difficult to compute, and you're just sampling the cost function every time you eliminate a different random subset of units when you use dropout. So it's difficult to use grad check to double-check your computation with dropout. What I usually do is implement grad check without dropout (if you want, you can set keep_prob in dropout to be equal to 1.0), and then turn on dropout and hope that my implementation of dropout was correct. There are some other things you could do, like fix the pattern of nodes dropped and verify that grad check is correct for that pattern of units killed off, but in practice I don't usually do that. So my recommendation is: turn off dropout, use grad check to double-check that your algorithm is at least correct without dropout, and then turn on dropout.

Finally, this is a subtlety. It rarely happens, but it's not impossible, that your implementation of gradient descent is correct when W and b are close to zero, so at random initialization, but that as you run gradient descent and W and b become bigger, it stops being correct. Maybe your implementation of backprop is correct only when W and b are close to 0, but it gets more inaccurate when W and b become large.
I don't do this very often, but one thing you could do is run grad check at random initialization, and then train the network for a while so that W and b have some time to wander away from zero, from the small random initial values, and then run grad check again after you've trained for some number of iterations.

So that's it for gradient checking, and congratulations on coming to the end of this week's materials. In this week you learned about how to set up your train, dev, and test sets, how to analyze bias and variance, and what things to do if you have high bias versus high variance versus maybe high bias and high variance. You also saw how to apply different forms of regularization, like L2 regularization and dropout, on your neural network, some tricks for speeding up the training of your neural network, and then finally gradient checking. So I think you've seen a lot in this week, and you get to exercise all these ideas in this week's programming exercise. So best of luck with that, and I look forward to seeing you in the week 2 materials.
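
The component-wise debugging and regularization tips from the transcript can be sketched in NumPy. This is only an illustration under stated assumptions, not the course's own code: the tiny two-layer network, the names `cost`, `backprop`, and `grad_check`, the `lambd / (2 * m)` scaling of the regularization term, the step size `eps=1e-7`, and the planted bug in `db1` are all hypothetical choices made for the example.

```python
import numpy as np

def forward(params, X):
    # Hypothetical 2-layer network: tanh hidden layer, sigmoid output.
    A1 = np.tanh(params["W1"] @ X + params["b1"])
    A2 = 1.0 / (1.0 + np.exp(-(params["W2"] @ A1 + params["b2"])))
    return A1, A2

def cost(params, X, Y, lambd):
    # J = average cross-entropy loss + L2 (Frobenius) regularization term.
    m = X.shape[1]
    _, A2 = forward(params, X)
    data_term = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
    reg_term = lambd / (2 * m) * (np.sum(params["W1"] ** 2) + np.sum(params["W2"] ** 2))
    return data_term + reg_term

def backprop(params, X, Y, lambd, bug_in_db1=False):
    # Analytic gradients of J, including the regularization term for W1 and W2.
    m = X.shape[1]
    A1, A2 = forward(params, X)
    dZ2 = A2 - Y
    dZ1 = (params["W2"].T @ dZ2) * (1 - A1 ** 2)
    grads = {
        "W1": dZ1 @ X.T / m + lambd / m * params["W1"],
        "b1": np.sum(dZ1, axis=1, keepdims=True) / m,
        "W2": dZ2 @ A1.T / m + lambd / m * params["W2"],
        "b2": np.sum(dZ2, axis=1, keepdims=True) / m,
    }
    if bug_in_db1:
        grads["b1"] = grads["b1"] * 2.0  # deliberately planted bug for the demo
    return grads

def grad_check(params, X, Y, lambd, grads, eps=1e-7):
    # Two-sided finite differences, reported per parameter block so that a
    # mismatch can be traced to a specific dW or db.
    report = {}
    for name, value in params.items():
        approx = np.zeros_like(value)
        for idx in np.ndindex(value.shape):
            old = value[idx]
            value[idx] = old + eps
            j_plus = cost(params, X, Y, lambd)
            value[idx] = old - eps
            j_minus = cost(params, X, Y, lambd)
            value[idx] = old  # restore the parameter
            approx[idx] = (j_plus - j_minus) / (2 * eps)
        numerator = np.linalg.norm(approx - grads[name])
        denominator = np.linalg.norm(approx) + np.linalg.norm(grads[name])
        report[name] = numerator / denominator
    return report

rng = np.random.default_rng(0)
m = 8
X = rng.standard_normal((4, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
params = {
    "W1": rng.standard_normal((5, 4)) * 0.5,
    "b1": rng.standard_normal((5, 1)) * 0.5,
    "W2": rng.standard_normal((1, 5)) * 0.5,
    "b2": rng.standard_normal((1, 1)) * 0.5,
}
lambd = 0.7

report_ok = grad_check(params, X, Y, lambd, backprop(params, X, Y, lambd))
report_bug = grad_check(params, X, Y, lambd, backprop(params, X, Y, lambd, bug_in_db1=True))
for name in params:
    print(name, report_ok[name], report_bug[name])
```

With the planted bug, only the `b1` block should show a large relative difference, which is the kind of signal the transcript suggests using to narrow down where a bug lives. The same `grad_check` call could be repeated after some training iterations, once `W` and `b` have moved away from their initial values.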

Original Description

Take the Deep Learning Specialization: http://bit.ly/2VGFA3w Check out all our courses: https://www.deeplearning.ai Subscribe to The Batch, our weekly newsletter: https://www.deeplearning.ai/thebatch Follow us: Twitter: https://twitter.com/deeplearningai_ Facebook: https://www.facebook.com/deeplearningHQ/ Linkedin: https://www.linkedin.com/company/deeplearningai

This video provides practical tips for implementing gradient checking in neural networks, including using backpropagation and handling regularization and dropout layers. It also covers debugging techniques for identifying issues in the gradient approximation.

Key Takeaways
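  1. Use backpropagation, not the numerical approximation, to compute derivatives during training
  2. Run gradient checking only while debugging, then turn it off for training
  3. If the check fails, compare individual components of dθ_approx and dθ to locate the bug (for example, a specific db or dW)
  4. Include the regularization term in both the cost function and the gradient
  5. Run gradient checking with dropout turned off (keep_prob set to 1.0), then turn dropout back on
💡 Gradient checking helps catch bugs in a backpropagation implementation, but it is too slow to run in every training iteration and does not work while dropout is active.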
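
The dropout takeaway can be illustrated with a short NumPy sketch. It assumes inverted dropout on a single hidden layer of a hypothetical two-layer network; the function name `forward_cost` and its `mask` argument are made up for this example. The point it tries to show is that with `keep_prob < 1.0` every cost evaluation samples a different mask, so the two cost evaluations inside a finite difference no longer measure the same function, while `keep_prob = 1.0` (or a fixed mask) makes the cost deterministic again.

```python
import numpy as np

def forward_cost(params, X, Y, keep_prob, rng, mask=None):
    # Cross-entropy cost of a hypothetical 2-layer net with inverted dropout
    # applied to the hidden layer. If no mask is given, a fresh one is sampled.
    A1 = np.tanh(params["W1"] @ X + params["b1"])
    if mask is None:
        # rng.random draws from [0, 1), so keep_prob = 1.0 keeps every unit.
        mask = rng.random(A1.shape) < keep_prob
    A1 = A1 * mask / keep_prob
    A2 = 1.0 / (1.0 + np.exp(-(params["W2"] @ A1 + params["b2"])))
    return float(-np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)))

rng = np.random.default_rng(1)
m = 6
X = rng.standard_normal((3, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
params = {
    "W1": rng.standard_normal((5, 3)) * 0.5,
    "b1": np.zeros((5, 1)),
    "W2": rng.standard_normal((1, 5)) * 0.5,
    "b2": np.zeros((1, 1)),
}

# Dropout on: two evaluations at the same parameters sample different masks.
j_on_1 = forward_cost(params, X, Y, keep_prob=0.5, rng=rng)
j_on_2 = forward_cost(params, X, Y, keep_prob=0.5, rng=rng)

# Dropout off: the cost is a deterministic function of the parameters.
j_off_1 = forward_cost(params, X, Y, keep_prob=1.0, rng=rng)
j_off_2 = forward_cost(params, X, Y, keep_prob=1.0, rng=rng)

# Alternative mentioned in the video: fix the pattern of dropped units.
fixed_mask = rng.random((5, m)) < 0.5
j_fixed_1 = forward_cost(params, X, Y, keep_prob=0.5, rng=rng, mask=fixed_mask)
j_fixed_2 = forward_cost(params, X, Y, keep_prob=0.5, rng=rng, mask=fixed_mask)

print("dropout on: ", j_on_1, j_on_2)
print("dropout off:", j_off_1, j_off_2)
print("fixed mask: ", j_fixed_1, j_fixed_2)
```

In this sketch, gradient checking would be meaningful only in the second and third settings, since those are the ones where repeated evaluations of the cost agree.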
