Bias Correction of Exponentially Weighted Averages (C2W2L05)

DeepLearningAI · Beginner · 📐 ML Fundamentals · 8y ago

Key Takeaways

The video explains bias correction for exponentially weighted averages, a technique used in machine learning to make these averages more accurate, particularly during the initial phase when the average is still warming up from its starting value v_0 = 0. It shows how to implement bias correction by replacing v_t with v_t / (1 - β^t), where t is the index of the current data point.
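The reason dividing by 1 - β^t removes the bias can be written as a short derivation. This is our own generalization of the t = 2 case worked in the transcript, assuming v_0 = 0 and the update v_t = β v_{t-1} + (1 - β) θ_t:

```latex
\begin{aligned}
v_t &= (1-\beta)\sum_{i=1}^{t} \beta^{\,t-i}\,\theta_i
  && \text{(unroll } v_t = \beta v_{t-1} + (1-\beta)\theta_t \text{ with } v_0 = 0) \\
\sum_{i=1}^{t} (1-\beta)\,\beta^{\,t-i} &= (1-\beta)\,\frac{1-\beta^{t}}{1-\beta} = 1-\beta^{t}
  && \text{(geometric series: the weights sum to } 1-\beta^{t} < 1) \\
\hat{v}_t = \frac{v_t}{1-\beta^{t}} &= \sum_{i=1}^{t} \frac{(1-\beta)\,\beta^{\,t-i}}{1-\beta^{t}}\,\theta_i
  && \text{(after correction the weights sum to } 1)
\end{aligned}
```

With β = 0.98 and t = 2 the weights are 0.0196 and 0.02, which sum to 0.0396 = 1 - 0.98^2, the denominator used in the video. In particular, if every θ_i equals the same value c, the corrected estimate is exactly c on every day.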

Full Transcript

You've learned how to implement exponentially weighted averages. There's one technical detail, called bias correction, that can make your computation of these averages more accurate. Let's see how that works.

In the previous video, you saw this figure for β = 0.9 and this figure for β = 0.98. But it turns out that if you implement the formula as written here, you won't actually get the green curve when, say, β = 0.98. You actually get the purple curve here, and you notice that the purple curve starts off really low. So let's see how to fix that.

When you're implementing a moving average, you initialize it with v_0 = 0, and then v_1 = 0.98 v_0 + 0.02 θ_1. But v_0 is equal to 0, so that term just goes away, and v_1 is just 0.02 θ_1. That's why, if the first day's temperature is, say, 40 degrees Fahrenheit, then v_1 will be 0.02 × 40, which is 0.8, so you get a much lower value down here. It's not a very good estimate of the first day's temperature.

v_2 will be 0.98 v_1 + 0.02 θ_2, and if you plug in v_1 and multiply it out, you find that v_2 = 0.98 × 0.02 θ_1 + 0.02 θ_2 = 0.0196 θ_1 + 0.02 θ_2. So, assuming θ_1 and θ_2 are positive numbers, v_2 will be much less than θ_1 or θ_2, which means v_2 isn't a very good estimate of the first two days' temperature of the year.

It turns out there's a way to modify this estimate that makes it much better and more accurate, especially during this initial phase of your estimate: instead of taking v_t, take v_t / (1 - β^t), where t is the current day you're on. Let's take a concrete example. When t = 2, 1 - β^t is 1 - 0.98^2, and it turns out that this is 0.0396.
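A minimal Python sketch of the uncorrected average makes the low start concrete. The function name `ewa` and the two-day input of 40 degrees are illustrative assumptions, not code from the video:

```python
def ewa(thetas, beta):
    """Uncorrected exponentially weighted average, started from v_0 = 0.

    Returns [v_1, v_2, ..., v_t] for the inputs theta_1, ..., theta_t.
    """
    v = 0.0  # v_0 = 0, as in the video
    history = []
    for theta in thetas:
        # v_t = beta * v_{t-1} + (1 - beta) * theta_t
        v = beta * v + (1 - beta) * theta
        history.append(v)
    return history


# Two days at 40 degrees Fahrenheit (hypothetical data matching the example).
vs = ewa([40.0, 40.0], beta=0.98)
print(vs)  # roughly [0.8, 1.584]: far below the true 40 degrees

# Feed a unit value on one day and 0 on the other to read off
# the coefficients of theta_1 and theta_2 in v_2.
coef_theta1 = ewa([1.0, 0.0], beta=0.98)[1]  # 0.98 * 0.02 = 0.0196
coef_theta2 = ewa([0.0, 1.0], beta=0.98)[1]  # 0.02
```

Reading the coefficients off unit inputs reproduces the expansion v_2 = 0.0196 θ_1 + 0.02 θ_2 from the transcript, since the update is linear in the inputs.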
And so your estimate of the temperature on day 2 becomes v_2 / 0.0396 = (0.0196 θ_1 + 0.02 θ_2) / 0.0396. You notice that the two coefficients, 0.0196 and 0.02, add up to the denominator 0.0396, and so this becomes a weighted average of θ_1 and θ_2, which removes the bias.

You notice that as t becomes large, β^t approaches 0, which is why when t is large enough, the bias correction makes almost no difference. This is why, when t is large, the purple line and the green line pretty much overlap. But during this initial phase of learning, when you're still warming up your estimate, bias correction can help you obtain a better estimate of the temperature. It's this bias correction that helps you go from the purple line to the green line.

In machine learning, for most implementations of the exponentially weighted average, people don't often bother to implement bias correction, because most people would rather just wait out that initial period, accept a slightly more biased estimate, and go from there. But if you are concerned about the bias during this initial phase, while your exponentially weighted moving average is still warming up, then bias correction can help you get a better estimate early on.

So you now know how to implement exponentially weighted moving averages. Let's go on and use this to build some better optimization algorithms.
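The correction step and its fading effect can be sketched in a few lines. The name `ewa_bias_corrected` is hypothetical, and the "below 1%" threshold for when the correction stops mattering is our own rule of thumb, not something stated in the video:

```python
import math


def ewa_bias_corrected(thetas, beta):
    """Exponentially weighted average with bias correction v_t / (1 - beta**t)."""
    v = 0.0  # v_0 = 0
    corrected = []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1 - beta) * theta  # same update as the uncorrected average
        corrected.append(v / (1 - beta ** t))  # divide by 1 - beta^t
    return corrected


beta = 0.98
print(1 - beta ** 2)  # denominator on day 2: about 0.0396
print(ewa_bias_corrected([40.0, 40.0], beta))  # both estimates are about 40

# Corrected weights on theta_1 and theta_2 for day 2; they sum to 1.
w1 = ewa_bias_corrected([1.0, 0.0], beta)[1]  # 0.0196 / 0.0396
w2 = ewa_bias_corrected([0.0, 1.0], beta)[1]  # 0.02 / 0.0396

# First day on which beta^t falls below 0.01, after which dividing by
# 1 - beta^t changes v_t by only about 1% or less.
t_small = math.ceil(math.log(0.01) / math.log(beta))
```

For β = 0.98 this gives t_small = 228, which illustrates why the purple and green curves coincide for most of the year.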

Original Description

Take the Deep Learning Specialization: http://bit.ly/3cqn45p Check out all our courses: https://www.deeplearning.ai Subscribe to The Batch, our weekly newsletter: https://www.deeplearning.ai/thebatch Follow us: Twitter: https://twitter.com/deeplearningai_ Facebook: https://www.facebook.com/deeplearningHQ/ Linkedin: https://www.linkedin.com/company/deeplearningai

Playlist

Uploads from DeepLearningAI · DeepLearningAI · 18 of 60

1 Forward and Backward Propagation (C1W4L06)
2 deeplearning.ai's Heroes of Deep Learning: Yuanqing Lin
3 deeplearning.ai's Heroes of Deep Learning: Ruslan Salakhutdinov
4 deeplearning.ai's Heroes of Deep Learning: Yoshua Bengio
5 deeplearning.ai's Heroes of Deep Learning: Pieter Abbeel
6 deeplearning.ai's Heroes of Deep Learning: Ian Goodfellow
7 deeplearning.ai's Heroes of Deep Learning: Andrej Karpathy
8 Using an Appropriate Scale (C2W3L02)
9 Gradient Checking (C2W1L13)
10 Gradient Checking Implementation Notes (C2W1L14)
11 Learning Rate Decay (C2W2L09)
12 Understanding Mini-Batch Gradient Dexcent (C2W2L02)
13 Mini Batch Gradient Descent (C2W2L01)
14 The Problem of Local Optima (C2W3L10)
15 Exponentially Weighted Averages (C2W2L03)
16 Tuning Process (C2W3L01)
17 Understanding Exponentially Weighted Averages (C2W2L04)
18 Bias Correction of Exponentially Weighted Averages (C2W2L05)
19 Gradient Descent With Momentum (C2W2L06)
20 Normalizing Activations in a Network (C2W3L04)
21 Hyperparameter Tuning in Practice (C2W3L03)
22 Adam Optimization Algorithm (C2W2L08)
23 RMSProp (C2W2L07)
24 Fitting Batch Norm Into Neural Networks (C2W3L05)
25 Why Does Batch Norm Work? (C2W3L06)
26 Batch Norm At Test Time (C2W3L07)
27 Softmax Regression (C2W3L08)
28 Deep Learning Frameworks (C2W3L10)
29 Neural Network Overview (C1W3L01)
30 Training Softmax Classifier (C2W3L09)
31 Why Deep Representations? (C1W4L04)
32 Gradient Descent For Neural Networks (C1W3L09)
33 Neural Network Representations (C1W3L02)
34 TensorFlow (C2W3L11)
35 Activation Functions (C1W3L06)
36 Explanation For Vectorized Implementation (C1W3L05)
37 Getting Matrix Dimensions Right (C1W4L03)
38 Understanding Dropout (C2W1L07)
39 Building Blocks of a Deep Neural Network (C1W4L05)
40 Why Non-linear Activation Functions (C1W3L07)
41 Computing Neural Network Output (C1W3L03)
42 Backpropagation Intuition (C1W3L10)
43 Train/Dev/Test Sets (C2W1L01)
44 Deep L-Layer Neural Network (C1W4L01)
45 Random Initialization (C1W3L11)
46 Other Regularization Methods (C2W1L08)
47 Normalizing Inputs (C2W1L09)
48 Derivatives Of Activation Functions (C1W3L08)
49 Parameters vs Hyperparameters (C1W4L07)
50 Vectorizing Across Multiple Examples (C1W3L04)
51 What does this have to do with the brain? (C1W4L08)
52 Dropout Regularization (C2W1L06)
53 Vanishing/Exploding Gradients (C2W1L10)
54 Basic Recipe for Machine Learning (C2W1L03)
55 Bias/Variance (C2W1L02)
56 Forward Propagation in a Deep Network (C1W4L02)
57 Weight Initialization in a Deep Network (C2W1L11)
58 Numerical Approximations of Gradients (C2W1L12)
59 Regularization (C2W1L04)
60 Why Regularization Reduces Overfitting (C2W1L05)

This video teaches how to implement bias correction in exponentially weighted averages to make the estimates more accurate. It explains why an average initialized at v_0 = 0 starts off too low, shows how to replace v_t with v_t / (1 - β^t), and explains why the correction matters mainly during the initial phase, before the average has warmed up. The bias-corrected average then serves as a building block for the optimization algorithms covered in the following videos.

Key Takeaways
  1. Initialize the moving average with v_0 = 0
  2. Update it with v_t = β * v_{t-1} + (1 - β) * θ_t; on the first step this gives v_1 = β * v_0 + (1 - β) * θ_1 = (1 - β) * θ_1
  3. Apply bias correction by using v_t / (1 - β^t) in place of v_t
  4. Use the bias-corrected estimate to build optimization algorithms
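The steps above can be packaged as a small streaming helper, the form in which an optimization algorithm would consume values one at a time. The class `BiasCorrectedEWA`, its methods, and the sample temperatures are hypothetical, written for illustration rather than taken from the video or any library:

```python
class BiasCorrectedEWA:
    """Streaming exponentially weighted average with bias correction.

    Hypothetical helper for illustration; not from the video or any library.
    """

    def __init__(self, beta):
        if not 0.0 <= beta < 1.0:
            raise ValueError("beta must be in [0, 1)")
        self.beta = beta
        self.v = 0.0  # step 1: v_0 = 0
        self.t = 0    # number of values folded in so far

    def update(self, theta):
        """Fold in one new value and return the bias-corrected estimate."""
        self.t += 1
        # step 2: v_t = beta * v_{t-1} + (1 - beta) * theta_t
        self.v = self.beta * self.v + (1 - self.beta) * theta
        return self.corrected()

    def corrected(self):
        """Step 3: return v_t / (1 - beta^t) without modifying the raw state."""
        if self.t == 0:
            raise ValueError("no values seen yet")
        return self.v / (1 - self.beta ** self.t)


avg = BiasCorrectedEWA(beta=0.9)
estimates = [avg.update(temp) for temp in [40.0, 49.0, 45.0]]
raw = avg.v  # uncorrected v_3, about 12.15, versus a corrected estimate near 44.8
```

Keeping the raw v_t in state and dividing only when reading the estimate makes it easy to compare corrected and uncorrected values, or to skip the correction entirely, as the video notes many implementations do.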
💡 Bias correction noticeably improves estimates during the initial warm-up phase; once t is large, β^t is close to 0 and the correction makes almost no difference, which is why many implementations skip it and simply accept a slightly biased estimate early on.

Related Reads

📰
Gate on what the model can't author (my comment section redesigned my trust model)
Redesign your trust model by identifying features with external sources, as seen in a comment section discussion on an email classifier's scoring system
Dev.to AI
📰
Your gradient dies on the way to layer 1 (and how to save it)
Learn how to address the vanishing gradient problem in deep neural networks and improve training efficiency
Dev.to · Devanshu Biswas
📰
AdaBoost from Scratch: How a Pile of Dumb Rules Becomes a Smart Classifier
Learn how to implement AdaBoost from scratch and understand how it combines weak models to create a strong classifier
Dev.to · Devanshu Biswas
📰
Your Optimizer Spends Its Whole Life One Step From Exploding. On Purpose.
Learn how gradient descent optimizers can explode if not properly managed and why understanding their speed limits is crucial for stable training
Medium · Data Science