Bias Correction of Exponentially Weighted Averages (C2W2L05)

DeepLearningAI · Beginner · 📐 ML Fundamentals · 8y ago

Key Takeaways

The video explains bias correction for exponentially weighted averages, a technique that makes the estimates more accurate during the initial warm-up phase, when the uncorrected average is biased toward zero. The correction replaces $v_t$ with $v_t / (1 - \beta^t)$, where $t$ is the index of the current data point.
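
One way to see why dividing by $1 - \beta^t$ is the right correction is to unroll the recursion. The short derivation below is not spelled out in the video; it assumes $v_0 = 0$ and the update $v_t = \beta v_{t-1} + (1 - \beta)\theta_t$.

```latex
v_t = (1-\beta)\sum_{i=1}^{t}\beta^{\,t-i}\,\theta_i ,
\qquad
(1-\beta)\sum_{i=1}^{t}\beta^{\,t-i}
  = (1-\beta)\,\frac{1-\beta^{t}}{1-\beta}
  = 1-\beta^{t}.
```

The weights on $\theta_1, \dots, \theta_t$ therefore sum to $1 - \beta^t$ rather than 1, and dividing $v_t$ by that sum turns it into a proper weighted average. For $\beta = 0.98$ and $t = 2$ the weights are $0.0196 + 0.02 = 0.0396 = 1 - 0.98^2$.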

Full Transcript

You've learned how to implement exponentially weighted averages. There's one technical detail called bias correction that can make your computation of these averages more accurate. Let's see how that works.

In the previous video, you saw this figure for $\beta = 0.9$ and this figure for $\beta = 0.98$. But it turns out that if you implement the formula as written here, you won't actually get the green curve when, say, $\beta = 0.98$. You actually get the purple curve here, and you'll notice that the purple curve starts off really low. So let's see how to fix that.

When you're implementing a moving average, you initialize it with $v_0 = 0$, and then $v_1 = 0.98\,v_0 + 0.02\,\theta_1$. But $v_0$ is equal to 0, so that term just goes away, and $v_1$ is just $0.02\,\theta_1$. That's why, if the first day's temperature is, say, 40 degrees Fahrenheit, then $v_1$ will be $0.02 \times 40$, which is 0.8. So you get a much lower value down here, and it's not a very good estimate of the first day's temperature.

$v_2$ will be $0.98\,v_1 + 0.02\,\theta_2$, and if you plug in $v_1$, which is this down here, and multiply it out, then you find that $v_2$ is actually equal to $0.98 \times 0.02\,\theta_1 + 0.02\,\theta_2$, and that is $0.0196\,\theta_1 + 0.02\,\theta_2$. So again, assuming $\theta_1$ and $\theta_2$ are positive numbers, when you compute this, $v_2$ will be much less than either $\theta_1$ or $\theta_2$. So $v_2$ isn't a very good estimate of the first two days' temperature of the year.

It turns out that there's a way to modify this estimate that makes it much better, that makes it more accurate, especially during this initial phase of your estimate: instead of taking $v_t$, take $v_t / (1 - \beta^t)$, where $t$ is the current day you're on.

Let's take a concrete example. When $t = 2$, $1 - \beta^t$ is $1 - 0.98^2$, and it turns out that this is 0.0396. And so your estimate of the temperature on day 2 becomes $v_2$ divided by 0.0396, and this is going to be $(0.0196\,\theta_1 + 0.02\,\theta_2) / 0.0396$. You'll notice that these two weights add up to the denominator, 0.0396, and so this becomes a weighted average of $\theta_1$ and $\theta_2$, and this removes the bias.

You'll notice that as $t$ becomes large, $\beta^t$ will approach 0, which is why when $t$ is large enough, the bias correction makes almost no difference. This is why, when $t$ is large, the purple line and the green line pretty much overlap. But during this initial phase of learning, when you're still warming up your estimates, bias correction can help you obtain a better estimate of the temperature, and it is this bias correction that helps you go from the purple line to the green line.

In machine learning, for most implementations of the exponentially weighted average, people don't often bother to implement bias correction, because most people would rather just wait out that initial period, accept a slightly more biased estimate, and go from there. But if you are concerned about the bias during this initial phase, while your exponentially weighted moving average is still warming up, then bias correction can help you get a better estimate early on.

So you now know how to implement exponentially weighted moving averages. Let's go on and use this to build some better optimization algorithms.
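
The computation described in the transcript can be sketched in a few lines of Python. This is a minimal illustration rather than code from the course: the function name `ewa`, its parameters, and the temperature values after the first day are assumptions made up for the example.

```python
def ewa(values, beta=0.98, bias_correction=True):
    """Return the exponentially weighted average after each value.

    Hypothetical helper for illustration; starts from v_0 = 0 as in the lecture.
    """
    v = 0.0
    averages = []
    for t, theta in enumerate(values, start=1):
        # Standard update: v_t = beta * v_{t-1} + (1 - beta) * theta_t
        v = beta * v + (1.0 - beta) * theta
        if bias_correction:
            # Divide by 1 - beta^t so the weights sum to 1
            averages.append(v / (1.0 - beta ** t))
        else:
            averages.append(v)
    return averages


# Day 1 is 40 degrees as in the lecture; the other values are made up.
temps = [40.0, 49.0, 45.0]
raw = ewa(temps, bias_correction=False)
corrected = ewa(temps, bias_correction=True)

print("uncorrected:", raw)
print("corrected:  ", corrected)
```

Without the correction, the first estimate is about 0.8, far below the observed 40 degrees. With it, the first estimate is about 40, and the second is the weighted average $(0.0196 \times 40 + 0.02 \times 49) / 0.0396$.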

Original Description

Take the Deep Learning Specialization: http://bit.ly/3cqn45p Check out all our courses: https://www.deeplearning.ai Subscribe to The Batch, our weekly newsletter: https://www.deeplearning.ai/thebatch Follow us: Twitter: https://twitter.com/deeplearningai_ Facebook: https://www.facebook.com/deeplearningHQ/ Linkedin: https://www.linkedin.com/company/deeplearningai

This video teaches how to implement bias correction in exponentially weighted averages so that early estimates are more accurate. It explains why the uncorrected average starts out too low, shows how to correct it by dividing $v_t$ by $1 - \beta^t$, and discusses when the correction matters: mainly during the initial warm-up phase. The exponentially weighted averages covered here are then used to build better optimization algorithms in later videos.

Key Takeaways

Initialize the moving average with $v_0 = 0$
Compute $v_1$ using the formula $v_1 = \beta v_0 + (1 - \beta)\theta_1$
Apply bias correction by using $v_t / (1 - \beta^t)$ in place of $v_t$
Use the bias-corrected estimate to build optimization algorithms

💡 Bias correction can noticeably improve the accuracy of estimates during the initial warm-up phase of an exponentially weighted average. Once $t$ is large it makes almost no difference, and, as the video notes, many machine learning implementations skip it.
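
To get a feel for how long the warm-up phase lasts, the sketch below counts the steps until $\beta^t$ falls below a small tolerance, at which point the correction factor $1 - \beta^t$ is within that tolerance of 1. The helper name `warmup_steps` and the 1% tolerance are assumptions chosen for illustration, not values from the video.

```python
def warmup_steps(beta, tol=0.01):
    """Smallest t for which beta**t < tol (hypothetical helper).

    Past this point, dividing by 1 - beta**t changes v_t by roughly tol or less.
    """
    t = 1
    while beta ** t >= tol:
        t += 1
    return t


for beta in (0.9, 0.98):
    print(f"beta={beta}: correction is negligible after about {warmup_steps(beta)} steps")
```

A larger $\beta$ averages over more history, so it also takes longer to warm up, which is when the correction is most useful.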
