8.4 Bias and Variance vs Overfitting and Underfitting (L08: Model Evaluation Part 1)

Sebastian Raschka · Beginner · 📐 ML Fundamentals · 5y ago

Skills: ML Maths Basics 80%, Supervised Learning 70%

Key Takeaways

This video discusses the relationship between the bias-variance decomposition and overfitting and underfitting in machine learning. It covers model capacity, training error, and generalization error, using parametric models and decision trees as examples.

Full Transcript

Yes, so in this video I intend to break a record in this course: I will hopefully make this the shortest video so far. I don't want to torture you too long with this topic, and I only have two slides, so it can't be that long. In this video, we are going to take a look at the relationship between the bias-variance decomposition, that is, the bias and variance terms, and the terms underfitting and overfitting.

Okay, with this slide it's actually three slides. We are now looking at how the bias-variance decomposition is related to overfitting and underfitting. Recall this little figure I showed you earlier in this lecture. There are two reasons why this is such a crude drawing. One reason is efficiency: it's pretty tedious to draw something like that in Keynote or PowerPoint. The second reason is that it also indicates that this is just a sketch, an approximation; it is not based on real numbers. In practice, you will never see a very smooth or perfect relationship between the terms shown here. It will be a noisy process if you do this for a real-world data set.

What we see here is the error, for example the squared error loss, plotted against the capacity of the model. We talked about this before: capacity is basically how well the model is capable of fitting a training set, so it is something like complexity. The higher the capacity, the higher the capability of the model to fit the data well. In the context of parametric models, like regression models, polynomial regression, and so forth, capacity usually also relates to the number of parameters or terms.

So here what we have is the training error. The larger the capacity of the model, the lower the training error, because the more complex the model is, the better it will be able to fit the training data. For example, think of short decision trees versus very deep decision trees: deep decision trees will be able to fit the data better, so the training error will go down.

However, having a low training error doesn't mean the model will perform well on new data, because it can happen that we fit the training data too closely, and then the error on new data, for example as measured on the test set, will actually increase. That is measured by the generalization error, which can be estimated from an independent test set. The generalization error first improves somewhat as the capacity becomes larger: if the model is too simple, like a short decision tree, it will neither be able to fit the training set well nor perform well on new data. If we make the model capacity larger, the error will also decrease on new data. However, after some inflection point, the larger the capacity becomes, the larger the error on new data becomes again.

In general, the gap between the training error and the generalization error is considered the degree of overfitting, so it's by how much the model overfits. In this region, the overfitting is increasing, the gap is increasing, because the model fits the data too closely. It fits noise in the data, for example, and then it won't be able to generalize well to new data.

Okay, so now how is that related to the concepts of bias and variance? Here I added two more terms, the variance and the bias, with the variance here in red. The larger the capacity of the model, the more complex the model. For example, as I've shown you in the graphs, this could be a deep decision tree, and this is again a short decision tree on this end. It's exactly the same graph as on the previous slide, except that now I also have the variance and bias shown here.

If the capacity increases, the variance will increase. That is what we've seen for deep decision trees when we did the bias-variance decomposition and compared them, for example, to a bagging classifier or a bagging model, where averaging will reduce the variance. But here we are only talking about a single model, and you will see that the higher the capacity of the model, the more complex the model, the higher the variance. This then relates to the degree of overfitting: models with high variance are more prone to overfitting. So if we have this inflection point again, overfitting increases when we go to the right. The degree of overfitting is again the gap between the training error, here the black line, and the generalization error, the green line. This gap is the degree of overfitting, and it usually increases as the variance increases.

And vice versa: the higher the variance becomes, the lower the bias will become, so a more complex model will usually have a lower bias. It looks maybe a little bit misleading here; it looks like it goes down and then up again. It's just a bad drawing; it should just go down or converge, basically asymptotically. So the bias will go down while the variance goes up as the capacity increases. And vice versa, the bias will be large if we have a low capacity. A large bias, as you can see, is at the same end as the high degree of underfitting. When we go to the left, when we make the model too simple, it will underfit the data: it will not perform well on the training data, and it also won't perform well on the test set. So it will perform badly on both the training and the test set, and this has some relationship to having a high bias. So high bias is correlated with underfitting, a high degree of underfitting, and high variance is correlated with overfitting.

Okay, so that's it for the relationship of overfitting and underfitting to bias and variance. In the next video, we will also take a brief look at the bias-variance decomposition of the 0-1 loss. It's more like, I would say, an optional topic. It's closer to our classification context, but it's somewhat less intuitive than decomposing the squared error loss. It's more like a workaround, but we will see more about that in the next video.
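The capacity sketch described in the transcript can be reproduced roughly in code. The following is a minimal sketch and not taken from the lecture: it assumes synthetic one-dimensional data from a sine function with Gaussian noise, and it uses the degree of a polynomial regression (one of the parametric models mentioned above) as a stand-in for capacity. The names `true_fn` and `make_data`, the noise level, and the sample sizes are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def true_fn(x):
    # Assumed ground-truth function, chosen only for illustration.
    return np.sin(2 * np.pi * x)


def make_data(n, rng):
    # Draw n noisy observations of true_fn on the interval [-1, 1].
    x = rng.uniform(-1.0, 1.0, size=n)
    y = true_fn(x) + rng.normal(scale=0.3, size=n)
    return x, y


x_train, y_train = make_data(30, rng)    # small training set
x_test, y_test = make_data(1000, rng)    # independent test set

degrees = list(range(1, 10))  # polynomial degree plays the role of capacity
train_mse, test_mse = [], []
for d in degrees:
    coefs = np.polyfit(x_train, y_train, deg=d)  # least-squares polynomial fit
    train_pred = np.polyval(coefs, x_train)
    test_pred = np.polyval(coefs, x_test)
    train_mse.append(float(np.mean((train_pred - y_train) ** 2)))
    test_mse.append(float(np.mean((test_pred - y_test) ** 2)))

for d, tr, te in zip(degrees, train_mse, test_mse):
    # The gap (test minus train) is the "degree of overfitting" from the slide.
    print(f"degree={d}  train MSE={tr:.3f}  test MSE={te:.3f}  gap={te - tr:.3f}")
```

Because each higher-degree polynomial contains the lower-degree ones as special cases, the training error cannot go up as the degree grows. The test error is only an estimate of the generalization error, so on real data the curve will be noisier than the smooth sketch on the slide.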

Original Description

Sebastian's books: https://sebastianraschka.com/books/

This brief video discusses the connection between bias & variance and overfitting & underfitting.

This video is part of my Introduction of Machine Learning course.
Next video: https://youtu.be/IvHZ4-yd5is
The complete playlist: https://www.youtube.com/playlist?list=PLTKMiZHVd_2KyGirGEvKlniaWeLOHhUF3
A handy overview page with links to the materials: https://sebastianraschka.com/blog/2021/ml-course.html

If you want to be notified about future videos, please consider subscribing to my channel: https://youtube.com/c/SebastianRaschka

This video explains the connection between bias-variance decomposition and overfitting-underfitting, helping viewers understand how model capacity affects error and generalization, and how to identify and address overfitting and underfitting in machine learning models.

Key Takeaways

Understand the concept of model capacity and its relationship to error
Learn to identify overfitting and underfitting in models
Analyze the bias-variance decomposition of a model
Evaluate model performance using training and generalization error

💡 High variance is correlated with overfitting, while high bias is correlated with underfitting
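The link between capacity, bias, and variance can be probed numerically. The following is a minimal sketch under stated assumptions rather than the procedure used in the course: the true function is known (a sine, chosen for illustration), many training sets are simulated from it, and polynomial degree stands in for capacity. The helper `decompose` and all constants are hypothetical. Bias and variance are measured against the noise-free target, so the irreducible noise term is left out.

```python
import numpy as np

rng = np.random.default_rng(1)


def true_fn(x):
    # Assumed ground-truth function, chosen only for illustration.
    return np.sin(2 * np.pi * x)


NOISE_STD = 0.3   # standard deviation of the label noise (assumption)
N_TRAIN = 30      # size of each simulated training set
N_REPEATS = 200   # number of simulated training sets
x_eval = np.linspace(-1.0, 1.0, 50)  # fixed points where predictions are compared


def decompose(degree):
    # Fit one polynomial per simulated training set and collect its predictions.
    preds = np.empty((N_REPEATS, x_eval.size))
    for r in range(N_REPEATS):
        x = rng.uniform(-1.0, 1.0, size=N_TRAIN)
        y = true_fn(x) + rng.normal(scale=NOISE_STD, size=N_TRAIN)
        preds[r] = np.polyval(np.polyfit(x, y, deg=degree), x_eval)
    mean_pred = preds.mean(axis=0)
    # Squared bias: how far the average prediction is from the true function.
    bias_sq = float(np.mean((mean_pred - true_fn(x_eval)) ** 2))
    # Variance: how much predictions scatter across training sets.
    variance = float(np.mean(preds.var(axis=0)))
    # Average squared error against the noise-free target.
    error = float(np.mean((preds - true_fn(x_eval)) ** 2))
    return bias_sq, variance, error


results = {d: decompose(d) for d in (1, 3, 9)}
for d, (b, v, e) in results.items():
    print(f"degree={d}  bias^2={b:.4f}  variance={v:.4f}  bias^2+variance={b + v:.4f}  error={e:.4f}")
```

By construction, the squared bias and the variance add up to the average squared error against the noise-free target. One would usually expect the low-degree fit to be dominated by bias and the high-degree fit by variance, but the exact numbers depend on the assumed function, noise level, and training set size.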
