8.4 Bias and Variance vs Overfitting and Underfitting (L08: Model Evaluation Part 1)

Sebastian Raschka · Beginner · 📐 ML Fundamentals · 5y ago

Key Takeaways

This video discusses the relationship between the bias-variance decomposition and overfitting and underfitting in machine learning. It covers model capacity, training error, and generalization error, using parametric models and decision trees as examples.

Full Transcript

Yes, so in this video I intend to break a record in this course: I will make this, hopefully, the shortest video so far. I don't want to torture you too long with this topic. I only have two slides, so it can't be that long. In this video, we are going to take a look at the relationship between the bias-variance decomposition, that is, the bias and variance terms, and the terms underfitting and overfitting.

Okay, with this slide it's actually three slides, but yeah, we are now looking at how the bias-variance decomposition is related to overfitting and underfitting. Recall this little figure here that I've shown you earlier in this lecture. There are two reasons why this is such a crude drawing. One reason is efficiency: it's pretty tedious to draw something like that in Keynote or PowerPoint. The second reason is that it also indicates that this is just a sketch, an approximation of something. It is not based on real numbers, and in practice you will never see a very smooth or perfect plot or relationship between the terms shown here. In practice, it will be a noisy process if you do this for a real-world dataset.

What we see here is the error, for example the squared error loss, plotted against the capacity of the model. We talked about this before: capacity is basically how well the model is capable of fitting a training set. So capacity is basically something like complexity: the higher the capacity, the higher the capability of the model to fit the data well. In many contexts, for instance in the context of parametric models like regression models, polynomial regression, and so forth, capacity also relates to the number of parameters or terms.

So here what we have is the training error, and the larger the capacity of the model, the lower the training error, because the more complex the model is, the better it will be able to fit the training data. For example, think of short decision trees versus very deep decision trees. Deep decision trees will be able to fit the data better, so the training error will go down. However, having a low training error doesn't mean the model will perform well on new data, because it can happen that we fit the training data too closely, and then the error on new data, for example as measured on the test set, will actually increase. That is measured by the generalization error; for example, the generalization error can be estimated from an independent test set.

The generalization error first improves somewhat if the capacity becomes larger, because if the model is too simple, say a short decision tree, it will neither be able to fit the training set well nor perform well on new data; it's just too simple of a model. If we make the model capacity larger, the error will also decrease on new data. However, after some inflection point, the larger the capacity becomes, the larger the error will become. In general, the gap between the training error and the generalization error is considered the degree of overfitting, so it's by how much the model overfits. In this region, the overfitting is increasing, the gap is increasing, because the model fits the data too closely. It fits noise in the data, for example, and then it won't be able to generalize well to new data.

Okay, so now how is that related to the concepts of bias and variance? Here I added two more terms, the variance and the bias, with the variance here in red. So the larger the capacity of the model, the more complex the model. For example, as I've also shown you in the graphs, this could be a deep decision tree, and this is again a short decision tree on this end. It's exactly the same graph as on the previous slide, except that now I also have the variance and bias shown here. If the capacity increases, the variance will increase, right? That is what we've seen for deep decision trees when we did the bias-variance decomposition and compared it, for example, to a bagging classifier or a bagging model, where averaging will reduce the variance. But here we are only talking about a single model, and you will see that the higher the capacity of the model, the more complex the model, the higher the variance, and this then relates to the degree of overfitting. So models with high variance are more prone to overfitting.

Here, if we have this inflection point again, overfitting increases when we go to the right, and the degree of overfitting is again the gap between the training error, here the black line, and the generalization error, the green line. So this gap is the degree of overfitting, and it usually increases as the variance increases. And vice versa, the higher the variance becomes, the lower the bias will become, so a more complex model will usually have a lower bias. So yes, it looks maybe a little bit misleading: it looks like it goes down and then up again. It's just a bad drawing; it should just go down, or basically converge asymptotically. So the bias will go down while the variance goes up if the capacity increases.

And vice versa, the bias will be large if we have a low capacity, and a large bias, you can see, is also at the same end here as the high degree of underfitting. So when we go here to the left, when we make the model too simple, it will underfit the data: it will not perform well on the training data, but it also won't perform well on the test set. So what happens is it will perform badly on both the training and test set, and this also has some relationship to having a high bias. So high bias is correlated with underfitting, and high variance is correlated with overfitting.

Okay, so that's it for the relationship of overfitting and underfitting to bias and variance. In the next video, we will also take a brief look at the bias-variance decomposition of the zero-one loss. It's more like, I would say, an optional topic. It's closer to our classification context, but it's somewhat less intuitive than decomposing the squared error loss. It's more like a workaround, but we will see more about that in the next video.
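The error-versus-capacity sketch described in the transcript can be approximated numerically. The following is a minimal illustration and not code from the lecture: it assumes a synthetic sine-shaped target with Gaussian noise, uses polynomial degree as the stand-in for capacity (the transcript mentions polynomial regression as one such case), and the helper names `make_data` and `mse` are made up for this example. The gap between test and training error is printed as a rough proxy for the degree of overfitting.

```python
import numpy as np

def make_data(n, rng):
    # Hypothetical data-generating process: noisy samples of sin(pi * x).
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, y

def mse(coefs, x, y):
    # Mean squared error of the polynomial with coefficients `coefs`.
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))

rng = np.random.default_rng(0)
x_train, y_train = make_data(30, rng)    # small training set
x_test, y_test = make_data(1000, rng)    # independent test set

results = []
for degree in range(0, 10):              # capacity = polynomial degree
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_err = mse(coefs, x_train, y_train)
    test_err = mse(coefs, x_test, y_test)
    results.append((degree, train_err, test_err))
    print(f"degree={degree}  train={train_err:.3f}  "
          f"test={test_err:.3f}  gap={test_err - train_err:.3f}")
```

The training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. The test error is an estimate of the generalization error, and, as the transcript notes for real data, its curve will be noisy and not a smooth line.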

Original Description

Sebastian's books: https://sebastianraschka.com/books/

This brief video discusses the connection between bias & variance and overfitting & underfitting.

This video is part of my Introduction of Machine Learning course.
Next video: https://youtu.be/IvHZ4-yd5is
The complete playlist: https://www.youtube.com/playlist?list=PLTKMiZHVd_2KyGirGEvKlniaWeLOHhUF3
A handy overview page with links to the materials: https://sebastianraschka.com/blog/2021/ml-course.html

If you want to be notified about future videos, please consider subscribing to my channel: https://youtube.com/c/SebastianRaschka


This video explains the connection between the bias-variance decomposition and overfitting and underfitting, helping viewers understand how model capacity affects training and generalization error, and how high bias and high variance relate to underfitting and overfitting.

Key Takeaways
  1. Understand the concept of model capacity and its relationship to error
  2. Learn to identify overfitting and underfitting in models
  3. Analyze the bias-variance decomposition of a model
  4. Evaluate model performance using training and generalization error
💡 High variance is correlated with overfitting, while high bias is correlated with underfitting
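The link between capacity, bias, and variance in the takeaways above can be checked with a small simulation. This is a sketch under stated assumptions, not the lecture's code: the true function `true_f`, the noise level, and the helper `bias_variance` are hypothetical, and the training inputs are kept fixed while only the label noise is redrawn, which is a simplification of averaging over freshly sampled training sets. Polynomial degree again stands in for model capacity.

```python
import numpy as np

def true_f(x):
    # Hypothetical noise-free target function.
    return np.sin(np.pi * x)

def bias_variance(degree, n_rounds=200, n_train=30, noise=0.3, seed=0):
    """Estimate squared bias and variance of a degree-`degree` polynomial fit."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(-1.0, 1.0, n_train)   # fixed inputs (assumption)
    x_eval = np.linspace(-0.9, 0.9, 50)         # points where predictions are compared
    preds = np.empty((n_rounds, x_eval.size))
    for r in range(n_rounds):
        # Each round simulates a new training set by redrawing the label noise.
        y_train = true_f(x_train) + rng.normal(scale=noise, size=n_train)
        coefs = np.polyfit(x_train, y_train, deg=degree)
        preds[r] = np.polyval(coefs, x_eval)
    avg_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((avg_pred - true_f(x_eval)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

for degree in (1, 3, 9):
    b, v = bias_variance(degree)
    print(f"degree={degree}  bias^2={b:.4f}  variance={v:.4f}")
```

In this setup the low-degree model should show the larger squared bias and the high-degree model the larger variance, matching the direction of the curves described in the video. The exact numbers depend on the assumed target function and noise level.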
