8.4 Bias and Variance vs Overfitting and Underfitting (L08: Model Evaluation Part 1)
Key Takeaways
This video discusses how the bias-variance decomposition relates to overfitting and underfitting in machine learning. It covers model capacity, training error, and generalization error, using parametric models and decision trees as examples.
Full Transcript
Yes, so in this video I intend to break a record in this course: I will make this, hopefully, the shortest video so far. I don't want to torture you too long with this topic, and I only have two slides, so it can't be that long. In this video, we are going to take a look at the relationship between the bias-variance decomposition, that is, the bias and variance terms, and the terms underfitting and overfitting.

Okay, with this slide it's actually three slides, but yeah, we are now looking at how the bias-variance decomposition is related to overfitting and underfitting. Recall this little figure here that I've shown you earlier in this lecture. There are two reasons why this is such a crude drawing. One reason is efficiency: it's pretty tedious to draw something like that in Keynote or PowerPoint. The second reason is that it also indicates that this is just a sketch, an approximation. It is not based on real numbers, and in practice you will never see a very smooth or perfect relationship between the terms shown here. If you do this for a real-world dataset, it will be a noisy process.

What we see here is the error, for example the squared error loss, plotted against the capacity of the model. We talked about this before: capacity is basically how well the model is capable of fitting a training set, so it is something like complexity. The higher the capacity, the higher the capability of the model to fit the data well. In the context of parametric models, like regression models, polynomial regression, and so forth, capacity usually also relates to the number of parameters or terms.

So here we have the training error: the larger the capacity of the model, the lower the training error, because the more complex the model is, the better it will be able to fit the training data. For example, think of short decision trees versus deep decision trees. Deep decision trees will be able to fit the data better, so the training error will go down. However, having a low training error doesn't mean the model will perform well on new data, because it can happen that we fit the training data too closely, and then the error on new data, for example as measured on the test set, will actually increase. That is measured by the generalization error, which can be estimated, for example, from an independent test set.

The generalization error first improves somewhat as the capacity becomes larger. If the model is too simple, say a short decision tree, it will neither be able to fit the training set well nor perform well on new data, because it's just too simple of a model. If we make the model capacity larger, the error will also decrease on new data. However, after some turning point, the larger the capacity becomes, the larger the error on new data will become. In general, the gap between the training error and the generalization error is considered the degree of overfitting, so it's by how much the model overfits. In this region, the overfitting is increasing: the gap is increasing because the model fits the data too closely. It fits noise in the data, for example, and then it won't be able to generalize well to new data.

Okay, so now how is that related to the concepts of bias and variance? Here I added two new terms, the variance and the bias, with the variance here in red. The larger the capacity of the model, the more complex the model. For example, as I've also shown you in the graphs, this could be a deep decision tree, and this is again a short decision tree on this end. It's exactly the same graph as on the previous slide, except that now I also have the variance and bias shown here.

If the capacity increases, the variance will increase. That is what we've seen for deep decision trees when we did the bias-variance decomposition and compared them, for example, to a bagging classifier or a bagging model, where averaging will reduce the variance. But here we are only talking about a single model, and you will see that the higher the capacity of the model, the more complex the model, the higher the variance. This then relates to the degree of overfitting: models with high variance are more prone to overfitting. So here, if we have this turning point again, overfitting increases when we go to the right. The degree of overfitting is again the gap between the training error, here the black line, and the generalization error, the green line. This gap is the degree of overfitting, and it usually increases as the variance increases.

And vice versa, the higher the variance becomes, the lower the bias will become, so a more complex model will usually have a lower bias. Yes, it looks maybe a little bit misleading here; it looks like the bias goes down and then up again. It's just a bad drawing. It should just go down, or basically converge asymptotically. So the bias will go down while the variance goes up if the capacity increases. Conversely, the bias will be large if we have a low capacity, and you can see that a large bias is also at the same end here as the high degree of underfitting. When we go here to the left, when we make the model too simple, it will underfit the data: it will not perform well on the training data, and it also won't perform well on the test set. So it will perform badly on both the training and the test set, and this has some relationship to having a high bias. So high bias is correlated with underfitting, and high variance is correlated with overfitting.

Okay, so that's it for the relationship of overfitting and underfitting to bias and variance. In the next video, we will also take a brief look at the bias-variance decomposition of the 0-1 loss. It's more like, I would say, an optional topic. It's closer to our classification context, but it's somewhat less intuitive than decomposing the squared error loss. It's more like a workaround, but we will see more about that in the next video.
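The relationship sketched in the transcript can be checked on toy data. The following is a minimal sketch and not code from the lecture: it assumes a made-up 1-D regression problem (y = sin(2πx) plus Gaussian noise with an assumed standard deviation SIGMA), and it uses a small hand-written greedy regression tree built only on the Python standard library as a stand-in for a library decision tree. All names (true_f, fit_tree, experiment, and so on) are hypothetical helpers introduced for illustration. For several tree depths, it estimates the training error, the test error, their gap (the degree of overfitting), and the squared bias and variance of the predictions over repeated training sets.

```python
import math
import random

SIGMA = 0.3  # assumed noise level of the toy data (not from the lecture)

def true_f(x):
    """Assumed ground-truth function for the toy problem."""
    return math.sin(2 * math.pi * x)

def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def fit_tree(points, max_depth):
    """Greedy 1-D regression tree: a leaf is a float, a node is (threshold, left, right)."""
    ys = [y for _, y in points]
    if max_depth == 0 or len(points) < 2:
        return sum(ys) / len(ys)
    pts = sorted(points)
    best = None
    for i in range(1, len(pts)):
        if pts[i - 1][0] == pts[i][0]:
            continue  # cannot split between identical x values
        cost = sse([y for _, y in pts[:i]]) + sse([y for _, y in pts[i:]])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return sum(ys) / len(ys)
    i = best[1]
    threshold = (pts[i - 1][0] + pts[i][0]) / 2
    return (threshold, fit_tree(pts[:i], max_depth - 1), fit_tree(pts[i:], max_depth - 1))

def predict(tree, x):
    """Walk down the tree until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

def mse(tree, points):
    """Mean squared error of the tree on a list of (x, y) pairs."""
    return sum((predict(tree, x) - y) ** 2 for x, y in points) / len(points)

def sample(n, rng):
    """Draw n noisy (x, y) pairs with x uniform on [0, 1)."""
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0, SIGMA)) for x in xs]

def experiment(depths=(1, 2, 4, 6), n_train=30, n_repeats=50, seed=0):
    """Estimate train/test error, squared bias, and variance per tree depth."""
    rng = random.Random(seed)
    # The same training sets are reused for every depth so depths are comparable.
    train_sets = [sample(n_train, rng) for _ in range(n_repeats)]
    test_set = sample(300, rng)
    grid = [(j + 0.5) / 50 for j in range(50)]  # points where bias/variance are evaluated
    results = {}
    for depth in depths:
        trees = [fit_tree(train, depth) for train in train_sets]
        train_err = sum(mse(t, s) for t, s in zip(trees, train_sets)) / n_repeats
        test_err = sum(mse(t, test_set) for t in trees) / n_repeats
        bias2 = variance = 0.0
        for x in grid:
            preds = [predict(t, x) for t in trees]
            mean_pred = sum(preds) / n_repeats
            bias2 += (mean_pred - true_f(x)) ** 2 / len(grid)
            variance += sum((p - mean_pred) ** 2 for p in preds) / n_repeats / len(grid)
        results[depth] = {"train": train_err, "test": test_err,
                          "bias2": bias2, "variance": variance}
    return results

results = experiment()
print("depth  train   test    gap     bias^2  variance")
for depth, r in results.items():
    gap = r["test"] - r["train"]
    print(f"{depth:5d}  {r['train']:.3f}   {r['test']:.3f}   {gap:.3f}   "
          f"{r['bias2']:.3f}   {r['variance']:.3f}")
```

With these assumed settings, the training error should shrink as the depth grows, while the train-test gap and the variance should grow and the squared bias should shrink. As the transcript notes for real data, the curves are noisy, so the exact numbers depend on the seed, the noise level, and the sample size.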
Original Description
Sebastian's books: https://sebastianraschka.com/books/
This brief video discusses the connection between bias & variance and overfitting & underfitting.
-------
This video is part of my Introduction of Machine Learning course.
Next video: https://youtu.be/IvHZ4-yd5is
The complete playlist: https://www.youtube.com/playlist?list=PLTKMiZHVd_2KyGirGEvKlniaWeLOHhUF3
A handy overview page with links to the materials: https://sebastianraschka.com/blog/2021/ml-course.html
-------
If you want to be notified about future videos, please consider subscribing to my channel: https://youtube.com/c/SebastianRaschka