AdaBoost in Python - Machine Learning From Scratch 13 - Python Tutorial
Key Takeaways
This video tutorial demonstrates the implementation of the AdaBoost algorithm in Python from scratch, using only built-in Python modules and NumPy. It covers the basics of boosting, decision stumps, error calculation, and weight update rules, and provides a step-by-step guide to implementing the AdaBoost algorithm.
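The rules named above can be written compactly as follows. This is a summary of the formulas as they are described in the transcript; the symbols (N samples, T stumps, weights w_i, stump predictions h_t, error ε_t, performance α_t) are notation chosen here for illustration, not taken from the video slides.

```latex
\begin{align}
w_i^{(1)} &= \frac{1}{N}
  && \text{initial weights} \\
\epsilon_t &= \sum_{i:\, h_t(x_i) \neq y_i} w_i^{(t)}
  && \text{error: sum of misclassified weights} \\
\alpha_t &= \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)
  && \text{performance of stump } t \\
w_i^{(t+1)} &= \frac{w_i^{(t)} \exp\!\left(-\alpha_t\, y_i\, h_t(x_i)\right)}
                    {\sum_{j=1}^{N} w_j^{(t)} \exp\!\left(-\alpha_t\, y_j\, h_t(x_j)\right)}
  && \text{weight update, normalized} \\
H(x) &= \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right)
  && \text{final prediction}
\end{align}
```

With labels in {-1, +1}, the product y_i h_t(x_i) is +1 for a correct prediction and -1 for a misclassification, so for a positive α_t the update shrinks the weights of correctly classified samples and grows those of misclassified ones.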
Full Transcript
Hey guys, welcome to another Machine Learning From Scratch tutorial. Today we are going to implement the AdaBoost algorithm using only NumPy and built-in Python modules. AdaBoost uses the boosting approach, which follows the simple idea of combining multiple weak classifiers into one strong classifier, and this approach works really well in practice. So let's start with the theory before we jump to the code.

Let's have a look at this 2D example to understand the concept. Here we have our samples with only two features, one on the x axis and one on the y axis. The first classifier makes a split based on the y axis in this example, so it draws a horizontal decision line at some threshold, the dashed line that we can see here. We can see that some predictions are correct, but we also have misclassifications. With these misclassifications we can calculate a performance measure for this classifier, so how accurate it is, and with this measure we calculate and update weights for all training samples. Now the second classifier comes in, and it uses these weights and finds a different and possibly better decision boundary. The second classifier in this example chooses the feature on the x axis and draws a vertical line. Then again we calculate the performance and update the weights, and we repeat this step for as many classifiers as we want.

At the very end we have all the different decision lines and all the different classifier performances, and we combine all our classifiers by making a weighted sum with the calculated performances. This allows us to draw the final decision line that we can see here, which can be more complex than a simple linear decision line. The idea of the weighted sum at the end is that the better the classifier is, the more impact it has on the final outcome. So this is basically the concept, and now let's look at all the different steps and the math behind this in detail.

The first thing that we need is a weak classifier, which is also called a weak learner. A weak learner is always a very simple classifier, and in the case of AdaBoost we use a so-called decision stump. A decision stump is basically a decision tree with only one split, which is what we can see here. We look at only one feature of our samples and at only one threshold, and based on whether the feature value is greater or smaller than the threshold, we say that it is class -1 or class +1. So this is the decision stump.

Then we need the formula for the error. The very first time during our iteration, the error is calculated as the number of misclassifications divided by the total number of samples, which is the natural approach for the error. If you have a look at our example again, we have 10 samples, and the first classifier has 3 misclassifications, so our error rate is 0.3, or 30%. That is the first time, but the next time we also want to take the weights into account. If a sample was misclassified, we give it a higher weight for the next iteration, and this means that the error is then calculated as the sum over the weights of all misclassified samples. If our error is greater than 0.5, we simply flip it: we flip all decisions and we also flip our error, so it is then 1 minus the error.

So this is the error, and now we need the weights. The weights are initially set to 1 over N for each sample, and this also matches the error calculation in the first step: if we calculate the error as the sum over all misclassified weights, and each weight is 1 over N in the beginning, then it is equal to the number of misclassifications divided by the number of samples, like here. That's why the initial weights are 1 over N for each sample. Then we also need the update rule that is defined here: we have the old weight times the exponential function of minus alpha times the actual y times h(x), where h(x) is our prediction and alpha is the performance of the classifier. If the product y times h(x) is -1, we have a misclassification, and if it is +1, we have a correct classification. This whole formula basically makes sure that misclassified samples have a higher impact on the next classifier. So this is what you should remember about the weights.

Now the performance. We need to calculate the performance, or alpha, for each classifier, and we need this for the final prediction. The formula for the performance is 0.5 times the log of 1 minus the error divided by the error. Let me make this a little bit larger for you. Our error is always between 0 and 1, so I plotted alpha for different errors in this range, and we can see that it ranges symmetrically between a positive value here and a negative value here. With a low error we have a high positive value, and with a high error close to 1 we have a high negative value. But since we are flipping the decision, these will then be correct classifications again, with a high contribution to the negative side, so the side where the class is -1. So this is the concept of the alpha.

Now we need the prediction. If we have understood all of this, then the final prediction is very easy to understand: we just take the sign of the sum over all predictions, where we weight each prediction with the performance of the classifier, so alpha times the prediction. The better our classifier, the more impact it has on the final prediction, and the better the classifier, the more it points to the negative or positive side, and then we take the stronger side as the prediction for our class. So that's the concept of the prediction. It can be a bit confusing with the different formulas and the side flipping, but the basic concept is not so difficult.

Let's summarize all the different training steps that we must do in the code. First of all, we initialize our weights for each sample and set the value to 1 over N. Then we choose the number of weak learners we want and iterate over them. We train each decision stump, so we do a greedy search to find the best split feature and the best split threshold. Then we calculate the error for this decision stump with the formula, the sum over the misclassified weights, and we flip the error and the decision if the error is greater than 0.5. Then we calculate the alpha with the formula. Then we need the predictions, and with the predictions and the alpha we can update the weights. So this is what we must do in the code, and I promise you that now that we have all the formulas and all the training steps, the implementation is pretty straightforward and should not be so hard.

So let's jump to the code. The first thing we do is import NumPy, so import numpy as np, and this is the only module that we're going to need. Now we create a class for the decision stump, so class DecisionStump, and this gets an __init__ that only has self. Here we want to store a couple of things. The first thing that we want to store is the so-called polarity, so self.polarity = 1. This tells us whether a sample should be classified as -1 or +1 for the given threshold, so whether we look at the left or the right side. This is needed because if we flip the error, we must also flip the polarity; this gets clearer in a second. The second thing that we want to store is the feature index, so self.feature_index = None in the beginning. We also want to store the split threshold, so self.threshold = None in the beginning, and we also want to store a variable for the performance, the alpha, so self.alpha = None. These are the things that we want to store.

Then we also define a predict method for the decision stump, so def predict, and it gets self and X, the samples it should predict. What we want to do here is simply look at only one feature of the samples, compare it with the threshold, and say that if it's smaller, then it's -1, and otherwise it's +1. That's the whole concept of the decision stump. So let's say n_samples = X.shape[0], and then let's get only this feature: X_column = X[:, self.feature_index]. With the colon we still take all the samples, but only the feature index that we calculate later during the training. Now we make our predictions. By default we say this is 1, so predictions = np.ones(n_samples). Then we must check the polarity. If self.polarity == 1, which is the default case, then all the predictions where the feature value is smaller than the threshold are -1, so predictions[X_column < self.threshold] = -1. In the other case, else, so if our polarity is -1, we want to do it exactly the other way around. Let me copy this, but here we say that if the feature value is greater than our threshold, then these are the -1 predictions. This is all that the decision stump is doing, and then we can return the predictions.

So this is the class for the decision stump, and now we need a class for the actual AdaBoost algorithm. Let's say class Adaboost, and let's make this a small letter. We need an __init__ first, so def __init__, and this gets self, and the only parameter it gets is the number of classifiers that we want, so n_clf = 5 by default. In the __init__ we store this number, so self.n_clf = n_clf. Then, as always, we want to implement the fit and the predict method.

Let's start with the fit method: def fit, and it has self, X, and y, so the training samples and the labels. The first thing we do is get the shape of X, so n_samples, n_features = X.shape. Then we want to initialize our weights. As I said, the weight for each sample is set to 1 over N in the beginning, so w = np.full(n_samples, 1 / n_samples). np.full gets the size and an initial value, so this sets each value to 1 over the number of samples. This is our initialization.

Now let's iterate through all the classifiers and do the training. First we create a list where we want to store all the classifiers, so self.clfs, and this is an empty list in the beginning. Now the iteration: for _ in range(self.n_clf). What we want to do here is the greedy search, so we want to iterate over all the features and all the thresholds. This is similar to the decision tree implementation that I did in another tutorial, and I recommend that you check that out too. First we create our classifier, clf = DecisionStump(). Then we define a min_error: we want to find the split feature and the split threshold where the error is minimal, so in the beginning we just say min_error = float('inf'), which is a very high number.

Now let's iterate over all the features: for feature_i in range(n_features). We want to get only this feature, so similar to what we did before, X_column = X[:, feature_i], all the samples but only this feature index. Then we want to get only the unique values, and these are our thresholds, so thresholds = np.unique(X_column). Now we iterate over all the thresholds: for threshold in thresholds. We want to predict with polarity 1 first and then calculate the error with the formulas that I showed you in the beginning. So let's say p = 1 and do the prediction, similar to what we did before: in the beginning predictions is just ones, and since our polarity is 1, the predictions where the feature value is smaller than the threshold are -1. Now we have predicted all the samples and we want to calculate the error. As I said, the error is the sum over the weights of the misclassified samples, so let's get the misclassified weights: misclassified = w[y != predictions]. Then error = sum(misclassified). We also want to flip the error if it is greater than 0.5: if error > 0.5, then error = 1 - error, and we also flip the polarity, so p = -1.

Now we check whether our error is smaller than the min_error. If error < min_error, then this is our new minimum, so min_error = error. This is the best fit for our decision stump so far and we want to store it: clf.polarity = p, clf.threshold = threshold, and clf.feature_index = feature_i. So this is the whole training loop for one classifier.

When we are done with both for loops (I have to check that I'm on the right indent), what we have to do is calculate the performance, so calculate alpha. For clf.alpha we need the formula, 0.5 times the log of 1 minus the error divided by the error, and we also use a little epsilon so that we don't divide by 0. So let's say EPS equals this small value, and now let's use the formula: clf.alpha = 0.5 * np.log((1 - error) / (error + EPS)). Let me check that the parentheses are correct; this should be fine. So this is our alpha.

Now we want to update the weights, and for this we also need the predictions. Let's check the formula again: it is the weight times the exponential function of minus the alpha that we just calculated, times the actual labels, times the predictions, and then we normalize it. Let's first get the predictions. We already implemented this, so we can simply say predictions = clf.predict(X). The column is selected inside the predict method, so we can put in the whole X here. Now we can use the predictions to update the weights: w *= np.exp(-clf.alpha * y * predictions). Then we want to normalize, so we divide by the sum over the weights: w /= np.sum(w). Now we have updated our weights and we want to save this classifier, so self.clfs.append(clf). Now we are done with the fit method, so this is the whole training of our AdaBoost classifier.

Of course we also need the predict method, so let's implement this down here: def predict, and it gets self and X. This is the formula that I showed you: we look at the sign of the sum, where we multiply each alpha with the prediction. So let's say clf_preds equals, and here we use a list comprehension, clf.alpha * clf.predict(X) for clf in self.clfs. These are all the predictions in the sum. Now we calculate the sum: y_pred = np.sum(clf_preds, axis=0). The very last thing that we need to do is look at the sign, so y_pred = np.sign(y_pred), and this is our final prediction, which we return.

Now we should be done; we have the fit method and the predict method. I've already written a little test script. Here I import the class that we just created, so from adaboost import Adaboost, and I also have an accuracy measure. In this example we load the breast cancer dataset from the sklearn datasets, and the important thing that we must do here is set all the labels that are 0 at the moment to -1, because AdaBoost needs the labels as -1 and +1. Then we do a train-test split as always, create an AdaBoost classifier, in this case with five classifiers, call the fit method, call the predict method, and calculate the accuracy. So let's run this and hope that everything's working. Let's say python and then the test script, adaboost test it's called, and hit enter. It's running and calculating, and now we're done. Here we have an accuracy and it's pretty good in this example: 0.94, so we see that it's working. I hope you enjoyed this tutorial, and see you next time. Bye!
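The steps narrated in the transcript can be collected into one short sketch. This is a reconstruction from the spoken description, not a verified copy of the code shown in the video: the identifier spellings (feature_index, n_clf, clfs, clf_preds) are transliterated from speech, the epsilon value 1e-10 is an assumption, and alpha is computed here from the minimum error found by the search (with epsilon in both numerator and denominator), which is a deliberate choice in this sketch rather than what is said on screen.

```python
import numpy as np


class DecisionStump:
    """Weak learner: one feature, one threshold, labels in {-1, +1}."""

    def __init__(self):
        self.polarity = 1          # which side of the threshold is class -1
        self.feature_index = None  # column used for the split
        self.threshold = None      # split value
        self.alpha = None          # performance (weight in the final vote)

    def predict(self, X):
        X_column = X[:, self.feature_index]
        predictions = np.ones(X.shape[0])
        if self.polarity == 1:
            predictions[X_column < self.threshold] = -1
        else:
            predictions[X_column > self.threshold] = -1
        return predictions


class Adaboost:
    def __init__(self, n_clf=5):
        self.n_clf = n_clf
        self.clfs = []

    def fit(self, X, y):
        n_samples, n_features = X.shape
        w = np.full(n_samples, 1 / n_samples)  # initial weights 1/N
        eps = 1e-10  # assumed small constant to avoid division by zero
        self.clfs = []

        for _ in range(self.n_clf):
            clf = DecisionStump()
            min_error = float("inf")

            # Greedy search over every feature and every unique value.
            for feature_i in range(n_features):
                X_column = X[:, feature_i]
                for threshold in np.unique(X_column):
                    p = 1
                    predictions = np.ones(n_samples)
                    predictions[X_column < threshold] = -1

                    # Weighted error: sum of weights of misclassified samples.
                    error = np.sum(w[y != predictions])
                    if error > 0.5:  # flip error and polarity
                        error = 1 - error
                        p = -1

                    if error < min_error:
                        min_error = error
                        clf.polarity = p
                        clf.threshold = threshold
                        clf.feature_index = feature_i

            # Performance of the best stump found in this round.
            clf.alpha = 0.5 * np.log((1.0 - min_error + eps) / (min_error + eps))

            # Re-weight the samples and normalize so the weights sum to 1.
            predictions = clf.predict(X)
            w *= np.exp(-clf.alpha * y * predictions)
            w /= np.sum(w)

            self.clfs.append(clf)

    def predict(self, X):
        # Sign of the alpha-weighted sum of all stump predictions.
        clf_preds = [clf.alpha * clf.predict(X) for clf in self.clfs]
        return np.sign(np.sum(clf_preds, axis=0))


if __name__ == "__main__":
    # Tiny hypothetical dataset, only to show the call sequence.
    X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
    y_demo = np.array([-1, -1, -1, 1, 1, 1])
    model = Adaboost(n_clf=3)
    model.fit(X_demo, y_demo)
    print("training accuracy:", np.mean(model.predict(X_demo) == y_demo))
```

One detail worth knowing when reading this sketch: a flipped stump (polarity -1) assigns -1 to values strictly greater than the threshold, so samples that sit exactly on the threshold are not treated identically to the mirrored polarity 1 prediction used during the search. As in the transcript, the labels passed to fit are expected to be -1 and +1, so 0/1 labels need to be converted first.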
Original Description
Get my Free NumPy Handbook:
https://www.python-engineer.com/numpybook
In this Machine Learning from Scratch Tutorial, we are going to implement the AdaBoost algorithm using only built-in Python modules and numpy. AdaBoost is an ensemble technique that attempts to create a strong classifier from a number of weak classifiers. We will first learn about the concept and the math behind this popular ML algorithm, and then we jump to the code.
~~~~~~~~~~~~~~ GREAT PLUGINS FOR YOUR CODE EDITOR ~~~~~~~~~~~~~~
✅ Write cleaner code with Sourcery: https://sourcery.ai/?utm_source=youtube&utm_campaign=pythonengineer *
📓 Notebooks available on Patreon:
https://www.patreon.com/patrickloeber
⭐ Join Our Discord : https://discord.gg/FHMg9tKFSN
If you enjoyed this video, please subscribe to the channel!
The code can be found here:
https://github.com/patrickloeber/MLfromscratch
Further readings:
https://towardsdatascience.com/understanding-adaboost-2f94f22d5bfe
https://machinelearningmastery.com/boosting-and-adaboost-for-machine-learning/
You can find me here:
Website: https://www.python-engineer.com
Twitter: https://twitter.com/patloeber
GitHub: https://github.com/patrickloeber
#Python #MachineLearning
----------------------------------------------------------------------------------------------------------
* This is a sponsored link. By clicking on it you will not have any additional costs, instead you will support me and my project. Thank you so much for the support! 🙏