SVM (Support Vector Machine) in Python - Machine Learning From Scratch 07 - Python Tutorial
Key Takeaways
This video tutorial demonstrates how to implement a Support Vector Machine (SVM) using only built-in Python modules and NumPy, covering the concept, the math, and the code behind this popular machine learning technique. The SVM uses a linear model to find the hyperplane that best separates the data: it maximizes the margin between the two classes by minimizing the magnitude of the weight vector w, and applies gradient descent to a cost function (hinge loss plus a regularization term) to find w and the bias b.
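Written out, the cost function, gradients, and update rule that the transcript describes in words look roughly as follows. This is a reconstruction from the spoken description, not a copy of the slides; the symbol $\alpha$ for the learning rate, $J_i$ for the per-sample cost, and the component index $k$ are notation assumed here.

```latex
% Per-sample cost: regularization term plus hinge loss
J_i = \lambda \lVert w \rVert^2 + \max\bigl(0,\; 1 - y_i (w \cdot x_i - b)\bigr)

% Case 1: y_i (w \cdot x_i - b) \ge 1, so the hinge term is zero
\frac{\partial J_i}{\partial w_k} = 2 \lambda w_k, \qquad
\frac{\partial J_i}{\partial b} = 0

% Case 2: otherwise
\frac{\partial J_i}{\partial w_k} = 2 \lambda w_k - y_i x_{ik}, \qquad
\frac{\partial J_i}{\partial b} = y_i

% Gradient descent update with learning rate \alpha
w \leftarrow w - \alpha \frac{\partial J_i}{\partial w}, \qquad
b \leftarrow b - \alpha \frac{\partial J_i}{\partial b}
```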
Full Transcript
Hi everybody, welcome to a new Machine Learning From Scratch tutorial. Today we are going to implement the SVM algorithm using only built-in Python modules and NumPy.
The SVM, or support vector machine, is a very popular algorithm. It follows the idea of using a linear model to find a linear decision boundary, also called a hyperplane, that best separates our data. Here the best hyperplane is the one that represents the largest separation, or the largest margin, between the two classes. So we choose the hyperplane so that the distance from it to the nearest data point on each side is maximized.
If you have a look at this image, we want to find a hyperplane, and the hyperplane has to satisfy the equation w · x - b = 0. We want to find the hyperplane so that the distance to both classes is maximized. We use the class +1 here and -1 here, so this distance, the margin, should be maximized.
First let's have a look at the math behind it. It's a little bit more complex than in my previous tutorials, but I promise that once you have understood it, the final implementation is fairly simple. We use the linear model w · x - b, which should be 0 on the hyperplane. Our function should also satisfy the condition that w · x - b is greater than or equal to 1 for our class +1, so all the samples here must lie on the left side of this line, and all the samples of the class -1 must lie on the right side of this line. Put mathematically, it must satisfy w · x_i - b ≥ 1 for class +1, and w · x_i - b ≤ -1 for class -1. If we put this into only one equation, we multiply our linear function by the class label, and this should be greater than or equal to 1: y_i (w · x_i - b) ≥ 1. This is the condition that we want to satisfy.
Now we want to come up with the w and the b, so our weights and the bias.
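To make the combined condition y_i (w · x_i - b) ≥ 1 concrete, here is a minimal sketch that evaluates it with NumPy. The toy points and the hand-picked values of `w` and `b` are assumptions made up for illustration; they do not come from the video.

```python
import numpy as np

# Hypothetical toy data (not from the video): three points of class +1,
# two points of class -1, each with two features.
X = np.array([[2.0, 2.0], [3.0, 3.0], [1.5, 1.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1, 1, 1, -1, -1])

# Hypothetical hyperplane parameters, chosen by hand for illustration.
w = np.array([1.0, 1.0])
b = 2.0

# Combined condition from the transcript: y_i * (w . x_i - b) >= 1
margins = y * (X @ w - b)
satisfied = margins >= 1

print(margins)    # value of y_i * (w . x_i - b) for each sample
print(satisfied)  # True where the sample lies on or outside its margin line
```

The third point is on the correct side of the hyperplane (its value is positive) but lies inside the margin, so the condition is not satisfied for it.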
For this we use a cost function and then apply gradient descent. If you're not familiar with gradient descent already, then please watch one of my previous tutorials, for example the one on linear regression, where I explain this in a little more detail.
So now let's continue. We use a cost function, and in this case we use the hinge loss. This is defined as the maximum of 0 and 1 - y_i (w · x_i - b), so 1 minus our condition, y_i times our linear model. What this means is, if we plot the hinge loss, the blue line here is the hinge loss. It is 0 if y times f is greater than or equal to 1. So if they have the same sign, that is, if the sample is correctly classified, and the product is at least 1, then our loss is 0. If we have a look at this image again: for the green class, if a sample lies on this side, then the loss is 0, and for the blue class, if it lies on this side, then it's also 0. Otherwise we have a linear function, so the further we are away from our decision boundary line, the higher our loss is.
This is one part of our cost function. The other part is, as I already said, that we want to maximize the margin between these two classes. The margin is defined as 2 over the magnitude of w, so it depends on our weight vector. We want to maximize this, and therefore we want to minimize the magnitude. So we add this to our cost function: we put in the term lambda times the magnitude of w to the power of 2, and then here we have our hinge loss. The lambda parameter tries to find a trade-off between these two terms; it basically says which is more important. Of course we want to have the right classification, so we want to lie on the correct side of our lines, but we also want the line to be such that the margin is maximized.
So if you look at the two cases: if we are on the right side
of the lines, so if y_i times f(x_i) is greater than or equal to 1, then we only have this term, because the hinge loss is 0. Otherwise our cost function is this one here.
Now we want to minimize that, so we want to get the derivatives, or the gradients, of our cost function. In the first case, if we are greater than or equal to 1, our derivative with respect to w is 2 times lambda times w. Here we only look at one component of our w, so we get rid of the magnitude. And the derivative with respect to b is 0. Please double check that for yourself; I will not explain the derivatives in detail here. In the other case, so if y_i times f(x_i) is not greater than or equal to 1, our derivative with respect to w is this equation here, 2 times lambda times w minus y_i times x_i, and the derivative with respect to our bias is only y_i. Again, please double check that for yourself.
When we have our gradients, we can use the update rule. The new weight is the old weight minus, because we use gradient descent and so go in the negative direction, the learning rate, or step size, times the derivative. So these are our update rules.
Now I hope you've understood the concept and the math behind this, and we can start implementing it. This is now straightforward. First of all, we import numpy as np, of course. Then we create our class SVM, which will get an __init__ method. Here I will put in a learning rate, which will get a default value of 0.001, and a lambda parameter, which will also get a default; I will say this is 0.01, so this is usually also a small value. And then it will get the number of iterations for our optimization, which will get a default of 1000. Then I will simply store them: self.lr = learning_rate, self.lambda_param = lambda_param. Note that I cannot use the name lambda here, because lambda is a keyword in Python for lambda functions. Then self.n_iters = n_iters. Then I
will say self.w = None and self.b = None, so I have to come up with them later.
Then we define our two methods, as always. One is the fit method, where we fit the training samples and the training labels, and the other one is the predict method, where we predict the labels of the test samples.
Let's start with the predict method, because it is very short. As I said, if we look at the math, we apply this linear model and then look at the sign of the result. If it's positive, we say it's class 1, and if it's negative, we say it's class -1. So we say linear_output = np.dot(X, self.w) - self.b, so the dot product of X and self.w, minus self.b. Then we choose the sign, so we can simply say return np.sign(linear_output). This is the whole predict implementation.
Now let's continue with the fit method. First of all, as I said, we use the classes +1 and -1 here, so we want to make sure that our y has only -1 and +1. Oftentimes it has 0 and 1, so let's convert this. Let's say y_ = np.where(y <= 0, -1, 1). np.where gets a condition, so we say y, and if this is less than or equal to 0 we put in -1, and otherwise we put in +1. This will convert all the zeros or smaller numbers to -1 and the other numbers to +1.
Now let's get the number of samples and the number of features: n_samples, n_features = X.shape, because our input X is a NumPy ndarray where the number of rows is the number of samples and the number of columns is the number of features. Then we want to initialize our w and our b, and we simply put in zeros in the beginning. So we say self.w = np.zeros(n_features), so for each feature component we put in a zero for our weight component, and then self.b = 0.
Now we can start with our gradient descent. So we say for underscore, because we don't need this,
in range(self.n_iters), so the number of iterations we want to do this. Then we iterate over our training samples, so I say for idx, x_i in enumerate(X). This will give me the current index and also the current sample.
Now let's have a look at the math again. I want to calculate the derivative of our cost function with respect to w and with respect to the bias. At first I look at whether this condition is satisfied. The condition is y_i times our linear function, so I say condition = y_[idx] * (np.dot(x_i, self.w) - self.b) >= 1. So y_ of the current index, times the linear function, the dot product of the current sample and self.w minus self.b, and this should be greater than or equal to 1. If this is satisfied, the condition is True, and otherwise it's False.
Now I say if condition. If this is true, then our derivatives look like this: the derivative with respect to b is just 0, so we only need the one for w, which is 2 times lambda times w. In our update we say the new weight is the old weight minus the learning rate times this. I write this in one step, so I say self.w -= self.lr * (2 * self.lambda_param * self.w). This is the update if our condition is satisfied, and we only need this update.
Otherwise we again update self.w, and let's have a look at the equation once more: it's 2 times lambda times w minus y_i times x_i. So I say self.w -= self.lr * (2 * self.lambda_param * self.w - np.dot(x_i, y_[idx])), because I want to multiply x_i and y_i, the y_ of the current index. This is our update for w. And for b we say self.b -= self.lr * y_[idx], so the learning rate times the derivative, and the derivative is just y_i, so only y_ of the index.
And now we're done, so this is the whole implementation. Now let's test this. I've written a
little test script that will import this SVM class and then generate some test samples, so it will generate two classes. Then I create my SVM classifier and fit the data, and I wrote a little function to visualize this. You can find the code on GitHub, by the way, so please check that out for yourself.
Now if we run this, so let's say python svm_test.py, this should calculate the weights and the bias, and it should also plot the decision function, so that yellow line, and the two lines on both sides here. And we see that it's working.
So that's all about the SVM. I hope you enjoyed this, and if you liked it, please subscribe to my channel. See you next time, bye!
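The steps dictated in the transcript can be assembled into the following sketch. It follows the spoken description (class name, default values, attribute names, update rules), but it is a reconstruction: the author's actual code lives in the GitHub repository linked in the description and may differ in detail. The toy data at the bottom is hypothetical and stands in for the generated test samples mentioned in the video.

```python
import numpy as np


class SVM:
    def __init__(self, learning_rate=0.001, lambda_param=0.01, n_iters=1000):
        self.lr = learning_rate
        self.lambda_param = lambda_param  # "lambda" itself is a Python keyword
        self.n_iters = n_iters
        self.w = None
        self.b = None

    def fit(self, X, y):
        n_samples, n_features = X.shape

        # Map labels to {-1, +1}: anything <= 0 becomes -1, the rest +1.
        y_ = np.where(y <= 0, -1, 1)

        # Start from zero weights and zero bias.
        self.w = np.zeros(n_features)
        self.b = 0.0

        for _ in range(self.n_iters):
            for idx, x_i in enumerate(X):
                condition = y_[idx] * (np.dot(x_i, self.w) - self.b) >= 1
                if condition:
                    # Hinge loss is zero: only the regularization gradient.
                    self.w -= self.lr * (2 * self.lambda_param * self.w)
                else:
                    # y_[idx] is a scalar, so y_[idx] * x_i equals the
                    # np.dot(x_i, y_[idx]) spoken in the transcript.
                    self.w -= self.lr * (
                        2 * self.lambda_param * self.w - y_[idx] * x_i
                    )
                    self.b -= self.lr * y_[idx]

    def predict(self, X):
        linear_output = np.dot(X, self.w) - self.b
        return np.sign(linear_output)


# Hypothetical, linearly separable toy data with 0/1 labels.
X = np.array([[3.0, 3.0], [4.0, 3.0], [3.0, 4.0],
              [-3.0, -3.0], [-4.0, -3.0], [-3.0, -4.0]])
y = np.array([1, 1, 1, 0, 0, 0])

clf = SVM()
clf.fit(X, y)
predictions = clf.predict(X)

print(clf.w, clf.b)
print(predictions)
```

Note that `predict` returns -1 and +1 (as floats) rather than the original 0/1 labels, and that `np.sign` returns 0 for a sample lying exactly on the hyperplane, so labels may need converting before they are compared against 0/1 targets.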
Original Description
Get my Free NumPy Handbook:
https://www.python-engineer.com/numpybook
In this Machine Learning from Scratch Tutorial, we are going to implement an SVM (Support Vector Machine) algorithm using only built-in Python modules and NumPy. We will also learn about the concept and the math behind this popular ML algorithm.
~~~~~~~~~~~~~~ GREAT PLUGINS FOR YOUR CODE EDITOR ~~~~~~~~~~~~~~
✅ Write cleaner code with Sourcery: https://sourcery.ai/?utm_source=youtube&utm_campaign=pythonengineer *
📓 Notebooks available on Patreon:
https://www.patreon.com/patrickloeber
⭐ Join Our Discord : https://discord.gg/FHMg9tKFSN
If you enjoyed this video, please subscribe to the channel!
The code can be found here:
https://github.com/patrickloeber/MLfromscratch
Further readings:
https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47
You can find me here:
Website: https://www.python-engineer.com
Twitter: https://twitter.com/patloeber
GitHub: https://github.com/patrickloeber
#Python #MachineLearning
----------------------------------------------------------------------------------------------------------
* This is a sponsored link. By clicking on it you will not have any additional costs, instead you will support me and my project. Thank you so much for the support! 🙏