SVM (Support Vector Machine) in Python - Machine Learning From Scratch 07 - Python Tutorial

Patrick Loeber · Beginner · 📐 ML Fundamentals · 6y ago

Skills: ML Maths Basics (80%), Supervised Learning (80%)

Key Takeaways

This video tutorial demonstrates how to implement a Support Vector Machine (SVM) using only built-in Python modules and NumPy, covering the concept, math, and code behind this popular machine learning technique. The SVM uses a linear model to find the hyperplane that best separates the data, maximizing the margin between the classes by minimizing the magnitude of w. The weights w and the bias b are found by gradient descent on a cost function that combines the hinge loss with this margin term.

Full Transcript

Hi everybody, welcome to a new Machine Learning From Scratch tutorial. Today we are going to implement the SVM algorithm using only built-in Python modules and NumPy.

The SVM, or support vector machine, is a very popular algorithm. It follows the idea of using a linear model to find a linear decision boundary, also called a hyperplane, that best separates our data. Here the best hyperplane is the one that represents the largest separation, or the largest margin, between the two classes. So we choose the hyperplane so that the distance from it to the nearest data point on each side is maximized.

If you have a look at this image, we want to find a hyperplane, and the hyperplane has to satisfy the equation w · x − b = 0. We want to find the hyperplane so that the distance to both classes is maximized. We use the class +1 here and −1 here, so this distance, the margin, should be maximized.

First let's have a look at the math behind it. It's a little bit more complex than in my previous tutorials, but I promise that once you have understood it, the final implementation is fairly simple. We use the linear model w · x − b, which should be 0 on the hyperplane. Our function should also satisfy the condition that w · x − b is greater than or equal to 1 for our class +1, so all the samples here must lie on the left side of this line, and all the samples of the class −1 must lie on the right side of this line. Put mathematically, w · x − b should be greater than or equal to 1 for class +1, or less than or equal to −1 for class −1. If we put this into only one equation, we multiply our linear function by the class label, and this should be greater than or equal to 1: y_i (w · x_i − b) ≥ 1. This is the condition that we want to satisfy.

Now we want to come up with w and b, our weights and the bias.
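As a small illustration of the combined condition y_i (w · x_i − b) ≥ 1, here is a minimal sketch. The weight vector, bias, and sample points are made-up values chosen for illustration; they are not taken from the video.

```python
import numpy as np

# Hypothetical hyperplane parameters, picked by hand for illustration.
w = np.array([1.0, 0.0])
b = 0.0

# Four made-up 2D samples with labels in {-1, +1}.
X = np.array([[2.0, 1.0], [1.0, -3.0], [-2.0, 0.5], [0.5, 2.0]])
y = np.array([1, 1, -1, 1])

# Combined constraint: y_i * (w . x_i - b) >= 1 for every sample.
margins = y * (X @ w - b)   # values: 2, 1, 2, 0.5
satisfied = margins >= 1    # the last sample violates the constraint

print(margins)
print(satisfied)
```

The last sample lies on the correct side of the hyperplane (its linear output is positive) but inside the margin, so the condition fails for it even though its predicted sign would be right.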
And for this we use a cost function and then apply gradient descent. If you're not familiar with gradient descent already, then please watch one of my previous tutorials, for example the one on linear regression. There I explain this in a little more detail.

So now let's continue. We use a cost function here, and in this case we use the hinge loss. This is defined as the maximum of 0 and 1 − y_i · f(x_i), where f is our linear model, so here we have our condition again. If we plot the hinge loss, then the blue line is the hinge loss. It is 0 if y · f is greater than or equal to 1, so if the sample is correctly classified and lies outside the margin, our loss is zero. If we have a look at this image again: for the green class, if it lies on this side, then it's 0, and for the blue class, if it lies on this side, then it's also 0. Otherwise we have a linear function, so the further we are on the wrong side of our decision boundary, the higher our loss.

This is one part of our cost function. The other part is, as I already said, that we want to maximize the margin between these two classes. The margin is defined as 2 over the magnitude of w, so it depends on our weight vector. We want to maximize this, and therefore we want to minimize the magnitude. So we add this to our cost function: we put in the magnitude of w to the power of 2, times a lambda parameter, and then we have our hinge loss. The lambda parameter sets a trade-off between these two terms; it basically says which one is more important. Of course we want to have the right classification, so we want to lie on the correct side of our lines, but we also want the line to be such that the margin is maximized.

So let's look at the two cases. If we are on the right side
of the lines, so y_i · f(x_i) is greater than or equal to 1, then we only have this term, λ‖w‖², because the hinge loss is 0. Otherwise our cost function is this here: λ‖w‖² + 1 − y_i (w · x_i − b).

Now we want to minimize that, so we want to get the derivatives, or the gradients, of our cost function. In the first case, if we are greater than or equal to 1, our derivative with respect to w is 2 times lambda times w. Here we only look at one component of w, so we get rid of the magnitude. The derivative with respect to b is 0. Please double-check that for yourself; I will not explain the derivatives in detail here. In the other case, if y_i · f(x_i) is not greater than or equal to 1, our derivative with respect to w is this equation here, 2λw − y_i x_i, and the derivative with respect to our bias is only y_i. Again, please double-check that for yourself.

When we have our gradients, we can use the update rule. The new weight is the old weight minus the learning rate, or step size, times the derivative; it is minus because we use gradient descent, so we go in the negative direction. These are our update rules. I hope you've understood the concept and the math behind this, and now we can start implementing it.

This is now straightforward. First of all we import numpy as np, of course, and then we create our class SVM, which gets an __init__ method. Here I put in a learning rate, which gets a default value of 0.001, and a lambda parameter, which also gets a default; I will say 0.01, so this is usually also a small value. Then it gets the number of iterations for our optimization, with a default of 1000. Then I simply store them: self.lr = learning_rate and self.lambda_param = lambda_param. Note that I cannot use lambda here, because lambda is a keyword in Python for the lambda function. Then self.n_iters = n_iters. Then I
will say self.w = None and self.b = None, so I have to come up with them later.

Then we define our two methods, as always. One is the fit method, where we fit the training samples and the training labels, and the other is the predict method, where we predict the labels of the test samples.

Let's start with the predict method, because this is very short. As I said, if we look at the math, we apply this linear model and then look at the sign of the result. If it's positive, we say it's class +1, and if it's negative, we say it's class −1. So we say linear_output = np.dot(X, self.w) - self.b, and then we choose the sign, so we can simply say return np.sign(linear_output). This is the whole predict implementation.

Now let's continue with the fit method. First of all, as I said, we use the classes +1 and −1 here, so we want to make sure that our y has only −1 and +1. Oftentimes it has 0 and 1, so let's convert this. Let's say y_ = np.where(y <= 0, -1, 1). np.where gets a condition: if y is less than or equal to 0, we put in −1, and otherwise we put in +1. This converts all the zeros or smaller numbers to −1 and the other numbers to +1.

Now let's get the number of samples and the number of features. This is simply X.shape, because our input X is a NumPy ndarray where the number of rows is the number of samples and the number of columns is the number of features. Then we want to initialize our w and our b, and we simply put in zeros in the beginning. We say self.w = np.zeros(n_features), so for each feature component we put in a zero for our weight component, and then we say self.b = 0.

Now we can start with our gradient descent. We say for _ (an underscore, because we don't need this)
in range(self.n_iters), so we do this for the number of iterations. Then we iterate over our training samples, so I say for idx, x_i in enumerate(X). This gives me the current index and also the current sample.

Now let's have a look at the math again. I want to calculate the derivative of our cost function with respect to w and with respect to the bias. First I check whether this condition is satisfied; the condition is y_i times our linear function. So I say condition = y_[idx] * (np.dot(x_i, self.w) - self.b) >= 1. If this is satisfied, condition is True, and otherwise it's False.

Now I say if condition. If this is true, the derivative with respect to b is just zero, so we only need the derivative with respect to w, which is 2 times lambda times w. In our update, the new weight is the old weight minus the learning rate times this, and I write it in one step: self.w -= self.lr * (2 * self.lambda_param * self.w). If our condition is satisfied, we only need this update.

Otherwise we again update self.w by subtracting self.lr times the derivative, and let's again have a look at the equation: it's 2 times lambda times w minus y_i times x_i. So that is 2 * self.lambda_param * self.w - np.dot(x_i, y_[idx]), where I multiply our vector x_i by y_i, the y_ of the current index. This is our update for w. And self.b -= self.lr times the derivative, and the derivative is just y_i, so only y_[idx].

Now we're done, so this is the whole implementation. Let's test it. I've written a
little test script that imports this SVM class and generates some test samples in two classes. Then I create my SVM classifier and fit the data, and I wrote a little function to visualize this. You can find the code on GitHub, by the way, so please check that out for yourself.

Now let's run this, so let's say python svm_test.py. This should calculate the weights and the bias, and it should also plot the decision function, so that yellow line and the two lines on both sides here. And we see that it's working.

So that's all about the SVM. I hope you enjoyed this, and if you liked it, please subscribe to my channel. See you next time, bye.
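The class described in the transcript can be sketched as follows. This follows the spoken walkthrough step by step, so exact names and layout may differ from the code in the author's repository. The toy data at the bottom (two Gaussian clusters from a seeded NumPy generator) is an assumption made here so the block runs on its own; it is not the test script shown in the video.

```python
import numpy as np


class SVM:
    def __init__(self, learning_rate=0.001, lambda_param=0.01, n_iters=1000):
        self.lr = learning_rate
        self.lambda_param = lambda_param  # "lambda" itself is a Python keyword
        self.n_iters = n_iters
        self.w = None
        self.b = None

    def fit(self, X, y):
        n_samples, n_features = X.shape

        # Map labels to {-1, +1}; anything <= 0 becomes -1.
        y_ = np.where(y <= 0, -1, 1)

        # Start from zero weights and zero bias.
        self.w = np.zeros(n_features)
        self.b = 0.0

        for _ in range(self.n_iters):
            for idx, x_i in enumerate(X):
                # Margin condition: y_i * (w . x_i - b) >= 1
                condition = y_[idx] * (np.dot(x_i, self.w) - self.b) >= 1
                if condition:
                    # Only the regularization term contributes: dJ/dw = 2*lambda*w
                    self.w -= self.lr * (2 * self.lambda_param * self.w)
                else:
                    # dJ/dw = 2*lambda*w - y_i * x_i, dJ/db = y_i
                    self.w -= self.lr * (
                        2 * self.lambda_param * self.w - np.dot(x_i, y_[idx])
                    )
                    self.b -= self.lr * y_[idx]

    def predict(self, X):
        # The sign of the linear output gives the class (+1 or -1).
        linear_output = np.dot(X, self.w) - self.b
        return np.sign(linear_output)


# Toy data (assumption): two well-separated clusters with 0/1 labels.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(20, 2))
X_neg = rng.normal(loc=[-3.0, -3.0], scale=0.5, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 20 + [0] * 20)

clf = SVM()
clf.fit(X, y)
predictions = clf.predict(X)

# Compare against the labels in the same {-1, +1} encoding used by fit.
accuracy = np.mean(predictions == np.where(y <= 0, -1, 1))
print(clf.w, clf.b, accuracy)
```

Because the updates are applied one sample at a time, this is a stochastic form of gradient descent. The labels are passed as 0 and 1 on purpose, to exercise the conversion to −1 and +1 inside fit.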

Original Description

Get my Free NumPy Handbook: https://www.python-engineer.com/numpybook

In this Machine Learning from Scratch Tutorial, we are going to implement an SVM (Support Vector Machine) algorithm using only built-in Python modules and numpy. We will also learn about the concept and the math behind this popular ML algorithm.

GREAT PLUGINS FOR YOUR CODE EDITOR
✅ Write cleaner code with Sourcery: https://sourcery.ai/?utm_source=youtube&utm_campaign=pythonengineer *

📓 Notebooks available on Patreon: https://www.patreon.com/patrickloeber
⭐ Join Our Discord: https://discord.gg/FHMg9tKFSN

If you enjoyed this video, please subscribe to the channel!

The code can be found here: https://github.com/patrickloeber/MLfromscratch

Further readings: https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47

You can find me here:
Website: https://www.python-engineer.com
Twitter: https://twitter.com/patloeber
GitHub: https://github.com/patrickloeber

#Python #MachineLearning

* This is a sponsored link. By clicking on it you will not have any additional costs, instead you will support me and my project. Thank you so much for the support! 🙏

Key Takeaways

Implement the SVM algorithm using only built-in Python modules and NumPy
Find the hyperplane that maximizes the margin between classes
Apply gradient descent to find W and B
Create a class for the SVM model with an init method and default values for the learning rate, lambda parameter, and number of iterations
Define the predict method to apply a linear model and return the sign of the output
Define the fit method to convert the labels to -1 and 1, and then fit the model using the training samples and labels
Initialize W and B with zeros
Calculate the derivatives of the cost function with respect to W and B
Update W and B by gradient descent: subtract the learning rate times each derivative, with the lambda parameter weighting the margin term
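The derivative and update steps in the list above can be written out as follows. This is a sketch reconstructed from the spoken description in the transcript; the symbols J_i for the per-sample cost and α for the learning rate are notation chosen here, not shown in the source.

```latex
\begin{align*}
J_i &= \lambda \lVert w \rVert^2 + \max\bigl(0,\; 1 - y_i (w \cdot x_i - b)\bigr) \\[4pt]
\text{if } y_i (w \cdot x_i - b) \ge 1:\quad
  & \frac{\partial J_i}{\partial w} = 2 \lambda w, \qquad
    \frac{\partial J_i}{\partial b} = 0 \\
\text{otherwise:}\quad
  & \frac{\partial J_i}{\partial w} = 2 \lambda w - y_i x_i, \qquad
    \frac{\partial J_i}{\partial b} = y_i \\[4pt]
w &\leftarrow w - \alpha \, \frac{\partial J_i}{\partial w}, \qquad
b \leftarrow b - \alpha \, \frac{\partial J_i}{\partial b}
\end{align*}
```

The sign of the bias derivative follows from the model being w · x − b: differentiating 1 − y_i (w · x_i − b) with respect to b gives +y_i.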

💡 The SVM algorithm trades off a wide margin against classification accuracy. The hinge loss is 0 when y_i · f(x_i), the class label times the output of the linear model, is greater than or equal to 1. The cost function is minimized using gradient descent, with the learning rate setting the step size and the lambda parameter weighting the margin term.
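To make the hinge loss concrete, here is a minimal sketch that evaluates max(0, 1 − y · f) for a few hand-picked values. The function name hinge_loss and the sample numbers are illustrative assumptions, not part of the video's code.

```python
import numpy as np


def hinge_loss(y, f):
    """Hinge loss max(0, 1 - y * f) for labels y in {-1, +1} and raw outputs f."""
    return np.maximum(0, 1 - y * f)


# Made-up labels and raw linear outputs f(x) = w . x - b.
y = np.array([1, 1, 1, -1, -1])
f = np.array([2.0, 1.0, 0.5, -3.0, 0.5])

# y * f is 2, 1, 0.5, 3, -0.5, so the losses are 0, 0, 0.5, 0, 1.5.
losses = hinge_loss(y, f)
print(losses)
```

The third sample is classified correctly but sits inside the margin, so it still contributes a loss of 0.5; the last sample is misclassified and contributes more.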
