Coding SVM Kernels : Data Science Code

ritvikmath · Intermediate ·📐 ML Fundamentals ·5y ago

Key Takeaways

The video demonstrates coding SVM kernels in R using the svm function from the e1071 library, and explores how different kernels (linear, polynomial, radial basis function, sigmoid) and the gamma and cost parameters affect the model's decision boundary and test accuracy. It closes by recommending cross-validation to choose among kernels and parameter values.

Full Transcript

Hey everyone, welcome back. Today we're going to be looking at coding kernels in SVM. We just had a bunch of videos on advanced topics in SVM, especially kernels and why they're so powerful and help us extend this idea of SVM to more real-world cases, and in this video we look at various experiments involving kernels and coding kernels in SVM.

You may have already noticed, but we'll be using R for this video. We use Python for most of the videos on this channel, and that's mostly because I think it's more applicable for industry and I've used it more, but at the same time I know that if you're in academia or if you're in college right now, R might be the language you're using. So let's do a video in R and let's talk about this cool topic of SVM kernels.

First thing I'm going to do is import this library with a funny name called e1071. I'm not 100% sure why it's called this, but this is where most of the functionality of SVM lives in R, and so we want to import that library. The next thing is I'm going to generate some data according to a complex polynomial decision boundary, and instead of going to the code, let me just show you the results. I'm going to generate a training and a testing set according to this complex polynomial decision boundary. Here's the training data. As you can see, the decision boundary is cubic, and there is a class 0 mostly on top and a class 1 mostly on the bottom.

Now, what makes this kind of applicable and more real world-ish is that the decision boundary, first of all, is not just linear, and the second thing is that if you look at the two classes, they're kind of intersecting with each other, which means that even a very powerful SVM model still won't be able to get 100% accuracy, because there's places where the data is just kind of intermingled with each other. But we're going to see which SVM model gives us the best possible accuracy on the testing data, which looks like this. The testing and training data look very similar; they are coming from the
same decision boundary polynomial, but there's different amounts of randomness built into each data point, and so they are two different data sets with the same signature. We're going to be building SVM models on the training data and seeing what the accuracy is on the testing data.

The first experiment we're going to do, the first of three, is trying out different types of kernels. As we saw in our theoretical kernel videos, we mainly focused on two kernels: the polynomial kernel, which takes into account second-degree, third-degree, and so on interactions between your original variables, and this radial basis function kernel, which takes this idea to its extreme. Basically we're taking into account the infinite polynomial interactions between your data, which is part of why it's so powerful. But there's also other kernels we didn't look at. For example, sigmoid is another one that people have thought of, and the last one in this list is linear. The linear kernel is actually the simplest because we're not doing anything; we're simply taking the data and projecting it into the same exact dimensional space. So this is kind of our control to see how much better the other ones do.

So let's look at the output. Again, let me show you the basics of the code so you understand what's going on. We use the svm function here and we train the SVM model on the training data, then we predict the labels of the testing data, and then we calculate the accuracy of those predictions. So these three lines very simply are fitting the model using a particular kernel k.

Let's look at the results by using the four various types of kernels. Here's linear, and as the name linear would suggest, we are not transforming the data at all, which is why the decision boundary looks linear. You can see for a linear kernel it's doing the best it can do, which results in an accuracy of about 69%. Let me explain what you're looking at here in this picture. This is the testing data, so all the points you see here
are testing data, and the color of the point corresponds to the prediction from that particular SVM model for that particular kernel. These smaller dots you see, like here and here, are not actual testing points, but this is saying that if there was a testing point up there it would be categorized as zero, or if there was a testing point down here it would get categorized as one. So these smaller dots help visualize the split, where the model is choosing to classify zeros versus ones, so it splits up the decision space.

So we can see how we're going to do better than 69 percent. For example, using a polynomial kernel, weirdly enough, we actually do worse. It's doing some weird kind of split. I don't think this is because the polynomial kernel is weak; I think it's because there's certain parameters in the polynomial kernel that we should probably tune, like the degree of the polynomial and some other things. The reason I chose not to tune that for this video is because, as we'll see, the radial basis function kernel just kind of does great right out the gate, so I chose to focus on that going forward. But for a real-world problem you'd want to look not only at the default setting of all these kernels but also at various settings of the parameters.

If we look at the radial basis kernel, as mentioned before, it's doing great just in its default state. We get an accuracy of about 76% right out the gate, and you can see that, as we saw in the radial basis video, it is really kind of trying to match this decision boundary well. It's able to take into account these non-linearities in a real nice way. The sigmoid kernel kind of falls apart completely, but again I don't think this is necessarily because the sigmoid kernel is weak; I think there are certain parameters that we should have tuned if we were going down the sigmoid route. But we'll go down the radial basis function kernel route for the rest of this video.

So there's two more experiments now that we're going to run, and these
experiments are: we're going to first of all hold the kernel as radial basis for the rest of this video, and we're going to be looking at the effect of two important parameters on the model. The first one is called gamma. It'll be much more clear as I show you the pictures, but first let me try to explain at a high level what this gamma parameter is doing. We're trying values of this parameter from 0.01 to 0.1 all the way up to 1000. The smaller this value is, the more training points we're going to take into account around some testing point that we're trying to classify. The bigger this is, say 1000, the more this becomes extremely localized, which means that when we're looking at a testing point we're only considering the training points that are really close to it when making a decision. So it's a question of: should I use a lot of my data to make this decision, or should I only use very local data? This is another way of stating the bias-variance trade-off, but let's look at the pictures as we have that discussion.

I've printed out the predictions of the model for all these different values of gamma. We said the lowest one here is gamma equals 0.01, and it looks like a linear boundary, which is weird because we're using a radial kernel. But remember that with a very low gamma, anytime we're predicting a point we're taking into account virtually all of the training data. This is good in the sense that this model is going to have low variance, which means that if I were to change some of the training points here and there it wouldn't affect the prediction, because those small changes in specific training points get washed out when we consider all of the training points because of this small value of gamma. But it also means that this model has very high bias, which means that it's not going to be able to predict even the training data very well. It's not taking into account
the necessary small changes in the data, the nuances, the non-linearity. It's kind of just averaging everything together, so this is probably not the way we want to go. The accuracy here is only 69%, which is about the same as we got with a linear kernel before.

So let's increase gamma, keeping in mind that increasing gamma means we're going to be making the model more and more local, taking into account more and more local points, and so the decision boundary will get more and more irregular. Here's 0.1. You can see there's a slight curvature now to the decision boundary because we're taking into account local changes only, and now our accuracy has actually jumped up to 74%, which is nice. If we do gamma equals 1 it goes up again to 77%, which I think is what the default looked like, yeah, about that. Let's increase gamma more. So we get 77%, so we went up a little bit, but we see weird things happening now. For example, even though we didn't actually have any testing points up here, if we did they would get categorized into class 1, which is a little bit weird. So let's keep that in mind as we increase gamma more.

Now we increase gamma to 100 and our accuracy actually goes down, and we see kind of weird things happening where this whole space up here is categorized as a one, and it doesn't seem natural anymore. Let's take it to its extreme. If gamma is equal to 1000, our accuracy has dropped back down to about 69%, and this is what our predictions look like. What's happening here? Keeping in mind that a very large value of gamma means we're becoming extremely localized, we get this opposite case where our variance is actually huge, which means that small changes, small noise in the training data is going to have huge impacts on our predictions and our model, which is why you're seeing this very noisy prediction space here. The good thing about this is that it has low bias, which means on the training data it's going to do great, but that doesn't generalize well to the testing data, as we can see here.
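
Editor's sketch: the locality effect of gamma described above can be seen directly in the radial basis function kernel, which in its usual form is exp(-gamma * ||x - z||^2). The snippet below is a minimal illustration in plain Python (the video itself uses R), and the three points are hypothetical values chosen only to show the trend. It does not reproduce the video's data or accuracies.

```python
import math

def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * squared distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Hypothetical points, chosen only for illustration.
test_point = (0.0, 0.0)
near_point = (0.01, 0.0)  # a training point very close to the test point
far_point = (1.0, 1.0)    # a training point farther away

# Same gamma grid as in the video: 0.01 up to 1000.
for gamma in (0.01, 0.1, 1, 10, 100, 1000):
    near = rbf_kernel(test_point, near_point, gamma)
    far = rbf_kernel(test_point, far_point, gamma)
    print(f"gamma={gamma:<6} near={near:.4f} far={far:.4f}")
```

With a small gamma both points get a similarity close to 1, so far-away training points count almost as much as nearby ones. With a large gamma the far point's similarity collapses toward 0 and only very close points matter, which is the "extremely localized" behavior discussed in the transcript.
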
So the best gamma is probably something around 1 or 10, so something around 0.1 to 10 I would say, because that's a good balance between bias and variance.

The other parameter that we are going to talk about is cost. Cost is kind of nicely named because it is the cost of making a mistake. Now let me backtrack for a second. We're looking at soft margin SVM in this video, obviously, because the hard margin SVM case cannot even take into account cases where the data is intermingled. So we're looking at the soft margin case, where we allow for errors here and there but we give a penalty to these errors when they happen. By error I mean a point that is classified in the wrong class. We allow these misclassifications to happen if it means that our model is more generalizable and we're getting the majority of the data points correct, but that doesn't mean that we're not allowed to change the weight on these errors. So this cost parameter, which we're giving the value c, where c ranges between 0.01 and 1 million, is exactly the cost that you need to pay for every mistake that you make in your soft margin SVM problem.

It's going to be much more clear what I'm saying when we look at the pictures, so let's look at the result for cost equal to 0.01. This looks like, not linear exactly, but it's not as curved or not taking into account as many of the non-linearities as before. Notice that this case is cost equals 0.01, which means that there's a very small cost you're paying to make each of these mistakes in the soft margin SVM case. The next logical place to go from there is that because there's such a low cost for each mistake, we're able to make lots of mistakes in our SVM case, which means our model is able to generalize very, very, very well. In this case maybe it's generalizing even too well, which is why we're not really taking into account the irregularity, the non-linearity of
our data. So in terms of the bias-variance trade-off, the variance of this model is low but the bias is going to be high.

Let's see what happens when we increase the cost, the penalty that we have to pay for each mistake. If cost goes up to 1, then we see that we're able to take into account these non-linearities a lot better, and our accuracy actually goes up a little bit. So let's keep increasing the cost and see if at some point this reaches the best possible case. If cost is equal to 100 we get about the same accuracy, but we start seeing these weird artifacts where parts of the decision space are labeled in ways that we wouldn't necessarily expect. If we increase the cost up to 10,000 then we see that it's actually doing kind of well. If we increase the cost up to a million, this is where it kind of breaks down. Our accuracy has gone back down and this kind of looks like a huge mess, but let's talk about where this huge mess comes from.

Now the cost is a million, which means that for each mistake we make in the soft margin SVM case we're going to pay a massive cost. What that means is the model is going to want to make as few mistakes as possible, ideally none. What that means is the decision boundary is going to have to get really, really intricate and complex so that we're not making any mistakes at all on the training data. What that means is the bias is going to be low, which means the training data is taken care of, but the model has gotten so complex, has such a high variance, that it doesn't generalize well to the testing data. As we can see here, it's become complete trash on the testing data, which is why our accuracy went down. So the best cost was probably something around 1 or 100, something around that range, definitely not a million and definitely not 0.01.

And so that's the end of this video. The main stuff I wanted to get across is that we can very easily code different kernels, there's just built-in kernels, but furthermore we can set different parameters on those
kernels. For example, gamma and cost are the most important ones, and you're going to want to try out many different values of these parameters if you're doing a real problem, and use something like cross-validation to see which is the best one. Okay, so hopefully this helped you understand coding SVM kernels a little bit better and gave you a little bit of R code that you can add to your repository. This code will be available to you in the description below, and I'll see you next time.
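
Editor's sketch: the closing advice, trying many parameter values and using cross-validation to pick one, can be outlined as follows. This is a rough illustration in plain Python rather than the video's R code. Everything here is an assumption made for the example: the data generator, the fold scheme, and in particular `kernel_vote`, which is a simple RBF-weighted vote used as a stand-in classifier and is not an SVM. With a real SVM, the same loop would also range over cost.

```python
import math
import random

def rbf_kernel(x, z, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def kernel_vote(train_X, train_y, x, gamma):
    """Stand-in classifier (NOT an SVM): RBF-weighted vote of training labels."""
    score = sum(rbf_kernel(x, xi, gamma) * (1 if yi == 1 else -1)
                for xi, yi in zip(train_X, train_y))
    return 1 if score > 0 else 0

def cv_accuracy(X, y, gamma, k=5):
    """Mean accuracy over k folds; fold f holds out indices f, f+k, f+2k, ..."""
    n = len(X)
    fold_scores = []
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train_X = [X[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        correct = sum(kernel_vote(train_X, train_y, X[i], gamma) == y[i]
                      for i in held_out)
        fold_scores.append(correct / len(held_out))
    return sum(fold_scores) / k

# Hypothetical noisy data around a cubic boundary (not the video's data).
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    x1, x2 = rng.uniform(-2, 2), rng.uniform(-4, 4)
    X.append((x1, x2))
    y.append(1 if x2 + rng.gauss(0, 1) < x1 ** 3 - 2 * x1 else 0)

# Score every candidate gamma using only the training data, then pick the best.
gammas = [0.01, 0.1, 1, 10, 100, 1000]
cv_scores = {g: cv_accuracy(X, y, g) for g in gammas}
best_gamma = max(cv_scores, key=cv_scores.get)
print(cv_scores)
print("best gamma by cross-validation:", best_gamma)
```

The point of the design is that the parameter is chosen from held-out folds of the training data, so the testing set stays untouched until the final evaluation.
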

Original Description

Coding SVM Kernels in R!
SVM Kernels Video : https://youtu.be/OKFMZQyDROI
Radial Basis Function Kernel Video : https://youtu.be/Q0ExqOphnW0
Code Used in this Video : https://github.com/ritvikmath/YouTubeVideoCode/blob/main/SVM_Kernels.Rmd
My Patreon : https://www.patreon.com/user?u=49277905


This video teaches how to code SVM kernels in R with the e1071 library and how to analyze the effect of different kernels and parameters on the model's decision boundary and accuracy. It recommends cross-validation for choosing among kernels and parameter values.

Key Takeaways
  1. Import e1071 library
  2. Generate training and testing data with complex polynomial decision boundary
  3. Train SVM model on training data using different kernels
  4. Predict labels of testing data
  5. Calculate accuracy of predictions
  6. Try different values of gamma and cost parameters
  7. Use cross-validation to find the best kernel
💡 The gamma and cost parameters have a significant impact on the model's decision boundary and accuracy, and cross-validation can be used to find the best kernel.
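
Steps 2 and 5 of the list above (generating noisy data around a polynomial boundary, and computing accuracy) can be sketched without any SVM library. The snippet below is a minimal illustration in plain Python, while the video itself uses R. The cubic `cubic_boundary`, the noise level, and the sample sizes are hypothetical choices, not the ones from the video. It also shows why no model is expected to reach 100% accuracy on such data: even an "oracle" that knows the true boundary misclassifies the points that noise pushed across it.

```python
import random

def cubic_boundary(x1):
    """Hypothetical cubic decision boundary (not the one used in the video)."""
    return x1 ** 3 - 2 * x1

def make_data(n, seed, noise_sd=1.5):
    """Points labeled 1 below the boundary and 0 above it, with noisy labels."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.uniform(-2, 2), rng.uniform(-8, 8)
        # The noise term makes the two classes overlap near the boundary.
        label = 1 if x2 + rng.gauss(0, noise_sd) < cubic_boundary(x1) else 0
        X.append((x1, x2))
        y.append(label)
    return X, y

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Same boundary, different random draws: two data sets with the same signature.
train_X, train_y = make_data(300, seed=1)
test_X, test_y = make_data(300, seed=2)

# An oracle classifier that uses the true boundary directly.
oracle_pred = [1 if x2 < cubic_boundary(x1) else 0 for x1, x2 in test_X]
oracle_acc = accuracy(test_y, oracle_pred)
print(f"oracle accuracy on the test set: {oracle_acc:.3f}")
```

Any fitted model's test accuracy can be compared against the oracle's value to judge how much room for improvement remains.
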
