Classification in Python | logistic regression, LDA, QDA | Data Science With Marco
Key Takeaways
The video covers classification in data science using logistic regression, linear discriminant analysis (LDA), and quadratic discriminant analysis (QDA) in Python, utilizing libraries such as scikit-learn, pandas, and matplotlib.
Full Transcript
Hi everyone, welcome to Data Science with Marco. Today we are learning about classification and working with categorical variables. To do that, we're going to learn about three algorithms: logistic regression, linear discriminant analysis, and quadratic discriminant analysis. We're going to start off with a bit of theory, and then we're going to apply all those algorithms in a project setting. Let's get started.

We will start this tutorial with a bit of theory about classification. First, a bit of terminology. Binary classification is also termed simple classification, and this is the case when we only have two classes: for example, spam or not spam, or a fraudulent transaction or not. Of course, you can have more than two classes: for example, eye color, which can be blue, green, or brown. So you see that in the context of classification we have a qualitative, or categorical, response, unlike regression, where we have a quantitative response, or numbers.

Now let's see how we can perform classification with logistic regression. Ideally, when doing classification, we want to determine the probability that an observation is part of a class or not. Therefore, we want the output to be between 0 and 1. It just turns out that there is a function that does that, and it's the sigmoid function. As you can see here, when x approaches infinity you approach 1, and towards negative infinity you approach 0. The sigmoid function is expressed like this, and here we are assuming only one predictor. For now we stick to one predictor to make the explanations simpler. With some manipulation you can get this formula here, because we are trying to get a linear equation in x. Take the log on both sides and you get the logit. As you can see, it is indeed linear with respect to x, and most importantly, the probability is bounded between 0 and 1. Now you can estimate the parameters with the following formula, but of course we will not be implementing these formulas ourselves. It makes sense that you want the predicted
probability to be as close as possible to the observed state. For example, if your classifier says that a transaction is fraudulent, then that transaction should really be fraudulent. And just like with linear regression, we use the p-value to reject or not the null hypothesis. Of course, we can extend the logit formula to accommodate multiple predictors, and this will most of the time give you better results, since you are considering more predictors. So that's it for logistic regression.

Let's move on now to linear discriminant analysis, or LDA. We want to learn about LDA because logistic regression has some caveats. When classes are well separated, the parameter estimates are unstable. They also tend to be unstable when the data set is small. And finally, logistic regression as presented here only handles binary classification. With LDA you can overcome these issues, because it models the distribution of predictors for each class, so you can have more than two target classes, and it does so using Bayes' theorem.

Bayes' theorem is explained like this. Suppose we want to classify an observation into one of K classes, where capital K is greater than or equal to two. Then you let pi_k be the overall probability that an observation is associated with the kth class. You let f_k(x) denote the density function of X for an observation that comes from the kth class. This means that f_k(x) is large if there is a high probability that an observation from the kth class has capital X equal to small x. Therefore, Bayes' theorem states the equation that you see: the probability of the class being k, given X equals x, is the ratio of pi_k f_k(x) over the sum of pi_l f_l(x) for all classes. The challenge here is really approximating the density function, so we will assume only one predictor and a normal distribution. This is expressed by the function you see now. If we plug this function into the formula we saw before and take the log, we find that we must maximize the following equation. This is called
the discriminant, and as you can see, it is a linear function with respect to x, hence the name linear discriminant analysis. When applying LDA, we need to be aware of the assumptions it makes and make sure that they apply to our situation. Here, LDA assumes that each class has a normal distribution and its own mean, but the variance is common to all classes. If you have more than one predictor, which should be the case, then each class is drawn from a multivariate Gaussian distribution, each class has its own mean vector, and there is a common covariance matrix. So basically we must use vectors and matrices instead of single numbers, but the assumptions stay the same.

Now that we understand LDA, QDA should be fairly straightforward. The main difference is in the assumptions. Just like LDA, we assume each class is from a multivariate normal distribution and has its own mean vector, but now each class also has its own covariance matrix. Therefore, the discriminant is expressed like this, and you see that the equation is now quadratic, since you have two terms of x being multiplied together, hence quadratic discriminant analysis. QDA is better than LDA when you have a large data set, because it has lower bias and higher variance, but if your data set is small then LDA should be enough.

Before we move on to the coding portion, we must understand how to validate our models. In the context of classification, we use sensitivity and specificity. Sensitivity is the true positive rate, so the proportion of actual positives that are identified. For example, if we are to identify fraudulent transactions, then the sensitivity is the proportion of fraudulent transactions that are correctly identified as fraudulent. On the opposite side, specificity is the true negative rate, or the proportion of actual negatives that are identified, so it will be the proportion of non-fraudulent transactions that are correctly identified as non-fraudulent. We can also use the ROC curve, where ROC stands for receiver operating characteristic. We take the area under the curve, or AUC,
which is why you will often hear about the ROC AUC. We want the ROC AUC to be close to one. Why? Well, as you can see, we plot the false positive rate against the true positive rate. Ideally we have a false positive rate of zero and a true positive rate of one, which would give an area under the curve of one, and the curve would hug the upper left corner of the graph. So that's it for the theory. Let's jump to the code.

All right, let's start off with this project. Fire up your Jupyter notebook; I already have mine open. As always, I have a folder called data in which I put my data set, called mushrooms.csv. The link for the data set is in the description of the video. In this project we are going to classify mushrooms as either edible or poisonous depending on different features, so you have cap shape, cap surface, cap color, etc., and I put all the possible values here at the beginning of the notebook.

I start off by importing the libraries we are going to use: pandas as pd and numpy as np, of course matplotlib.pyplot as plt, and also seaborn as sns. Now we are going to import from sklearn.preprocessing, specifically LabelEncoder; you will see how that is used later on. Also, from sklearn.model_selection import train_test_split, with underscores, and also cross_val_score. Awesome. From sklearn.metrics we are going to import roc_curve and auc, as well as confusion_matrix. Shift-Enter... oh, sorry, first we're going to add the matplotlib inline setting so we can see our plots. Right, so Shift-Enter.

Now I am just going to define a path for my data set, so data/mushrooms.csv, and display the first five rows of the data set. All right, as you can see, these are our first five rows. We see the class, poisonous or edible, and then we see the values for each feature. Now I'm going to do a plot to see how many poisonous and edible mushrooms we have in our data set, so this will help us see if the data set is balanced
or not. For that I'm going to use seaborn, and as you can see, our data set is fairly balanced: we have almost the same number of poisonous and edible mushrooms. That is very good; we're not going to have to do a lot of preprocessing for our analysis.

Now I'm going to define a function that will allow us to see, for each feature, how many mushrooms are poisonous or edible. For example, if I plot for the cap surface, then for all possible values of the cap surface I want to know how many of those mushrooms are edible and how many are poisonous. That's going to give us a bit of intuition as to which features help you actually classify your mushrooms. And that's it, that's the function. Now I'm going to show you how you can use this function, but I am not going to run it, because it will run for all the features of the data set and generate a lot of plots. You set the hue equal to the class column of the data, which means that you're going to have two colors, one for poisonous and one for edible. Then you simply drop the class column and plot the rest of the data. To do that, you call plot_data and pass in the hue and the data to plot. Fairly simple, very straightforward. Like I said, you can run this function in your own notebook if you want to see the examples, but I will not run it for now.

So let's move on to preprocessing. You can do this with an Escape shortcut, by the way, and Enter. Now I'm going to check how many null values we have in our data set, because we do not want any missing values. So for col in data.columns, I want to print the name of the column and then the sum of null values, if there are any. You do that with data[col].isnull().sum(). If you run it, that is amazing: as you can see, we have 0 everywhere, which means that we have no null values in our data set. That's perfect. Now we are going to use the
LabelEncoder. What LabelEncoder will do is transform our class column into ones and zeros, because we cannot work with letters; we have to work with numbers. So you call le.fit_transform on the class column of the data, and now I will show you the result with data.head(). There you go: as you can see, the class is now 1 or 0, so either it is poisonous or not poisonous, one being true and zero being false. Then you want to one-hot encode the rest of the data set. To do that we use pd.get_dummies and pass in the data. Let's see what the result will be. As you can see, we have added a lot of columns: we went from 23 columns to 118 columns, because now for every feature value we have either true or false. That is perfect. This data is ready to be worked with, because we have only numbers everywhere.

Let's move on to modeling. First I'm going to determine what the target variable is. In this case it is the class column's values, reshaped with reshape(-1, 1). Perfect. Then the features are going to be the encoded data with the class column dropped, with axis equal to 1 to make sure that we drop the column. Then we're going to define a train and a test set, so we write X_train, X_test, y_train, y_test, and that is going to be equal to the train_test_split that we imported earlier. You pass in your features, you pass in your target variable, and then you define the test size. In this case I'm going to use 0.2, so 20% of the data set will be randomly removed to use as a test set, and we're going to use the rest to train. You can set a random state, by the way, to keep the results constant.

So let's apply logistic regression, our first algorithm for classification. From sklearn.linear_model import LogisticRegression. Now we're going to initialize the model, so logistic_reg is going to be equal to LogisticRegression(). Now I will fit the model to our train set, so pass in X_train and y_train.ravel(). Now we want to get the probabilities, so y_prob is going to be equal to
logistic_reg.predict_proba, and we're going to use the test set in this case, because we fit before and now we are computing probabilities on the test set. Now we set our threshold to 0.5, so the actual prediction is going to come from np.where: if the predicted probability is greater than 0.5 we say it's equal to 1, and otherwise it is equal to 0. This is really where we are classifying our mushrooms. We have run that and everything is okay; you can safely ignore the warning on the screen.

Now we're going to look at a confusion matrix. The confusion matrix shows you how many mushrooms were correctly classified. So for confusion_matrix you pass in y_test and you pass in the y prediction, and hopefully, if those are equal, you will see a diagonal matrix. And that is actually amazing: we have a diagonal matrix, so all poisonous and all edible mushrooms were correctly identified.

Let's check that with another metric. We're going to get the false positive rate and the true positive rate, as well as the thresholds, and set them equal to roc_curve, passing in y_test and y_prob. Then the ROC AUC is simply going to be equal to auc, where you pass in the false positive rate and the true positive rate. As you can see, when we calculate the ROC AUC we get one, which is again perfect classification.

Now I'm going to define a function to plot the ROC curve, so we can visualize it and see how it looks. This function is going to take in the ROC AUC, and here I'm basically building the plot itself. I'm setting the figure size to seven by seven and the title to Receiver Operating Characteristic. Then I plot the false positive rate and the true positive rate, and I give the line a different color; here I'm going to use red. We give it a label called AUC,
and I'm also going to round the AUC, so this gives it to us to two decimal places. Then I give the plot a legend and put it on the lower right side of the plot. Now I am also going to plot a straight line going from 0 to 1; this just serves as a general guide for evaluating the ROC curve, and I'm going to make this line dashed. Now I set the axis with plt.axis('tight') and add some labels: the y label is the true positive rate and the x label is the false positive rate. That's it for our function. So let's plot the ROC curve that we obtained above with logistic regression, and you should get the following. This is a perfect ROC curve: it's hugging the upper left corner, and we have an AUC of 1, which means perfect classification.

Now let's move on to our second algorithm, which is linear discriminant analysis, and let's see if the results are going to be different. Of course, it cannot be better, right? So from sklearn.discriminant_analysis you import LinearDiscriminantAnalysis. Feel free to pause the video and try it on your own as an exercise, because we are basically going to repeat the same steps as above, only this time with a different algorithm. So lda is going to be equal to LinearDiscriminantAnalysis(); here I'm initializing the model. Then I fit the model with our train set, so pass in X_train and y_train.ravel(). Then I get the probabilities from the LDA model, so lda.predict_proba, and you pass in X_test. Then you get the predictions, and we use the same threshold as before, 0.5, so it's going to be y_pred_lda equal to np.where on y_prob_lda. There you go: if it's greater than 0.5 we classify it as 1, and otherwise it will be 0. Run this cell, and awesome, everything went well. Again, you can ignore the warning on the screen. So now we're going to
build a confusion matrix here, so pass in y_test and y_pred_lda, and let's display the confusion matrix. As you can see, perfect classification again. As an exercise, we will still build the ROC curve and show it, simply to make sure that we get a ROC AUC of 1. So set the false positive rate, true positive rate, and thresholds equal to roc_curve, passing in y_test and the y probabilities, in this case y_prob_lda. Now compute the ROC AUC of LDA: you pass auc the false positive rate and the true positive rate, and now we are ready to display it. So roc_auc_lda, and we should get one. As expected, we indeed get one. Now we are going to plot the ROC curve using the function we defined earlier, and as you can see, you get the exact same curve, which is again expected, right? Because our confusion matrix was the same and the ROC AUC was the same, the plot should be the same. So again, LDA is a perfect classifier in this case.

Finally, we are going to implement quadratic discriminant analysis. Again, I strongly suggest that you pause the video at this point and really try to repeat the steps that we have done before, using QDA. As always, we're going to import the model from sklearn, so from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis. Now you initialize the model, so qda is QuadraticDiscriminantAnalysis(). You can always press Tab, by the way, for autocomplete. You fit the model on your train set and then you get the probabilities. Like I said, the steps are exactly the same; it's just that we are using a different model. So y_prob_qda is qda.predict_proba, and you use the test set, of course. Then we get our classifications with np.where, using the same threshold: if it is greater than 0.5, classify as 1, otherwise it is 0. Awesome. Run the cell and ignore the warning. Then we're going to take a look at the confusion matrix, so let's see if we also get a perfect classifier here with QDA. So,
confusion_matrix: you pass in y_test and the predictions, then display the confusion matrix. As you can see, we get the exact same result as before, so again QDA is a perfect classifier for our data set. Now we're going to plot the ROC curve and get the ROC AUC as well. Just like before, the false positive rate, true positive rate, and thresholds are going to be equal to roc_curve, passing in y_test and the y probabilities, y_prob_qda. Then you use the false positive rate and true positive rate to get your ROC AUC. I'm calling it roc_auc_qda; it's going to be equal to auc, passing in the false positive rate and the true positive rate. Now you can display the ROC AUC of QDA, and you get one. Perfect, as expected. Now we are simply going to plot it to make sure that it looks like the other plots, and you know that it will. And indeed, as expected: a perfect ROC curve and an AUC of one.

So that's it for classification. Thank you for watching the video. In the next one we are going to learn about resampling methods and regularization. This is going to help us further improve our models and will also make our data science work more robust, so stay tuned.
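The logistic regression formulas shown on screen during the theory portion are not captured in the transcript. As a reference sketch, assuming a single predictor X and coefficients named \beta_0 and \beta_1 (symbols chosen here, not taken from the video), the sigmoid, the odds, the logit, and the likelihood used to estimate the parameters are usually written as:

```latex
% Sigmoid (logistic) function: output always lies between 0 and 1
p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}

% Rearranged into the odds
\frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X}

% Taking the log of both sides gives the logit, which is linear in X
\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X

% Likelihood maximized to estimate \beta_0 and \beta_1
\ell(\beta_0, \beta_1) = \prod_{i:\, y_i = 1} p(x_i) \prod_{i':\, y_{i'} = 0} \bigl(1 - p(x_{i'})\bigr)
```

Maximizing the likelihood pushes the predicted probability toward 1 for observations that belong to the class and toward 0 for those that do not, which is the "as close as possible to the observed state" idea from the transcript.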
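The equations referred to in the LDA and QDA portion are likewise only shown on screen. The standard textbook forms are sketched below, using the transcript's \pi_k and f_k(x) and assuming \mu_k, \sigma^2, and \Sigma_k as names for the class mean, the shared variance, and the class covariance matrix:

```latex
% Bayes' theorem for K classes
\Pr(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}

% One predictor, normal density with class mean \mu_k and shared variance \sigma^2
f_k(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu_k)^2}{2\sigma^2}\right)

% LDA discriminant: linear in x; assign x to the class with the largest value
\delta_k(x) = x \cdot \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log \pi_k

% QDA discriminant: each class has its own covariance \Sigma_k, so it is quadratic in x
\delta_k(x) = -\frac{1}{2}(x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k)
              - \frac{1}{2}\log\lvert\Sigma_k\rvert + \log \pi_k
```

The quadratic term in the last line is the product of two x terms that the transcript mentions; it does not cancel across classes because \Sigma_k differs from class to class.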
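Sensitivity and specificity, as defined in the theory portion, can be computed by hand from the four confusion-matrix counts. The sketch below uses only the standard library; the labels and the helper names (confusion_counts, sensitivity, specificity) are made up for illustration, with 1 standing for "fraudulent" and 0 for "not fraudulent".

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were identified."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives that were identified."""
    return tn / (tn + fp)


# Hypothetical labels: 4 fraudulent and 6 legitimate transactions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("sensitivity:", sensitivity(tp, fn))  # 3 of 4 fraudulent cases caught
print("specificity:", specificity(tn, fp))  # 5 of 6 legitimate cases cleared
```

A perfect classifier, like the ones obtained on the mushroom data in the video, would have both values equal to 1.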
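To see why an area under the ROC curve of 1 means perfect classification, it can help to build the curve by hand. This is a minimal standard-library sketch, not the scikit-learn implementation used in the video: roc_points and area_under_curve are hypothetical helper names, and the scores are made-up probabilities.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Visit observations from the highest score to the lowest.
    pairs = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Handle tied scores together so they yield a single point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


y_true = [0, 0, 1, 1]

# Every positive scores higher than every negative: the curve hugs the top-left corner.
perfect = roc_points(y_true, [0.1, 0.2, 0.8, 0.9])
print(area_under_curve(perfect))    # 1.0

# One negative outranks one positive, so the area drops below 1.
imperfect = roc_points(y_true, [0.1, 0.4, 0.35, 0.8])
print(area_under_curve(imperfect))  # 0.75
```

The first case reaches a true positive rate of 1 while the false positive rate is still 0, which is the ideal situation described in the transcript.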
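The coding steps walked through above (label encoding, one-hot encoding, train/test split, fitting, thresholding at 0.5, confusion matrix, ROC AUC) can be sketched in one loop over the three models. This is not the notebook from the video: it assumes a small made-up data set instead of mushrooms.csv, keeps the target one-dimensional rather than reshaping and calling ravel(), and sets reg_param=0.1 on QDA (an assumption added here) because one-hot columns are collinear. The results will therefore not match the perfect scores reported in the video.

```python
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, confusion_matrix, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for mushrooms.csv (made-up values, not the real data set).
rng = np.random.default_rng(0)
n = 400
odor = rng.choice(["almond", "foul", "none"], size=n)
cap_shape = rng.choice(["bell", "convex", "flat"], size=n)
# Hypothetical rule: foul-smelling mushrooms are poisonous, with 10% label noise.
noise = rng.random(n) < 0.1
label = np.where((odor == "foul") ^ noise, "p", "e")
data = pd.DataFrame({"class": label, "odor": odor, "cap-shape": cap_shape})

# Encode the target as 0/1 ("e" -> 0, "p" -> 1), then one-hot encode the features.
data["class"] = LabelEncoder().fit_transform(data["class"])
encoded = pd.get_dummies(data, dtype=float)

y = encoded["class"].values          # kept 1-D, so no ravel() is needed
X = encoded.drop("class", axis=1)    # axis=1 drops the column, not a row
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "logistic regression": LogisticRegression(),
    "LDA": LinearDiscriminantAnalysis(),
    # Assumption: regularization added for stability on collinear dummy columns.
    "QDA": QuadraticDiscriminantAnalysis(reg_param=0.1),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_prob = model.predict_proba(X_test)[:, 1]   # probability of class 1
    y_pred = np.where(y_prob > 0.5, 1, 0)        # same 0.5 threshold as the video
    fpr, tpr, thresholds = roc_curve(y_test, y_prob)
    results[name] = {
        "confusion": confusion_matrix(y_test, y_pred, labels=[0, 1]),
        "roc_auc": auc(fpr, tpr),
    }

for name, result in results.items():
    print(name, "ROC AUC:", round(result["roc_auc"], 2))
    print(result["confusion"])
```

Looping over a dictionary of models keeps the three runs identical except for the estimator, which mirrors the "same steps, different algorithm" point made in the video.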
Original Description
Notebook and dataset: https://github.com/marcopeix/datasciencewithmarco
📚 Theory: 0:00 - 7:07
🐍 Code: 7:08 - 26:36
In this video, we cover the topic of classification in data science. We learn about logistic regression, linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA), and use these algorithms to build a classifier for edible or poisonous mushrooms in Python.
Follow me on Medium: https://medium.com/@marcopeixeiro