Logistic Regression Project: Cancer Prediction with Python

Alejandro AO · Beginner · 📐 ML Fundamentals · 3y ago

Skills: Supervised Learning 90% · ML Maths Basics 70% · ML Pipelines 60%

Key Takeaways

This video tutorial demonstrates the use of logistic regression for breast cancer prediction using Python, covering data preprocessing, model training, and evaluation. The tutorial uses pandas, seaborn, and scikit-learn to build and evaluate a logistic regression model.

Full Transcript

Good morning everyone. How is it going today? Today we're going to be working on a machine learning tutorial, and we're going to learn how to build a machine learning model to make predictions on a cancer project. We're going to be working with a logistic regression model. This project is aimed at data science beginners and also at software developers who would like to add machine learning to their applications, especially in the back end. So, if that's you, this is definitely the video for you. We're going to be using Python for this tutorial, which for software developers makes it easy to take this code into your server. I'm also going to delve into a very mild side of the mathematics behind this so that you understand what is going on behind the scenes. It's just for intuition, though; this is not going to go very deep into the mathematics, so don't be scared about that. All right, without further ado, let's get right into it. In this project, we have a data set of cell measurements that some doctors have made, and we're going to predict, based on these measurements, whether the cell is malignant or benign. To do this we're going to be using logistic regression, and the process is very simple. You build a model and you feed it some numerical values. In this case, it's going to be the measurements of the cell. I'm not very good with medical terms, but the idea is that you have a lot of measurements: the nucleus, the membrane, etc.
Then you feed all those measurements to your model, and the response that you get from the model is a categorical value, basically a yes or no answer. Logistic regression is super useful here because that's exactly the situation: we have all the numerical values and all we need is a yes or no answer. That's basically how logistic regression works. To give you an example real quick, it's a good idea to have some background on what linear regression is as well. I have another video on linear regression that you can watch, but if you don't know how it works, don't worry. I'm going to explain it real quick. Let's suppose that you have a data set from people who work in a marketing agency. They're running TV ads and selling their product, and they want to see how much the number of ads that they publish on TV impacts the sales. So you have this very basic chart in which you have a lot of dots, and every single point in this scatter plot is a sales figure. Here, for example, we had one sale, we had three sales here, etc. And you can see a trend: the more TV ads the company has run, the more sales they have got. That makes a lot of sense, so you intuitively know that there is a correlation between the TV ads and the sales. Now, how does linear regression work? Linear regression is also a predictive model, just like logistic regression, but instead of giving you a yes or no answer it gives you a numerical answer.
So in this case you have your scatter plot, and what the model does is draw a line across it. Then it measures the distance from that line to every single point of your data, and it squares that distance so that the more distant points are punished more than the closest ones. Then it adds all of those together. Then it tilts the line a little bit, and it does this several times until it finds the intercept and the slope that have the minimum sum of the squares of the distances. So basically it's finding the line that is the closest possible to every single point. That's what a linear regression does. Now, if you remember from high school or college, or wherever you had this mathematics, the idea behind a straight line in a plot is that you can represent it as an equation like this, in which y is your value for this side, x is your value for this side, beta 0 is your intersection with the y-axis, and this one right here is the coefficient of how important the value of x is for the value of y. The higher this one is, the steeper the slope. That's basically how it works. What the model is doing is finding this equation right here. It's super useful because that means that once you have x, you will be able to predict the value of y just by performing a very simple sum. Another way of seeing it is that if you have this line right here and you want to know how many sales you will get if you publish 200 TV ads, you can just come up here from 200 to the line, then across, and then you say, all right, so with 200 ads we're going to have 16 sales. That's basically what is going on.
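As a rough illustration of the line-fitting idea just described, here is a minimal sketch in plain Python. It assumes the on-screen equation is the usual y = beta0 + beta1 * x. The ads and sales numbers are made up for this example, and instead of tilting the line step by step the sketch uses the closed-form least-squares solution, which lands on the same minimum-sum-of-squares line.

```python
# Hypothetical data: number of TV ads run vs. units sold (made up for illustration).
ads = [50, 100, 150, 200, 250]
sales = [4, 9, 11, 16, 20]


def fit_line(xs, ys):
    """Return (beta0, beta1) that minimize the sum of squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares slope for a single predictor.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    beta1 = numerator / denominator
    # The fitted line always passes through the point of means.
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


def sum_of_squares(xs, ys, beta0, beta1):
    """Sum of squared distances between the observed y values and the line."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))


beta0, beta1 = fit_line(ads, sales)

# Predict sales for 200 ads with the fitted equation y = beta0 + beta1 * x.
predicted_sales = beta0 + beta1 * 200
print(f"intercept={beta0:.3f}, slope={beta1:.3f}, prediction={predicted_sales:.1f}")
```

With these invented numbers the prediction for 200 ads comes out at roughly 16 sales, which mirrors the reading-off-the-line step in the video. Tilting the fitted line in either direction can only increase `sum_of_squares`.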
So that's the intuition and the mathematics behind linear regression. This is very important in order to understand what is going on with logistic regression because, as you can see here, the answer from this model was a 16, and that's a numerical value. The problem is that with logistic regression we want a yes or no answer. So what do we do? This is where logistic regression comes into play and why it's important. Let me just make myself a little bit smaller here. There you go. All right. So let's say that you have all of your data points and, just as in the linear regression example, you have your scatter plot. The problem here is that all the values that you have are zeros and ones. So all of your yes answers are going to be scattered here and all of your no answers are going to be scattered here. Now, of course, you could technically build a linear regression for this, and it would probably look something like this. That might make some sense. It doesn't fit the line as well as this one, but what you can say is: all right, starting from 0.5 or something like that, we're going to predict that that's a yes, a one, and if the model comes below 0.5, we're going to predict that it's zero. That could work in some cases, but here it's just very hard to do that. So what we do in logistic regression is that instead of fitting the data to a straight line, we fit it to a line that looks a little bit like this, and this squiggle is basically everything behind logistic regression. We're going to build a function that draws this squiggle right here. Then, if the value of our prediction falls over 0.5, we're going to predict that it has a value of one.
If it falls below 0.5, we're going to predict that it has a value of zero. That's how logistic regression works in a nutshell. Now, on the algebraic side, here is how it's represented with an equation. Remember that you had the formula for your straight line. The formula for your logistic regression is basically the same thing, but we're using this logistic formula in which these coefficients right here all come here as the power of e, and then here as well. What this does is that this value will always be between zero and one, which is exactly what you want because you want to be able to make this divide between above 0.5 and below 0.5. So there you go. That is the intuition behind logistic regression. Once you have this, we can get right into the code and start building our model.
All right, welcome back. We're going to be working with this data set. In order to build your model, in order to train it, you're going to need some data so that it knows what kind of input should return a certain output. What we're going to be working on here is a project in which we have several measurements of cells. This is a data set that is publicly available. I will put the link in the description if you want to download it. The idea is that you have this huge number of measurements from different cells, and for each one of these observations you also have whether or not it was malignant. So you have this set of yes or no answers that you're going to use to build your machine learning model, which is the logistic regression model. In order to use it, you just have to download it.
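Going back to the logistic formula described in this passage: assuming the on-screen equation is the standard logistic function, p = e^(beta0 + beta1 * x) / (1 + e^(beta0 + beta1 * x)), the sketch below shows how the straight-line expression is squeezed into a value between zero and one and then cut at 0.5. The coefficients are hypothetical, chosen only to make the squiggle visible; they are not taken from the video.

```python
import math


def logistic(x, beta0, beta1):
    """Probability of class 1 for a single input x."""
    z = beta0 + beta1 * x  # the same straight-line expression as in linear regression
    # Algebraically the same as e**z / (1 + e**z); always strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))


def classify(x, beta0, beta1, threshold=0.5):
    """Turn the probability into a yes/no answer (1 or 0)."""
    return 1 if logistic(x, beta0, beta1) > threshold else 0


# Hypothetical coefficients, for illustration only.
beta0, beta1 = -4.0, 2.0

for x in (0.0, 1.0, 3.0, 4.0):
    probability = logistic(x, beta0, beta1)
    print(f"x={x}: probability={probability:.3f}, predicted class={classify(x, beta0, beta1)}")
```

Small inputs give probabilities near zero, large inputs give probabilities near one, and the predicted class flips where the curve crosses 0.5.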
I am going to put it inside my JupyterLab here, but you can use Jupyter notebooks or whichever tool you use to do data science projects. This is what the data set looks like. You have an id, which just identifies every single measurement. Then you have the diagnosis, which is whether or not it was malignant. Here M means malignant and B means benign for each cell. Then you have all of the measurements about the cell: area mean, smoothness mean, perimeter mean. These are just very complex sets of measurements, and I don't intend to understand each one of them. Understanding them individually is not extremely important here. What is important is that they're numbers and that they play a role in the end diagnosis. This is what we're going to be using to train our model. So let me show you now how this works. To start, we're going to be using pandas to read our data set. This is a Python library that allows you to read data sets and to manipulate them. We're also going to import Seaborn, which is a plotting library. Of course, you will need to install them first. If you don't have them installed, you can just do pip install or conda install. All right, let's now load the data. I'm just going to call it data, and I use pd.read_csv like this. Then I write the location of my data set, which as you saw is right here: the breast cancer CSV file. Then, in order to see the top part of the data set, I just do data.head() like that. Let me zoom in a little bit more so it's easier for you to see. There you go. The head command just shows us the top part of our data set, which is very convenient.
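The loading step described above can be sketched as follows. To keep the example self-contained it reads a tiny made-up CSV from memory instead of the downloaded file; the rows are invented and only a few of the real columns are shown. The file name in the comment is a hypothetical placeholder, since the exact path used in the video is not clear from the transcript.

```python
import io

import pandas as pd

# Tiny made-up stand-in for the downloaded file (values are illustrative only).
csv_text = """id,diagnosis,radius_mean,texture_mean,Unnamed: 32
1,M,17.9,10.4,
2,B,11.4,20.1,
3,M,19.7,21.3,
"""

# With the real data you would pass the file location instead,
# e.g. pd.read_csv("breast-cancer.csv")  (hypothetical file name).
data = pd.read_csv(io.StringIO(csv_text))

# head() shows the first rows (up to five) of the data frame.
print(data.head())
```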
It's the same as we saw right here, but just the upper part. That allows us to start to get an idea of what this data set looks like. What we can do now is also get the information about the data. What I like to do is data.info(), and it gives us a list of all the variables. In data science, we usually call every single column a variable. So it gives us data on all the variables that we're going to have. As we saw before, we have texture mean, area mean, etc., all these measurements that are already in float64 format. This data set also comes with a notebook on Kaggle with all of this explained. If you want to go through the notebook, the link is in the description. Now that we have this information: it's usually a good idea to understand what your data is, but here it's a little bit technical, so I don't really know exactly what concavity worst, compactness worst, and each one of these things mean. Usually you can use describe to get a feel for the ranges of your data. You get the count, the mean, the minimum value, the maximum value, and your quantiles for each of your variables, which is very convenient if you understand your variables.
The next step, after familiarizing ourselves with the data a little bit, is to clean the data. The first part of cleaning the data is dealing with NaNs. In every data set that you're going to encounter in the wild, you're going to have some fields that are just empty, that no one bothered to put information in, or that are damaged or something. So you're going to have to deal with that.
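The inspection calls mentioned above (`info` and `describe`) can be tried on a small made-up data frame like the one below. The last line counts missing values per column with `isna().sum()`, which is a text-only check added here for illustration and not something the video uses. The column names follow the data set, but the values are invented.

```python
import numpy as np
import pandas as pd

# Made-up rows with one completely empty column, for illustration only.
data = pd.DataFrame(
    {
        "radius_mean": [17.9, 11.4, 19.7, 12.5],
        "texture_mean": [10.4, 20.1, 21.3, 15.0],
        "Unnamed: 32": [np.nan, np.nan, np.nan, np.nan],
    }
)

# Column names, non-null counts and dtypes.
data.info()

# Count, mean, std, min, quartiles and max for each numeric column.
summary = data.describe()
print(summary)

# Number of missing values in each column.
missing_per_column = data.isna().sum()
print(missing_per_column)
```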
A good technique to deal with NaNs is to plot a heat map like this. You use sns, which is Seaborn as you saw up here, and you pass it data.isnull(). It gives you a heat map with a zero wherever there is data and a one wherever there is a NaN. Here you can see that you have one entire column, Unnamed: 32, that is completely empty. It's full of NaNs. You can also see it here. Nope, not here. Well, yeah, here too: you can see that everything is NaN. And here in the head you can also see that all of it is NaN. So what we're going to do with this one is drop it. To drop a column, we call data.drop, and inside you pass a list, a Python list (or an array if you're coming from a different language), of all the columns that you want to drop. We want to drop this one right here, Unnamed: 32. I'm just going to copy and paste it in order to not make any typos. I also want to drop this other column that we don't really need because it doesn't give us any information about the data, which is the id. We only care about the measurements, and the id is just an integer to identify the observation. We don't need it for this. So in the list I'm going to include id as well. Then, in order for this to be in place, that is, to modify the actual variable that I'm passing in, I have to specify inplace=True. If I look at data.head() again, you can see that there was a problem. The problem was that I didn't specify which axis I was going to drop from. The drop command just deletes data from your data frame, but here I didn't tell it what Unnamed: 32 and id are.
So it doesn't know if each one is a row or a column, and I have to specify that it's a column. One is for columns and zero is for rows, so I'm going to go with axis one for columns. Now you see that my data set looks a lot cleaner. We don't have this empty variable anymore, and we also don't have the id. That's all that we need in order to build our model. Now let's convert the diagnosis into ones and zeros because, remember, that's what we need for our model to work. I'm going to take data.diagnosis, which is this variable right here, and convert it into a one if its value equals M, which is malignant, and else it's going to be zero. Then we pass in the for loop, which is for value in data.diagnosis. There you go. This is just a one-liner for a for loop. It's the same as writing for value in data.diagnosis and returning one if the value equals M, and so on. Now that we run it, if we do a head again, you can see that the diagnosis is now one for every M. And you know what? I'm going to show you real quick how this looks in a very simple plot. So far we have been using data, a dot, and the name of our variable, but we can also use the bracket notation. You just add brackets after the data frame and put the name of your variable inside. Here it is diagnosis, like this. What we're going to do is turn it into a category type, because so far it's still an integer type: we just said that it's going to be one or zero, and that's an integer. So we're going to tell Python that this is not only an integer, it's supposed to be a category. So let's do that with data diagnosis like that.
Then we set it equal to astype, and we pass category and copy=False, like this. There you go. Now let's look at it. We take data diagnosis in order to build a plot, we do a value_counts like this, and then we plot it. value_counts just returns a table with the counts for ones and zeros, so it's going to tell us how many ones and how many zeros there are. Let me show you real quick what happens if I don't convert it into a category type. I'm going to pass kind equals bar in order to make a bar chart right here, and let's run it. Well, it actually works without converting it into a category type. Let's see what's going on with data.info(). Yeah, it's an integer, but I guess we can leave it like this. I'm used to converting it to a category type first, and it works as well. So you have zeros and ones in your variable, and this is exactly what you want. Now your data set is clean: your diagnosis is divided into ones for malignant and zeros for benign, you have all of the variables that are going to be useful for you, and you don't have any useless variables. You have it ready for training.
Okay, now you're going to start to train your logistic regression model in order to make some predictions based on the data that we input. Before dividing the data set into training and testing parts, we're going to divide it into the predictors and the target. In data science, the predictors are all of the input values that you're going to include. Here it's radius mean, etc., all the way to the end.
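The cleaning steps walked through above (dropping the empty column and the id, then turning M and B into ones and zeros) can be sketched on a small made-up data frame. The rows are invented for illustration; the column names follow the data set.

```python
import numpy as np
import pandas as pd

# Made-up rows, for illustration only.
data = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "diagnosis": ["M", "B", "M", "B"],
        "radius_mean": [17.9, 11.4, 19.7, 12.5],
        "Unnamed: 32": [np.nan, np.nan, np.nan, np.nan],
    }
)

# axis=1 tells pandas that these labels are columns (axis=0 would mean rows).
# inplace=True modifies `data` itself instead of returning a new data frame.
data.drop(["Unnamed: 32", "id"], axis=1, inplace=True)

# One-line loop (list comprehension): 1 for malignant, 0 otherwise.
data["diagnosis"] = [1 if value == "M" else 0 for value in data["diagnosis"]]

# Table with how many ones and zeros there are.
counts = data["diagnosis"].value_counts()
print(data.head())
print(counts)
```

Calling `counts.plot(kind="bar")` on this table would draw the bar chart from the video, assuming a plotting backend such as matplotlib is installed.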
Your target is the variable whose value you want to predict, which in this case is diagnosis. Let me show you data.diagnosis so you can see that it now looks like zeros and ones. So this is the one that's going to be your target variable. Let me add a heading here for dividing into target variable and predictors. There you go. The target variable is going to be y, just to keep it consistent with the mathematical notation, and from our data frame we take diagnosis. Then we add the X variable, which is going to be everything in data except for diagnosis. So we do data.drop pretty much the same as we did before, but we drop diagnosis. As you can see right here, we are not using inplace anymore. That is because we don't want the variable data to be modified. We only want X to take the value of data without the diagnosis variable. Now if we run this, we have both variables: y is our ones and zeros and X is all the other variables. Good. Now that we have our X and y variables, what we want to do is normalize the data. Why do we want to normalize the data? That's a good question. The answer is that the predictors that you're going to put into your model have different units, because they're measurements of different things. In this case, for example, some of these are around 20 or 11, some are like 122 or 135, and some are very close to zero, like 0.3 or 0.08.
So how would your model know? If you just fit it like this, it's not going to know that these larger values are not much more important than these other ones. You want to have a uniform unit for all of your variables, and that's what normalization does. Just to give you another example, let's say that you're building an application for a bank, and it's supposed to predict whether or not a client should get a credit card or a loan. One of your variables is probably going to be the age of the client. Another is going to be the balance in their bank account. The balance is probably going to be in thousands, tens of thousands, hundreds of thousands, or millions, and their age is going to be in years, so less than 100. So how are you going to put this into your model? You're going to have to normalize it so that both variables are relatively as close to zero as each other, so that you can fit them to your model to build the mathematical expression that you want to find. To do this, scikit-learn, which is the library that we're going to be using for machine learning, comes with a very convenient tool. You just have to import it: from sklearn.preprocessing we import StandardScaler, like this. There you go. Now the first thing that we want to do is create a scaler object, so we do scaler equals StandardScaler(). An important thing: don't forget the parentheses.
This means that you're initializing the object. Of course, if you're a software developer, you know that this means you're creating an instance of this class. Then we're going to fit the scaler to the data and transform the data. To do this, we declare a new variable, that is, a new Python variable, not a variable from our data set. It's going to be called X_scaled, and we do scaler.fit_transform and pass it the actual values from our predictors, which is X. And there you go. Now X is scaled and we can take a look at it. Nope, it's not ready yet because I made a typo here. It's supposed to be StandardScaler. There you go. Sorry, I returned X, not X_scaled. You know what, we're just going to run this right here to show you: X looks like this and X_scaled looks like this. As you can see, all the variables are relatively close to zero, which is exactly what we wanted from our predictors in order to train our model. So that was how to normalize the data. Let me add a neat title right here. There you go. Now what we're going to do is split the data. As I mentioned before, we want to split the data into a training part and a testing part, so that we can train the model on the training part and then test it on data that it has never seen before. That way we know that it's actually able to predict based on new data. This is very convenient as well because scikit-learn also comes with a function that allows you to do this. From sklearn.model_selection you import train_test_split.
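To make the scaling step above concrete, here is a minimal sketch that borrows the bank example from the video: age in years next to an account balance. The four rows are invented for illustration. `StandardScaler` subtracts each column's mean and divides by its standard deviation, so both columns end up on a comparable scale around zero.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical predictors: [age in years, account balance], made up for illustration.
X = np.array(
    [
        [25, 1_200.0],
        [40, 56_000.0],
        [63, 310_000.0],
        [35, 8_500.0],
    ]
)

# The parentheses create an instance of the StandardScaler class.
scaler = StandardScaler()

# fit_transform learns each column's mean and standard deviation, then rescales.
X_scaled = scaler.fit_transform(X)

print(X_scaled)
print("column means:", X_scaled.mean(axis=0))
print("column standard deviations:", X_scaled.std(axis=0))
```

After scaling, each column has a mean of about zero and a standard deviation of about one, regardless of its original unit.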
And this right here is very important, because what you want to do now is like the video and subscribe. That helps me a lot and I would be very thankful. After that, what you want to do is split your data using this function right here. This function returns four different values. The first one we're going to call X_train, the second one X_test, the third one y_train, and the fourth one y_test. You declare them like this because, when a function returns four values in Python, you can assign all of them at once, separated by commas, and each one goes right where it needs to go. So now you call train_test_split like this. Inside, you pass your predictors first, and remember that we're going to use our scaled version of the predictors. Then the target, which is the column with ones and zeros. Then we tell it the proportion that we want for the testing data set. This one is called test_size. Let's say that we want 0.30: of all of our observations, we take 30% at random and put them away into X_test and y_test, and then we test our model on those. They're selected randomly so that we can be sure our model is really working. The last argument that you want to pass in here is the random_state. This is just an arbitrary number that you can pass, and it's there in case you ever need to repeat this exact split, because remember that this is a random split.
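The split described above can be sketched with placeholder arrays standing in for the scaled predictors and the target; the numbers are invented and carry no meaning. The second call repeats the split with the same `random_state` to show that it reproduces the same partition.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 20 observations with 2 predictors each, and a 0/1 target.
X_scaled = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([0, 1] * 10)

# Four return values: predictors and targets for the training and testing parts.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.30, random_state=42
)

# Same arguments and same random_state give exactly the same split.
X_train_again, X_test_again, y_train_again, y_test_again = train_test_split(
    X_scaled, y, test_size=0.30, random_state=42
)

print("training rows:", len(X_train), "testing rows:", len(X_test))
print("split is repeatable:", np.array_equal(X_test, X_test_again))
```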
If you ever want to repeat this split, you have to pass in the exact same number in order to get the exact same division of training and testing data. It can be whichever number you want. Of course, you usually choose 42 because that's the answer to everything. Now we're going to actually train the model, and to train the model we're of course going to be using scikit-learn. From sklearn.linear_model we import LogisticRegression, like that. Then we create the logistic regression model. We're going to call it lr for logistic regression, and we set it to LogisticRegression(), which we just imported right here. Don't forget the parentheses; that is very important. If you're a software developer, you know that this is how you create an instance of this class. What we're going to do now is train the model on the training data. To do that, we take the model that we just created, which is lr, and call its fit method. We pass in the data that we have from X_train and also the data from y_train. These are of course the predictors and the target values. Then we're going to predict the target variable based on the test data. And this is very simple.
You create a new variable called y_predictions, or y_pred for short, and this one is going to be lr.predict. As you can see, we're using exactly the same model that we already trained: starting from here, lr is already trained with our data. What we want to do is predict based on our X_test data. If this one is equal to y_test, that means that our predictions were perfect, because every single y_pred is going to be equal to y_test. Let me show you how this works. I'm just going to run this and show you what y_pred looks like. It's just zeros and ones. That's exactly what we wanted, because every zero means that the prediction was that it's not malignant, and every one means that the prediction for that observation was malignant. If you want to check what y_test looks like, it's also a list of zeros and ones. This one is a series and this one is an array, but they're basically the same when we compare them. You can start to see that the predictions seem to be pretty close: the first one is zero, the second one is one, the third one is one, then zero. So our predictions seem to be doing quite all right. Now that we have our predictions from the test data, we can start evaluating our model to see how well we did. I'm going to write a heading right here: evaluation of the model. There we go. Now we're going to import from scikit-learn an evaluation function that we can use.
From sklearn.metrics this time, we import accuracy_score. The accuracy is going to be equal to accuracy_score of, first, the actual values and, second, the predicted values. It returns the accuracy of our predictions as a fraction between zero and one. Let's print it: we're going to say the accuracy was, then accuracy, and let's round it to two decimals. I suppose this should work. Let's see. There you go. The accuracy for our model was 98%, which is amazing. We did a pretty good job here, and the evaluation looks pretty good. Now let's do a little bit more of an evaluation. Let's check the precision, the recall, and the F1 score, which might be useful for you if you want to go more deeply into the statistical side of it. Also from sklearn.metrics, we import classification_report. Then we print it: classification_report, called with y_test and then our predictions, to see how well we did. And here you have it: you have the precision, the recall, the F1 score, and the support. The precision for the negative diagnosis, which is when it's benign, was 99%, and for the malignant ones it was 97%, which is pretty good. We also have the recall, F1 score, and support. That's pretty much all the evaluation that we're going to do in this tutorial, but just so you know, you would probably need to go through some more steps to evaluate it more thoroughly. This should be good enough for starters.
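Putting the training, prediction, and evaluation steps from this part of the video together, a minimal end-to-end sketch could look like the following. As an assumption, it uses the copy of the Wisconsin breast cancer measurements bundled with scikit-learn in place of the downloaded CSV, so column handling differs from the video. That bundled copy codes malignant as 0 and benign as 1, so the target is flipped to match the video's convention. The exact scores you get may differ from the ones quoted in the video.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled data stands in for the CSV used in the video.
features, target = load_breast_cancer(return_X_y=True)

# The bundled target is 0 = malignant, 1 = benign; flip it so 1 = malignant.
y = 1 - target

# Scale the predictors first, following the order used in the video.
X_scaled = StandardScaler().fit_transform(features)

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.30, random_state=42
)

# Create the model, train it on the training part, predict on the testing part.
lr = LogisticRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

# Actual values first, predicted values second.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(classification_report(y_test, y_pred))
```

A common variation is to fit the scaler on the training part only and then apply it to the testing part, so that nothing about the test data leaks into preprocessing; the sketch keeps the video's simpler order.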
In conclusion, you already have a model right here. This lr is the model that you can use to predict whether the cell that you're measuring is a malignant cell or a benign cell. And this is extremely useful, because now you have a model inside a Python variable. You built it in Python, so you can put it inside a server in Flask, for example, or whichever other backend technology you use in Python, and build a server around it. You can build a REST API around it, well, probably not a REST API, but you can build an API around this model and then have your users feed some data to your front-end application. Then you send that data to your backend application, you perform the prediction, and you return the value of your prediction. So this can be useful for doctors, or you can probably plug it into some automated machine that measures the cells and gives a diagnosis immediately to patients. And of course you can also use this logistic regression method in finance, to make predictions about whether or not a client is going to default on a loan, or whether or not they should get a credit card, etc. So since it's clearly callable, remember that you can use it pretty much anywhere. What else can we say? Yeah, I guess that's pretty much it. Don't forget to let me know if you have any questions. Please like and subscribe if you liked it. This is a series in which I show you how data science and some machine learning models work, so that you are able to use them in your backend applications. If you're a software developer or a data science beginner, this is of course also useful for learning how to use these models and what the intuition behind them is.
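The backend idea can be sketched without committing to a framework. The function below is a hypothetical handler, not code from the video: predict_diagnosis and its response fields are names invented for this example, the bundled scikit-learn dataset stands in for the Kaggle CSV, and the model is trained on all rows purely to keep the sketch short. In a Flask app, a route would parse the request JSON and pass the measurements to a function like this one.

```python
import json
from typing import Sequence

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset as a stand-in; flipped so that 1 = malignant.
data = load_breast_cancer()
X, y = data.data, 1 - data.target

# Train once at startup; the fitted scaler and model are reused per request.
scaler = StandardScaler()
lr = LogisticRegression()
lr.fit(scaler.fit_transform(X), y)


def predict_diagnosis(measurements: Sequence[float]) -> dict:
    """Hypothetical handler body: one cell's measurements in, a JSON-ready dict out."""
    if len(measurements) != X.shape[1]:
        raise ValueError(f"expected {X.shape[1]} measurements, got {len(measurements)}")
    # New data must be scaled with the same scaler used during training.
    features = scaler.transform([list(measurements)])
    prediction = int(lr.predict(features)[0])
    probability = float(lr.predict_proba(features)[0, 1])
    return {
        "prediction": prediction,
        "label": "malignant" if prediction == 1 else "benign",
        "probability_malignant": round(probability, 4),
    }


# Simulate a request using the first row of the dataset as the payload.
response = predict_diagnosis(X[0])
print(json.dumps(response))
```

Keeping the scaler next to the model matters here: a request carries raw measurements, and the model only makes sense on inputs scaled the way the training data was.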
I hope this was useful for you and I hope you enjoyed it. You can subscribe to get all of the other tutorials on data science for software development. So, cheers.

Original Description

In this tutorial, we will walk you through a hands-on project using logistic regression for breast cancer prediction. We will be using a breast cancer dataset to build a logistic regression model that accurately predicts if a cancer is malignant or not based on certain measurements. This tutorial is perfect for beginners in machine learning and data science who want to learn how to build a logistic regression model from scratch using Python and the Scikit Learn library.

LINKS
👉 Here is all the code in the tutorial: https://www.kaggle.com/code/alexandreao/logistic-regression-on-breast-cancer-dataset
👉 Here is the dataset: https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data
💬 Join the Discord Help Server - https://link.alejandro-ao.com/HrFKZn
❤️ Buy me a coffee... or a beer (thanks): https://link.alejandro-ao.com/l83gNq
✉️ Join the mail list: https://link.alejandro-ao.com/AIIguB

In this video tutorial, you will learn about binary logistic regression, logistic regression models, and how to build one for a data science project. You will also get an example of a data science project that will help you understand the process of how these models work mathematically. We will be using a Jupyter Notebook for our coding exercises, and we'll provide you with all the necessary code and explanations to help you follow along. Whether you're a data science beginner or looking for ideas for data science projects, this tutorial will give you a comprehensive overview of logistic regression and how to apply it to a real-life problem. By the end of this video, you will have a solid understanding of logistic regression and be able to apply it to your own data science projects.

Keywords: logistic regression, machine learning, python, logistic regression machine learning, logistic regression model, binary logistic regression, logistic regression example, data science project, data science project from scratch, data science project i

This tutorial teaches logistic regression for breast cancer prediction using Python, covering data preprocessing, model training, and evaluation. The goal is to build a model that accurately predicts whether a cancer is malignant or not based on certain measurements.

Key Takeaways

Load the dataset using pandas
Preprocess the data by dropping unnecessary columns and converting the diagnosis column into ones and zeros
Split the data into training and testing sets
Normalize the data using StandardScaler
Train a logistic regression model using scikit-learn's LogisticRegression class
Evaluate model performance using accuracy score, precision, recall, and F1 score
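
The preprocessing steps in the list above can be sketched as follows. This is a sketch under stated assumptions: instead of reading the Kaggle CSV with pandas, it rebuilds a similar DataFrame from scikit-learn's bundled copy of the dataset, and the "id" column and the "M"/"B" diagnosis codes are added here to imitate a raw file. The split size and seed are arbitrary.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for pd.read_csv on the Kaggle file (assumption: similar layout).
raw = load_breast_cancer(as_frame=True)
df = raw.data.copy()
df["diagnosis"] = raw.target.map({0: "M", 1: "B"})  # bundled target: 0 = malignant
df["id"] = range(len(df))  # hypothetical identifier column with no predictive value

# Drop columns that carry no information about the diagnosis.
df = df.drop(columns=["id"])

# Convert the diagnosis column into ones (malignant) and zeros (benign).
df["diagnosis"] = (df["diagnosis"] == "M").astype(int)

# Separate the features from the target.
X = df.drop(columns=["diagnosis"])
y = df["diagnosis"]

# Hold out part of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
print(y.value_counts())
```

Fitting the scaler on the training rows alone keeps information about the test set out of the preprocessing, so the later evaluation stays honest.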

💡 Logistic regression is a powerful tool for binary classification problems, and can be used to build accurate models for predicting outcomes such as breast cancer diagnosis.
