Machine Learning With Python Full Course 2026 | Python Machine Learning For Beginners | Simplilearn

Simplilearn · Beginner ·📐 ML Fundamentals ·2mo ago

Key Takeaways

This video teaches machine learning fundamentals using Python, covering skills and concepts necessary for beginners to get started with machine learning.

Full Transcript

Hey everyone, welcome to this course on machine learning using Python. Today machine learning is becoming a part of almost everything around us, from movie recommendations and fraud detection to price prediction and smart assistants. Machine learning helps systems learn from data and make better decisions, and that is exactly why this skill has become so important. It's no longer just for researchers or data scientists. It is now one of the most useful and practical skills for anyone who wants to work with data and intelligent systems. So in this course you will understand how machine learning works using Python, which is one of the most popular languages for building machine learning models. We will start with the core concepts, then move into regression and classification, and also understand how to evaluate and improve models properly. So if you want to build a strong foundation in machine learning in a very simple and practical way, this course is a great place to begin. So let's talk about what we will cover in today's video. First, we'll understand what machine learning is, how it learns from data, and how the basic workflow works using features, labels, training data, and test data. Next, we'll look at some important concepts like overfitting and underfitting, so you can understand why some models perform so well and others do not. Then we'll move into regression, where you will learn how models predict continuous values using techniques like linear regression, lasso, ridge, and polynomial regression. After that you will explore classification, where you will understand how models predict categories using methods like logistic regression, KNN, decision trees, and SVM. We'll also cover model evaluation using metrics like accuracy, confusion matrix, and mean squared error. And finally, you will see how to improve model performance using grid search, cross-validation, and pipelines. 
Also, if you are interested in mastering the future of technology, then the professional certificate course in generative AI and machine learning is the perfect opportunity for you. This is offered in collaboration with the E&ICT Academy, IIT Kanpur, and it's an 11-month live interactive program providing you hands-on expertise in cutting-edge areas like generative AI, machine learning, and AI tools like ChatGPT, DALL-E, and Hugging Face. You'll also gain practical experience through 15 plus projects, integrated labs, and live masterclasses delivered by esteemed IIT Kanpur faculty. Alongside earning a prestigious certificate from IIT Kanpur, you'll receive official Microsoft badges for Azure AI courses and career support through Simplilearn's Job Assist program. So what are you waiting for? Hurry up and enroll now. The course link is mentioned below. Now before getting started, here's a quick quiz question for you. Which type of machine learning problem is used to predict a continuous value like house price? Your options are classification, regression, clustering, or filtering. Let me know your answers in the comment section below. So without any further ado, let's get started. >> What machine learning is, which is a subset of artificial intelligence, right? That's basically machines learning from data in order to make decisions, essentially. So this was a big departure from the rules-based systems at the time, right, that were explicitly programmed to make decisions. So just think of an example like a really big kind of if this, then that, then that, then that, and else if this, this, this, right? So a bunch of rules that had to be pre-programmed in order to come out with some final answer. With machine learning it's the exact opposite of that. We're actually training something from examples, from existing data, in order to predict something or make some type of decision. And so we're going to learn about the various ways we can do machine learning. 
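The contrast between hand-written rules and learning from examples can be sketched roughly as follows. This is only an illustration: the keyword rules, the tiny made-up message list, and the choice of scikit-learn's `CountVectorizer` with `LogisticRegression` are assumptions, not something shown in the video.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


# Rules-based approach: a person writes every condition by hand.
def rule_based_is_spam(message: str) -> bool:
    text = message.lower()
    if "free money" in text:
        return True
    elif "winner" in text:
        return True
    else:
        return False


# Learning-from-data approach: made-up labeled examples (1 = spam, 0 = not spam).
messages = [
    "free money click now",
    "you are a winner claim your prize",
    "win a free prize today",
    "cheap loans free money",
    "meeting moved to 3pm",
    "lunch tomorrow?",
    "please review the attached report",
    "see you at the meeting tomorrow",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# The model works out its own word weights from the examples.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(messages, labels)

predictions = model.predict(["claim your free prize", "report for the meeting"])
print(predictions)
```

With only eight examples the learned model is a toy, but the design point holds: changing its behavior means changing the data, not editing a chain of if/else rules.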
But if you guys remember we um talked about some of this like the differences and the uh basically rules-based approaches to learning from data approach. Um and in included in that is going to be uh complex unstructured data. So things like images, text, audio. What handles those really well is uh deep learning which we will get to in the course after this. But uh those are certainly in there as learning from data even complex data. So we had this picture uh and I think this is kind of around where we left off last time was uh just distinguishing between those three terms. We see artificial intelligence, deep learning and machine learning kind of used interchangeably, but this is really how they fit in. Artificial intelligence is kind of a broad anything mimicking human intelligence. Um which doesn't have to be learning from data but uh machine learning is part of that. And then um one way to accomplish machine learning is to use neural nets which is the focus of uh deep learning. Um and so deep learning has been has found a lot of success especially recently with uh those complex data types like images, speech, text, right? So deep learning used all over the place. Even in um modern like generative AI, we see deep learning used quite a bit. Um it really anything that's using neural nets is uh going to be deep learning. Um and again we'll focus on that later but we're going to be mainly focused on machine learning for this course. Primarily machine learning that does not use neural networks. Okay. So just models that are not necessarily neural networks be our focus. So in machine learning we had an example of a game uh essentially um learning what decisions to make uh based on the uh kind of current um state of the board. This could be a um you know machine learning example that uh learns from many previous examples. So a lot of data around these games are used to train these um kind of robots that can play these games and play them at a very high level. 
Um so there's been a lot of successes actually in machine learning and deep learning um around uh playing games like chess or go um using machine learning algorithms. So pretty cool. All right. So I think this is where we ended. We last time we said there's a bunch of different use cases for machine learning. So um recommendation system is going to be a big one and we will actually study that uh in one of our final lessons of this course. Um chat bots like generative AI doing sentiment analysis chat bots we'll study later but those are certainly an application of learning from data in order to uh generate responses to text prompts right. Um spam filtering that's a good example like classifying an email as spam or not spam. Um that that gets trained from examples and uh learning from data such as previous emails. Um social media posts analysis is another kind of text data um use case but you uh can do a lot with that text like you can predict the sentiment um you can predict uh the category of what what the post is talking about um those kind of things all can be done with machine learning >> and many other use cases not on this list that we will uh cover >> you know as we as we go further. Okay, so this is where we kind of left off. Um, so what's doing all the hard work here? >> Uh, machine learning algorithms. So these are things that will um these are things that will learn from the data. So they are uh they they are basically um algorithms or sets of rules that uh or mathematical rules I should say not formal rules like in the in the sense of a rule system but mathematical um formulas and mathematical uh rules essentially that help us learn from the data. So they correlate the data to some type of outcome. 
So some type of prediction uh whether that's going to be as we will see whether that could be like a number like we're predicting a price or demand or sales um or it could be a category like is this transaction fraud or not fraud or what's the probability that this is fraud um so we have different kinds of predictions we can make with machine learning um but uh we will study the kind of the differences of those coming up. Um, but machine learning algorithms are really what power they're kind of the models, right? They're the models that help power uh machine learning to actually learn from data. So, we're going to spend a lot of time in this course studying those algorithms like the different models that we can build and what their differences are, what their strengths are, what their weaknesses are. We'll we'll learn a lot about those. Okay. So I guess you can imagine like everything is so data dependent, right? Um we're learning from data. So uh it makes sense that the quality of data really really matters here in determining how strong the model can be. Um so you see this graph here charting kind of the um high quality data um versus just uh any old data but a decent enough quantity of it. Um you can see that performance and the performance is measured by some evaluation metric. Um, so think of it as uh something like an accuracy. Like if we were predicting fraud or not fraud, how accurate can our model get at actually detecting fraud um it gets better and better and better the graph shows that the higher quality of data that we have. So there's kind of that there's a there's a saying in machine learning um called garbage in garbage out. What that means is if you have poor data, even the best model in the world, poor data is not going to result in having a good model that can be accurate and perform well. 
So it needs to be high quality, meaning there needs to be a decent amount of it, it needs to be labeled appropriately, as we will talk about, and it needs to not have any significant outliers. It needs to be clean, not have missing values, all of those things. You have a good chance at deriving good predictions from higher quality data, as this kind of shows. Okay. So, one thing we're going to learn as we go along is that quantity matters as well. So, not only quality, but a decent amount of it. And we're going to learn those rules of thumb, like how much data do I need for certain algorithms. One thing that we will see is that the basic machine learning models that we'll study don't need as much as a neural network would. Neural networks are going to require a lot more than a basic machine learning model. So that's something we will see as we go along, and it's something we'll discuss with each model that we study: how much data do we actually need to produce a high quality model. Okay, any questions so far? Okay, let's talk about the different types of machine learning that we're going to discuss. There's going to be two primary ones that we will study in this course, and then a couple of others that are a little bit more advanced that we won't get to but are worth knowing about. So there's going to be four total that we'll talk about, and they're on this list here. The two we'll study are supervised learning and unsupervised learning. Now, I'd say the majority of our focus will probably be on supervised learning, and we'll talk about what that means, but we'll also cover unsupervised learning as well. And so, we'll look at the most popular techniques in each of these types of machine learning. 
Um, and then we'll talk about these two, but not really study them because they're more advanced topics um that that will be beyond the scope of what we'll do. But uh these are going to be um different styles of machine learning that are going to be characterized by um what kinds of predictions they make, what kind of data they need and require. Um and uh what kind of outcomes they're actually producing. Um, so let's let's get into each of these, but uh the the one that we'll probably spend the majority of our time on is going to be supervised learning, but we will study unsupervised learning as well. We'll study both. And we're going to talk about we're going to define both of those um coming up. And again, these will be a little bit more advanced topics that we won't spend too much time on. Um but but we'll discuss their relevancy in machine learning um and give a good definition to it. Okay. All right. Let's start with supervised learning. Now this is going to be uh a term that really refers to using examples. So using labeled examples. So here we say labeled data to help our model train. In other words, help our model be able to predict guided by specific input output pairs. So supervised really refers to the fact that we have answers. We have examples, we have answers with those and we use that collection of data to build our model off of so that we can predict um those kinds of things like a price, like a category, like a spam not spam in this in this slide. like we would be predicting if this shape is a square, a triangle or a circle. Um but but when we build a model for that, we have data that has an answer attached to it. Right? We've talked about this before a little bit with labels. So there's a guide there that can guide us towards building our model. there's an actual every every example has an answer and that answer is really critical to help build our model off of. 
So, um that's it's almost like you have um a you have a bunch of exercises in let's say like a math textbook. You have a bunch of exercises and you have the answers and that way you can kind of check your work. you think about model training um that is the really a lot of that process of model training as we are going to discover is um basically checking our work against these answers in our data in our training data. Okay. So supervised learning is any type of machine learning that involves learning from labeled data in order to predict outcomes. Okay. Predict outcomes like now the the outcomes can be numerical. They can be like a price, temperature, demand, sales, revenue. They can be numerical, but they can also be categorical. So they can be like spam, not spam, fraud, not fraud, cancer, not cancer. Um, dog, cat, giraffe, those kind of categories. Um, we could predict those. It's some type of outcome. Okay, some type of outcome. The key is we're using labeled examples to guide our model building. That's why it's called supervised learning. So we know in our data we know what the inputs are. Of course, those are going to be think of the inputs as like all of our columns and then we have a special label column that represents the output we're trying to predict. So if you think about that housing price data, the label could be the price. And that's something we would build a model to predict, but we have answers for all of our examples in our rows. We have answers to help guide our model building. They help tweak our model because we know the answer ahead of time. So they're they're really good examples to build our model off of. Okay. So that's that's supervised learning. Uh in this example is circle not in the prediction because it's not part of the test data even though it's in the labeled data. Um no it just not necessarily. 
It just means that we learn against all of these examples that have these answers, and then when we observe new examples, we can try to predict what those would be based on what we've seen before. So it's just a coincidence that we only have two examples in our test data. We could have a circle here, in which case we would predict circle, that's fine, or at least we would hope our model would predict circle, right? That's what we're hoping. It may or may not get it right. But it's only not there because we're just assuming that we only have two examples we're testing against. In reality, we would test against a lot more data. And we're actually going to see why we would do that. Like, why would we train our model and then use additional data to evaluate it? It's actually really important that we do that step to get a sense of how good our model is before we take it out in the real world. So if we apply the model that we build on our labeled data to this set of test data that it hasn't been exposed to before, it helps give us a sense of how good our model is. So test data is usually used for evaluation. That's something we'll study. How do we train? It depends on the model. So training will be an algorithm that will basically update the model according to the data, these labeled examples. Every model is going to be different in exactly how it trains. So we're going to talk about that when we get to the individual models that we'll study. But loosely speaking, a model is going to use the data to adjust itself. Imagine it like tuning a bunch of knobs. 
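The idea of training on labeled data and then evaluating on data the model has not seen can be sketched like this. It is a minimal sketch under assumptions: the synthetic data from `make_classification`, the 80/20 split, and the use of `LogisticRegression` are illustrative choices, not taken from the video.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic labeled data: X holds the features, y holds the labels (answers).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Hold back 20% of the rows; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train only on the training portion.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-back rows to estimate real-world performance.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy on unseen test data: {test_accuracy:.2f}")
```

Scoring on the held-back rows, rather than on the rows used for training, is what gives an honest sense of how the model might behave on new examples.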
The best example I can give you, and I think I did this one last week, is where you have a function that predicts the price, and let's say it has weights: weight one with feature one, weight two with feature two, weight three with feature three. So imagine we had three input features and we built an answer according to that. Essentially what we would do to train the model is adjust these weights in order to get this correct based on our labeled examples. Okay. So that's something we're going to learn about shortly when we actually dive into models. Every model is going to be slightly different in how it trains, but at a high level it's going to use the training data with those labeled examples to help guide the formula, essentially, to adjust and generate the proper kind of model. These things are going to be adjusted according to the data in order to produce the correct output. So think about these as knobs that we'll turn. Okay. Which type of machine learning is used? Probably supervised, which is what we're talking about now. Probably supervised because most people want to build some type of model to predict something. So yeah, I'd say supervised. Yes, we are definitely going to learn how to train. We'll do the code, and I'll tell you about how it's done. But what I was saying is it's kind of on a model-by-model basis. So I want to wait till we get into the individual models, then we'll talk about how they're trained. But yeah, we'll learn how to do that. And supervised is used all over the place. Even generative models use supervised learning, because an LLM is going to use labeled examples in order to train, right? In order to learn how to generate responses according to prompts, it needs to learn against a lot of text examples. 
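The "knobs" picture for a three-feature price function can be sketched with NumPy. The video only says that the weights get adjusted against labeled examples; using gradient descent on squared error here is an assumption (one common way to do it), and the data, `true_weights`, and `learning_rate` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 made-up examples with three input features each.
X = rng.normal(size=(200, 3))

# Hypothetical "correct" weights used only to generate the labels.
true_weights = np.array([3.0, -2.0, 0.5])
y = X @ true_weights  # label = w1*x1 + w2*x2 + w3*x3 (noise-free for clarity)

# Start with every knob at zero and turn them a little on each step.
weights = np.zeros(3)
learning_rate = 0.1

for step in range(500):
    predictions = X @ weights               # current guesses
    errors = predictions - y                # check work against the answers
    gradient = 2 * X.T @ errors / len(y)    # direction that increases squared error
    weights -= learning_rate * gradient     # nudge the knobs the other way

print(weights)
```

After enough steps the learned weights end up very close to the ones that generated the labels, which is the sense in which the labeled examples guide the tuning.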
So that supervised learning is what um results in that model, right? Learning from those labeled examples. Okay. It is yeah image image uh a lot of um yeah a lot of image processing is supervised like object detection. So the yolo model is an object detection model. Yes. Um because it has to be trained right. It has to be trained on uh it has to be trained on images with labels such as this is what object is in this image. This is the box around the object. Um yes. So if if it's if it ever uses label data to train and build the model, it is supervised. So YOLO is definitely supervised and we actually we will we will cover the YOLO model later on in our deep learning course. We talk about object detection. So we'll still we'll study that. But yeah, it's supervised Okay. So on the slide we have some common supervised learning algorithms that are we will study. So all of these we will study and understand what they do and how they work but just giving you some to name them. Linear regression is kind of the one I just drew out which is the um this is the prototypical like easiest to understand model that is kind of the um exactly like this where we have a weight times a feature um a weight times a feature and then a weight times a feature and on and on and on. You can have as many as you want. um that is a linear regression. And so that is um that's a supervised model because we need this value here and we need all of our inputs in order to um actually train this model and generate all those weights um that that is uh that uses um labeled examples to help tune all those knobs. Um same with all these other models. So, we're going to talk about decision trees. We're going to talk about logistic regression and and SVMs, which are support vector machines. We'll talk about all of those, but they're all examples of supervised uh supervised learning. Okay, we'll talk about all of these. 
They're all supervised because they all require labeled examples in order to train them and and then subsequently use them. Okay. Okay. So what are some use case examples? So for for instance in uh supervised learning we may be predicting temperature based on yearly temperature trends. So we would have that yearly data as our um as our labeled examples and those would supervise the learning of a model that predicts temperature. Um, same thing with predicting crop yield based on um, seasonal crop quality changes. So maybe we have a bunch of features relating to crop quality. We could predict crop yield. Um, we would just need historical examples with those labels, right? What the crop yield is for each time period. Let's say we would just need those uh, supervised examples and we could easily build a model off of it. Um uh this this last one sorting waste based on known waste items and their corresponding waste types. Um that's kind of like spam. It's like filtering basically like a spam filtering. Um so think of it like the the shapes example. We sorting things into squares, circles, triangles. Um, same kind of idea here where we have a bunch of examples on what those um what those waste items should uh should belong to, like what waste bins they would go to, for example. Um, and those could be labeled and therefore then we could um understand what category of waste they belong to. Um, same thing with spam. something is fraud or not fraud, spam or not spam, cancer or not cancer. All of those are going to be supervised learning examples because they're going to require in order to train them, they're going to require data that has those labels. Okay? So, anything that has labels is going to be supervised learning. So, again, this is where we will spend probably the the majority of our time is doing supervised learning problems. ones that we have labeled data. We're building a model and we're going to predict those those uh labels essentially. 
Okay, before we go to unsupervised, any questions about uh supervised Okay. All right. So, supervised requires labels in order to have an example to go off of to build your model. And that's because you're predicting those kind of outcomes like spam or not spam, cancer not cancer. Now unsupervised learning is completely different. It's the opposite. So unsupervised learning is where we do not use labels whatsoever. So we're not using any labels at all. So it's it c it can be completely unlabeled or even if it's labeled, we're not using labels in any way. But um we primarily would say it's unlabeled data. We have no guidance because we're not using the labels in any way. we have no guidance to um predict anything but that's because we're not really predicting anything in unsupervised learning. Generally what we're doing is looking for some structure or pattern. Okay, with unsupervised learning we're looking for some structure or pattern. So um one type of example that's very very popular is going to be this second one which is um identification identification of user groups based on similarities or commonalities. Now this is going to be a problem basically known as clustering and it's a problem we will study quite a bit. There's going to turn out to be lots of different algorithms that can accomplish clustering. So what clustering attempts to do is basically say um we have data that's like this and then data over here and then data over here. Let's just group these together. So like this should be one group. This should be one group and this should be one group. And we can find those structures and say okay this is group one this is group two and this is group three. One 2 3. And we can basically build what we would call clusters of data um based on how close together the points are kind of located in these kind of cluster zones like these boxes I've drawn. Okay. Now that doesn't require any label to do which is really fascinating. 
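The three-groups drawing can be approximated in code. This is a sketch under assumptions: the video does not name a specific clustering algorithm at this point, so scikit-learn's `KMeans` is a swapped-in choice, and the three blobs of points are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Three well-separated blobs of 2D points; note that no labels are created.
group_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[10, 0], scale=0.5, size=(50, 2))
group_c = rng.normal(loc=[5, 10], scale=0.5, size=(50, 2))
points = np.vstack([group_a, group_b, group_c])

# Ask for three clusters; the algorithm only looks at how close points are.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)

# Each point gets a cluster index (0, 1, or 2), not a meaningful label.
print(cluster_ids[:5], cluster_ids[50:55], cluster_ids[100:105])
```

The number of clusters is passed in by hand here; choosing it is a separate question that this sketch does not address.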
So unsupervised you don't need any label at all to accomplish the algorithm. Um so clustering is one good example. Um finding outliers or anomalies is another. So we don't necessarily have any label of what is an outlier or what is an anomaly. We are deriving that from the features alone. There's no guidance. There's no label um to doing like outlier detection or anomaly detection. Okay. So that's another good example. One that's not listed on here um but is also really important that we will study is something known as dimensionality reduction. So dim reduction and what that what this focuses on is basically compressing the data set a bit. So we take our data and basically compress it. Um so that but we do it in such a way that we retain as much information as we can. It's a very like smart compression and what it does is it lowers the dimension. um dimension. Think of the dimension as like number of columns. Number of columns. So imagine we had 100 columns in a data frame. What we could do is actually reduce that down to 10. So like 10% of that. So we reduce it down to 10. And um but those 10 are it's not like we chopped out um 90 other columns. we um smartly kind of compressed all that information into these 10 new columns um that are compressed versions of the hundred that we used to have. Um so dimensionality reduction is is another unsupervised technique. It requires no guidance, no label to do, but is um a really useful technique to reduce the size of your data if you're doing things with it. Um so this is another one that we will we'll study how to do it and basically more details behind it what the algorithms are. Um we'll so probably those two in unsupervised will spend the most amount of time on clustering and dimensionality reduction. 
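The 100-columns-down-to-10 example can be sketched as follows. The video does not say which algorithm does the compression at this point, so PCA is an assumed choice, and the data is synthetic, deliberately built from 10 hidden factors so that a 10-column summary can keep nearly all of the information.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 500 rows and 100 columns, generated from 10 hidden factors plus tiny noise.
hidden = rng.normal(size=(500, 10))
mixing = rng.normal(size=(10, 100))
data = hidden @ mixing + 0.01 * rng.normal(size=(500, 100))

# Compress 100 columns into 10 new columns; no label column is involved.
pca = PCA(n_components=10)
compressed = pca.fit_transform(data)

# Fraction of the original variance the 10 new columns retain.
retained = pca.explained_variance_ratio_.sum()
print(compressed.shape, round(float(retained), 4))
```

The 10 new columns are combinations of the original 100, not a subset of them, which matches the point that nothing was simply chopped out. Real data rarely compresses this cleanly.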
And in supervised, if some data is present but we didn't label it, meaning, for example, we had circle, triangle, square in the training data, and we add pentagon but we didn't label that, in that case? Yeah, so in supervised learning, every row, think about it as every row in our data frame, needs to have a label associated to it. It needs to have a column that represents the label. So if we've never seen pentagon before, I can't use that as a label. The pentagon has to exist in the data if I'm going to be able to predict it, right? If we've never seen it before, we have no examples to go off. We have no guidance. So how could we predict that, right? We can't predict it if it's not in there. So if we have labels of pentagon, let's say, then yeah, we could predict pentagon. We could remove what? If you're talking about the pentagon, we wouldn't remove that. No, let me go back to that page. We wouldn't remove it. It's just that if it's not in our labels, we're not going to be able to predict it. So pentagon's a good example here. Pentagon is not one of our labels. It currently is not in our data set as one of the labels. We only have data that's either a triangle, circle, or a square. We don't have pentagon. So I would never be able to predict pentagon if I haven't seen examples of it before. Okay. But let's say we had that in there. If we had pentagon, we could have an example of it in our labels, and then yeah, we could predict it. And don't worry about the test data. The test data is just saying: here's a shape, what is it? Okay, that's a square. Here's a shape, what is it? Okay, that's a triangle. And we could have as many of those examples as we want in our test data. So we could have a circle and say, okay, what's this? It should be circle, right? 
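The point about labels the model has never seen can be sketched with a toy classifier. Everything here is illustrative: describing each shape by a single made-up feature (its number of corners) and using a 1-nearest-neighbor classifier are assumptions, not the slide's actual setup.

```python
from sklearn.neighbors import KNeighborsClassifier

# Training data: one feature per shape (number of corners) plus its label.
corner_counts = [[0], [0], [3], [3], [4], [4]]
shape_labels = ["circle", "circle", "triangle", "triangle", "square", "square"]

model = KNeighborsClassifier(n_neighbors=1)
model.fit(corner_counts, shape_labels)

# The model only knows the labels it was trained on.
known_labels = list(model.classes_)
print(known_labels)

# A pentagon has 5 corners, but "pentagon" is not an available answer,
# so the model falls back to the closest label it has seen.
pentagon_prediction = model.predict([[5]])[0]
print(pentagon_prediction)
```

Adding labeled pentagon rows to the training data is the only way for "pentagon" to become a possible prediction.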
The test data can be whatever it whatever it wants. But yeah, if if we've never seen Pentagon before, we're never going to be able to predict it. These are the the label data and labels are basically the talking about the same thing. The labels just mean what are the categories that are present in our data. So in this data we only have three labels that are present. So the labels is are relative to our label data, right? It's saying what labels, excuse me, what labels uh do we have in our data and we only have those three circle, triangle, square. So so pentagon would not be part of those labels. We couldn't predict it. No. So unsupervised is not going to make a prediction. That's the big difference with unsupervised. They're not going to make a prediction like this. Um so unsupervised is not going to make a prediction. It's going to do something different like um basically say like these guys are similar, these are similar, these are similar, this is a cluster, this is a cluster, this is a cluster. It's not going to make a prediction. That's what supervised learning does. Clustering, yes, which is unsupervised. Yes, clustering does not require any labels. Unsupervised just means we don't have any labels. We don't require any labels. So the other thing unsupervised might do is it might say and again without the labels it might say that this is an outlier. it might say that this guy is an outlier because there's only there's only one of those and they're not like the other. So that that's something that um that's something that uh unsupervised could do. Um it it yeah and no. It kind of labels a cluster in the sense that um it would basically assign a number to it like this is cluster one, this is cluster two, this is cluster three. It'll assign a number to it, but it's not a very meaningful. It doesn't assign like a prediction label in in the traditional sense of a label. 
It does provide like a numerical index for the cluster to because what we want to know is like okay this guy has the cluster of one. This guy belongs to cluster one. This guy belongs to cluster one. This guy belongs to cluster two. This guy belongs to cluster two. Does that make sense? So there needs to be some like index of what cluster you belong to. So it's kind of like a label but not in the traditional like prediction sense. Okay. Very good. So again, unsupervised, no labels. You're doing things like identifying clusters, um identifying outliers, doing dimensionality reduction. These are all like structure and pattern oriented things. They're not predictions of a label. Okay? They're not which is what we would see in supervised learning. Okay. So an example would be that we take we put in the data um we can group together uh data such as images into categories based on similarities um which would be like those clusters. So there's no these would be groups that we don't have any label on ahead of time like we don't have we don't say that this image should belong to this this image should belong to this we derive that from the characteristics of the data. Um so think like a good example is um customer groups. So we would identify customers based on like okay do they have similar spending levels? How many days do they go shopping in a week? How much money do they spend? And we could kind of group together customers based on similar qualities. Clustering will find those groups that should exist. um it will discover those groups based on um the similarities in the data, but there's no labels that that say like this person should be in this group, this person should be in this ahead of time. There's no labels of that. It gets derived during the algorithm. It's unsupervised, right? There's no unsupervised really literally means no guidance. There's no guidance to doing it. We just derive that from the structure of the data which is the similarities. Okay. All right. 
So, a couple more for you. We had supervised, which uses the labels. We have unsupervised, which uses no labels and looks for structure. And then we have something that's kind of in between, which is what is known as semi-supervised learning. This is where you use a combination of a little bit of labeled data, but most of your data is actually unlabeled, and you try to get some use out of that labeled data in order to build a model. Usually what happens with semi-supervised learning is you use your labeled data to predict what the label should be for the unlabeled data, and then you go from there. So you can create artificial labels on this unlabeled data, and once it's all been labeled you can use all of it in a supervised learning approach. Semi-supervised basically refers to the fact that you start out with most of your data not being labeled, but you do have some labeled examples. What you can do is extrapolate those labels into the unlabeled data set to provide some artificial labels, and now that everything has a label, you can do supervised learning. Okay. So it falls between supervised and unsupervised. This is kind of rare. Most of the time you're not going to do that. You'd prefer to just start with all labeled data; that's usually the preferred approach. Most of the time you'll actually just be doing supervised learning, not really semi-supervised learning.
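The "extrapolate labels onto the unlabeled data" step can be sketched in a few lines. This is only an illustration under stated assumptions: it copies the label of the nearest labeled example (a 1-nearest-neighbor rule), which is one simple way to create artificial labels and not necessarily the method used in the course, and the feature values and label names are invented.

```python
import math

# A few labeled examples: (feature vector, label). Values are made up.
labeled = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((5.0, 5.0), "large"),
    ((5.5, 4.5), "large"),
]

# Most of the data has features but no label.
unlabeled = [(0.9, 1.1), (5.2, 5.1), (1.5, 1.0), (4.8, 5.3), (5.1, 4.9)]

def nearest_label(x, examples):
    """Copy the label of the closest labeled example (1-nearest neighbor)."""
    _, label = min(examples, key=lambda ex: math.dist(x, ex[0]))
    return label

# Step 1: extrapolate artificial ("pseudo") labels onto the unlabeled rows.
pseudo_labeled = [(x, nearest_label(x, labeled)) for x in unlabeled]

# Step 2: everything now has a label, so the combined set can be used
# for ordinary supervised learning.
training_set = labeled + pseudo_labeled

for features, label in pseudo_labeled:
    print(features, "->", label)
```

The artificial labels are only as good as the rule that produced them, which is one reason starting with fully labeled data is usually preferred.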
So, it's pretty rare. But yeah, it could: if we had a lot of examples of pentagons and they were unlabeled, and we tried to guess what kind of shape they were and provide an artificial label, and then used that whole data set to build a model, then yeah, it could fall into this category. Okay. Oh, going back to the question: don't they still use some kind of labeled data, like age and gender? Those aren't really labels. Those are the features. So yes, they still use the core features of the data. They just don't have any labels in the traditional sense of a label. You should think of a label as something we are trying to predict. Whether that's a price, or a category like spam or not spam, cancer or not cancer, it's something we'd be interested in predicting. So in our data, we would have an answer for every row. One of our columns would be the result, the outcome, the answer that we're trying to predict. That's the label. In unsupervised, we don't have any of the labels. We just have the regular features like gender, age, income, square footage, bedrooms, bathrooms, all those things. Okay. So, we have semi-supervised, which falls in between supervised and unsupervised. The reason it falls in between is that there's a decent amount of data that's unlabeled. In fact, a majority of it is unlabeled. But what we can do is try to label it. We can take what we know from our existing labels, predict an artificial label, and then use all that data together in a supervised fashion for a model down the road. That's what this picture says: we have some labeled data here, most of our data is unlabeled, and we try to infer some labels and supply them to it.
For example, maybe we have categories like babies, tweens, teens, youth, and adults. We take our labels and try to extrapolate them into artificial labels for this unlabeled data so that we can use it, because then everything has a label and we can just go ahead and do supervised learning from there. What we would prefer to do, and what we'll do in this course, is just start with supervised. We'll just start with the labels. We usually won't try to derive artificial labels. One example in the real world is something like Google Photos: whenever you take a picture, it can provide labels based on previous images in your library, so it can produce tags or labels on those. Generally, when you take that picture it's unlabeled, unless you go in and specifically provide some tags and labels. But if you don't do that, it can still artificially create one based on the other labeled data that you already have. So that's an example. Okay. All right. Last one in terms of machine learning. We have supervised, we have unsupervised, and then we had semi-supervised, which is somewhere in between, a mixture of unlabeled data and labeled data. Now we're going to talk about reinforcement learning, which is completely different from the other three. It's a type of machine learning where we learn from interaction with the environment. And you might ask, what are we learning? We are learning what actions to take in the environment, and the way we do that is by reinforcing positive actions that lead to a reward. So that's where the word reinforcement comes from. Imagine a child that's learning from trial and error.
They're trying to crawl, they're trying to walk, and they keep falling down. Eventually they learn how to do it through trial and error. They might get a reward, or they might reinforce some of those positive movements that lead them to walk or crawl, or they might learn from the penalties, right? They might learn from some type of feedback. They might learn from falling down: "Oh, that hurts. I should support myself a little bit better," or be a little more coordinated. So they learn from those actions and their interaction with the environment. This is a complex algorithm, essentially. It deals a lot with, again, taking actions. Usually when you take an action, something changes in the environment, and then you observe some type of feedback. Think about a board game where you're trying to figure out what move you should make. Another good example is a robot trying to navigate a maze. What route should it take? Should it move forward? Should it move backward? Should it move left or right? Those are different actions it can take. Also a self-driving car: should it turn? Should it speed up? Should it slow down? Those are all good examples of things that have been trained with reinforcement learning. So, real-world examples: in a board game, a reward would be if you win the game, or if you capture a piece, like in checkers or chess. A penalty would be if you lose the game or lose one of your pieces. In a robot navigation task, it could get rewards for moving in the right direction, towards the exit. Or let's say you wanted to train a robot on how to open the door and navigate a room. You would penalize it for bumping into the wall.
You would give it a reward for moving... Oh, the reward in the algorithms themselves? Usually it's like a step function. It's usually a discrete function that is based on the state. Let's go back to the board game example, or even the maze. Let's say we're navigating the maze, and this was the exit and this was the entrance. If they make it to the exit, they get a numerical reward of, let's say, plus 100. So it's just a number. And if they go into here, let's say this is kind of a death trap or a pit, this would be a minus 100. So the reward could be discrete numerical values. If they're moving in the right direction, let's say we want to encourage going this way, then we could give smaller intermediate rewards: if you move forward, this is a plus 5, this is a plus 10, this is a plus 15. If you're moving in the wrong direction, away from the exit, that would be a minus 5 or a minus 10. Does that make sense? So they're numerical in nature. And what you're trying to do is collect the most reward. You're trying to get the largest reward you can through trial and error. So you try this out many, many, many times. You basically simulate running through this maze many times. And what dictates where I should go is based on what I've observed in the past. It's almost like you're a child remembering: okay, what move should I make from this space? If I'm here, which way should I go? Should I go down? Should I go right? Should I go left? You kind of know that from experience. Does that make sense?
It's based on the reward that I've seen in the past: when I've moved down, I've gotten a higher reward than moving left or right. Does that make sense? So yeah, it's a numerical value as a reward. Yeah, that's a great question: how does it differentiate rewards based on gain and loss, as in chess? It's a very complicated answer, but essentially, on the chess board you can think of every space as a state. I could be in this state, I could be in this state. And it's not only every space, but where all the other pieces are, so there are lots of states that are possible. There's a way to quantify what the value is of taking a certain action, like moving my piece left, moving it right, moving it up or down, given the rest of the state. So you're right, it may be beneficial to sacrifice. But we would learn that through experience: okay, the best move in this situation is to sacrifice. We would have to learn that through trial and error many, many times. Which is to say: if I'm in this current state of the world, with all these pieces distributed in this way, the best move for me right now, to get the most reward in the long run, is to actually sacrifice my piece and move it right, into what is theoretically a bad position. But we know from experience that the most long-term reward comes from that position, so moving it right may be the best action for me. So what you learn is how to take actions. Actions are usually things like move right, move left, move up, move down. With a self-driving car, though, that's going to be slow down, speed up, turn your wheel 10°. Those kinds of actions.
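The trial-and-error loop described above can be sketched on a tiny one-dimensional "maze". This is a hedged illustration, not the course's code: it uses tabular Q-learning, a common reinforcement learning algorithm that the lecture does not name, and the corridor layout, the plus and minus 100 rewards at the ends (with no intermediate rewards, to keep it short), and all parameter values are assumptions chosen for the example.

```python
import random

# Hypothetical corridor: state 0 is a pit, state 5 is the exit.
PIT, EXIT = 0, 5
ACTIONS = {"left": -1, "right": +1}

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = state + ACTIONS[action]
    if next_state == EXIT:
        return next_state, 100.0, True    # reached the exit
    if next_state == PIT:
        return next_state, -100.0, True   # fell into the pit
    return next_state, 0.0, False         # ordinary move, no reward

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # q[state][action] estimates the long-term reward of taking that action.
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(PIT, EXIT + 1)}
    for _ in range(episodes):
        # Each simulated run starts somewhere inside the corridor.
        state, done = rng.randint(PIT + 1, EXIT - 1), False
        while not done:
            best = max(q[state].values())
            greedy = [a for a in ACTIONS if q[state][a] == best]
            # Mostly do what experience says is best, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))
            else:
                action = rng.choice(greedy)
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(q[next_state].values())
            # Nudge the estimate toward reward now plus discounted future value.
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
    return q

q_table = train()
# The learned policy: the highest-valued action from each interior state.
policy = {s: max(q_table[s], key=q_table[s].get) for s in range(PIT + 1, EXIT)}
print(policy)
```

After many simulated runs, the table prefers moving toward the exit from every interior state, which is the "I know from experience which way to go from here" idea in numerical form.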
So the short answer is that there's a calculation there: you learn what the long-term value of every unique state is, and then you're trying to say what action you should take from that state, given the current state of the world. Okay. And I really like reinforcement learning. It's actually probably my favorite field of machine learning. Unfortunately, we won't be covering it in our main course. We have offered electives around reinforcement learning in the past, so stay tuned. Maybe when we get to the end of this program we'll offer an elective on it, and if enough people sign up for it, we'll run it. But we don't really cover reinforcement learning as part of our main topics. It is a more advanced topic than what we'll cover. But I really enjoy it and find it very fascinating. Okay. So, all of this is illustrating what I was saying: the thing that's interacting in the environment, like the robot or the car or the human moving a chess piece, is known as the agent. It's interacting with the environment by taking actions, which updates the state of the environment. That's why you see this word state here. It gets updated constantly, every time you take an action. Ultimately, what reinforcement learning is trying to do is learn the best action. What would be the best action to take? The best action is the one that leads to the most long-term reward. So you have to learn what leads to a good reward by experiencing this over and over through trial and error. There's a lot of simulation, or letting the robot try something many times, in order to learn what's rewarding and what's not. Think about it again: I think a good example is with children, right?
You kind of have to let them try things until they learn on their own what they can do and what they cannot do, and what the best actions are, right? Reinforcement learning has made its way into other places. I said a good example is self-driving cars or robotics; a lot of reinforcement learning is used there. One place it's found its way into recently is recommendation systems, which have kind of merged with reinforcement learning. This is because you can imagine there's a built-in reward for you clicking on a video and watching it. That reinforces that recommendation, and then you can recommend a similar thing and see if that's rewarding and generates a click, or some view time or watch time or whatever. So reinforcement learning has found its way into a lot of areas, recommendations being one of them, because it's natural for the idea of: what should I recommend next to generate the most reward? In this case, the reward is correlated with whether they clicked on it or not, or how long they watched; the longer they watch, the more rewarding it is. Those kinds of things. Other places where reinforcement learning has been used: I said self-driving cars, and games. One of the most famous examples, if you want to look it up, is AlphaGo. This was in 2016. The AlphaGo algorithm was a reinforcement learning bot that beat some of the world's best Go players. Go, if you're unfamiliar, is a board game that is a little bit more complex than chess. It has a larger board and more pieces. There was a reinforcement-learning-powered bot that learned how to play the game so effectively that it could beat world masters at the game, which was pretty amazing. That's AlphaGo, and that was by DeepMind, Google's DeepMind, in 2016. That was only about 10 years ago, not that long.
So, we said recommendation, and even autocorrect: learning to predict the best correction to generate a reward, which would be you accepting that correction, while rejecting it would be a penalty. So reinforcement learning has been adapted to these kinds of problems very successfully. Let's take a look at the packages that we will use throughout. Of course we will rely on these three, which we've already relied on to do a lot of things: NumPy to do numerical manipulations and calculations, and Matplotlib to do any plotting, and not only Matplotlib but maybe Seaborn as well; both of those do plotting. Pandas is a big one, because that's where all of our data is going to be manipulated and prepped before it goes into modeling. So all of that stuff we learned about pandas is definitely going to be applied here in this course as we actually build models. These older ones that we've been working with quite a bit are still going to be useful here in the modeling stage, mainly for different reasons though: mostly to get our data prepared for some type of modeling, or maybe to visualize it before we do modeling to get a sense of what it looks like, those kinds of things. Then SciPy: in unsupervised learning, we'll actually use SciPy a little bit to do dimensionality reduction, or to help us do that. So SciPy will be used here and there, and we've seen it before with hypothesis testing; the t-test and z-test came from there. Some of the unsupervised learning stuff will come out of there too. But the package we will use by far the most in this course is going to be scikit-learn, which is here. We've already seen a little bit of scikit-learn in terms of its preprocessing capability. We used the min-max scaler and the standard scaler from the preprocessing module in scikit-learn.
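As a quick refresher on what those two scalers compute, here is a sketch in plain Python rather than scikit-learn itself. The function names and the square-footage values are made up for illustration. The formulas are the per-column ones that `MinMaxScaler` (with its default 0 to 1 range) and `StandardScaler` in `sklearn.preprocessing` are documented to apply by default.

```python
import statistics

# Hypothetical feature column: house sizes in square feet.
square_feet = [800.0, 1200.0, 1500.0, 2000.0, 2500.0]

def min_max_scale(values):
    """Rescale linearly so the smallest value becomes 0 and the largest 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

def standardize(values):
    """Shift to mean 0 and scale to standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]

scaled = min_max_scale(square_feet)
standardized = standardize(square_feet)

print([round(v, 3) for v in scaled])
print([round(v, 3) for v in standardized])
```

Either way, the point of scaling is that features measured on very different ranges end up comparable before they go into a model.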
But it has many different models built into it that we can use to do our training and predictions. It's an incredibly useful machine learning library. It is the industry-standard machine learning library. If you're going to do anything in machine learning, it would be expected that you know how to use scikit-learn. What's really lucky is that scikit-learn is a really easy package to get used to. Nearly everything we do in scikit-learn will follow the same pattern, so the code will be extremely simple. They did a great job of making that package really user friendly and really simple. It's a fantastic package, and we're going to get a lot of practice with it as we go along. Every model we build will essentially be from scikit-learn, and not only the models: doing the training, doing the predictions, and then doing the evaluation will all come from different scikit-learn modules. So that'll be really nice, and we'll get good exposure to that package throughout the course. If anything, we'll come away from this course as scikit-learn experts, which will be very nice. So scikit-learn will be the new one for us, but we'll get a lot of practice with it. Okay. All right. So, just to recap that lesson before we move on to lesson three. We talked about machine learning as learning from data, which is included underneath the AI umbrella. Deep learning is in turn included under machine learning, because it's still learning from data, but it's learning using neural networks. We talked about the four different types of machine learning: supervised, unsupervised, semi-supervised, and reinforcement. Those are the different types of machine learning that are out there. And then we talked about some of the Python packages that we will use.
The main one is scikit-learn, and of course we'll use our older ones like pandas to manipulate our data and pass it into our model training, etc. But scikit-learn will be our go-to for anything machine learning. All right. So I have some questions for you guys, some checks. Let me know in the chat what you think. Which of the following best describes machine learning? Which choice do you think is the best for this? Very good. I see a lot of choices for A, and A would be the correct choice. Machine learning is definitely a subset of AI. It's underneath that AI umbrella, and of course we're learning from experience, and that experience is recorded in the data, without being explicitly programmed. So it's the exact opposite of B and C. We're definitely not learning from rules, and it's definitely not just used for image and speech recognition. It can be used for many other things beyond those. So yeah, A is the best choice there. What do we say here? Okay. What do you guys think about this one? Which example illustrates the use of machine learning to enhance customer experience in an e-commerce company? In other words, what would be some typical use cases of machine learning? Good. I think C is going to be the best answer here. Definitely C. It's using machine learning to detect fraudulent transactions. That would be a prediction, probably supervised learning, right? Is this fraud or not fraud? And then maybe some customer behavior analysis, which might be unsupervised: grouping together customers, clustering them based on their data, like their shopping behavior and characteristics. That might be unsupervised, but either way it's machine learning. Okay. Final one. What distinguishes deep learning from machine learning and artificial intelligence? So what's unique about deep learning? Oh, very good. Yep.
So, deep learning uses neural networks. You guys are right on top of that. Deep learning uses neural nets; that's what makes it unique. Machine learning would be part A. Machine learning is focused on learning from data, and underneath that is learning from data using neural networks, which is what deep learning is. Very good. All right, let's go to lesson three. Lesson 3 has two notebooks, and we're going to start with 3.1, so you'll want to open up that notebook. I'm going to go over to it now and give you a moment to open it. There are two of them; we'll see if we can get into the second one today. We probably will. But we're going to start with the 3.1 notebook. Do you guys have this notebook? It should be in your materials for this course. Let me give you a moment to open that one. Do you guys have it? All right. So, we're going to start our machine learning journey by talking about supervised learning. Remember, we'll cover unsupervised learning after we finish supervised. And there's going to be a lot to cover with supervised, mainly because there are two different types of problems we can tackle, predicting different kinds of values, which we'll talk about in a moment. But let's talk about what we're hoping to learn here, which is the different kinds of problems that we'll study, the categories of supervised learning. Those two categories are called classification and regression. We'll talk about those and their differences, and then talk about some applications and some example algorithms, and that's just within this notebook. In 3.2 we'll get into regression in particular, which will be very interesting. So the first models that we'll build will be over there in 3.2. Okay.
So if you guys remember, supervised learning is where we learn from labeled data. We have inputs and outputs in our data set. You train a model on this data, which includes input features and corresponding outputs, and those outputs are the labels, right? The goal is to learn a relationship between the input and the output. Of course, that's what any model is trying to do. What this allows us to do is then take that model and use it to make predictions on never-before-seen data. So we get a predictive model out of that, which we can use going forward on new examples. Remember, we will have in our data a bunch of features, which are columns, and generally one of those columns will be the label that we're trying to predict. Our model is going to try to learn some type of relationship between those inputs and the output label. The output label could be fraud or not fraud, cancer or not cancer, a price, a temperature, those kinds of things. So let's talk about that. Inside of supervised learning there are two different types of learning that we can do, and they're really based on the label, sometimes known as the target, that we're trying to predict. Depending on the type of that label, we get two different categories of learning. One is known as regression. That's generally when we are predicting something that is continuous, a numerical value. Think of price, think of temperature, think of revenue. We're trying to predict something like that, versus something that is categorical. Predicting something categorical would be fraud or not fraud, spam or not spam. Those are discrete categories, and the problem of predicting categories is known as classification, because we're trying to classify examples as belonging to one category or another.
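Here is a minimal sketch of that "learn the relationship, then predict on unseen data" workflow for the regression case. It fits a straight line with ordinary least squares in plain Python, so no library is needed; the square-footage and price numbers are invented, and the function names are only for illustration. The course itself builds its models with scikit-learn, so treat this as a picture of the idea rather than the code used later.

```python
# Hypothetical training data: square footage (feature) and sale price (label).
square_feet = [1000.0, 1500.0, 2000.0, 2500.0]
price = [200000.0, 300000.0, 400000.0, 500000.0]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

def predict(x, slope, intercept):
    """Use the learned relationship to predict a label for a new input."""
    return slope * x + intercept

# Training: learn the relationship between the input and the output label.
slope, intercept = fit_line(square_feet, price)

# Prediction: a house the model has never seen before.
new_house_price = predict(1800.0, slope, intercept)
print(round(new_house_price, 2))
```

Because the label here is a continuous number, this is a regression problem; if the label were a category such as "fraud" or "not fraud", the same fit-then-predict workflow would apply, but with a classification model.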
So we have these two main types of supervised learning problems: regression and classification. They're going to be handled slightly differently, for many reasons that we're going to uncover. One of the primary reasons is that we're predicting something continuous in the regression case versus something discrete, so the models have to be slightly different to account for that. A step beyond that, the evaluation has to be different too. I alluded to this last week, but in a regression it's very, very difficult to get the exact numerical answer. So generally we don't care about getting it exactly right. We just care about getting it nearby, getting it close enough. Whereas in classification, we do care about getting it exactly right, because it's a discrete category. So we're going to evaluate that a little bit differently, to say: did we get the answer right or wrong? Regression is going to be: did we get close? We assume it's going to be nearly impossible to predict a continuous number exactly. That's very hard to do. Okay. So, any questions on that? Any questions on those two differences? Let me give you some examples; maybe that will help too. Again, classification is predicting something that's categorical. Regression is predicting something that is continuous. Think about trying to predict the price of a house based on those features we talked about before, like square footage, bedrooms, bathrooms, all of those things. That would be a regression problem, because the price is a continuous value. Let's take a look at an example here. Imagine we were trying to predict the temperature tomorrow.
That's going to be a regression problem, a supervised learning regression problem, because we're trying to predict a numerical temperature. Okay? Versus a discrete category, which would be a classification. So this is a regression on the left, and this is a classification on the right, because we are predicting one of two categories: is it hot or cold? Now, we're not saying exactly where the threshold between hot and cold is. That would be a decision about what our discrete categories actually mean. But we only have two choices, hot or cold, versus predicting the temperature itself, which would be a numerical prediction of some exact number. So that would be a regression, and on the right would be a classification. Now again, why is this so different? You can see the types of predictions we're making are completely different. One's a number, one's a category. But again, with evaluation: if the true answer in our labels was 84 and we predicted 83, that's a pretty good result. That's still pretty close. From an evaluation perspective, that's pretty good. Whereas if I predicted cold and it's actually hot, that's a wrong answer. So they're evaluated slightly differently. That's something we're going to see when we talk about evaluating our models once we build them: depending on whether it's classification or regression, there are going to be different ways of evaluating them. You can see why it's very difficult to say we got exactly 84 when it could be any number. Our model is going to be predicting a number, and it's really hard to pin down an exact floating-point number. So the best we can do is say: how close did I get? This would be a worse answer: if I got something all the way down here, that's a really long distance to here. That's bad. That's a bad prediction.
But if I get something really close, that's better, right? That's a decent prediction, because it's pretty close. Of course, being perfect would be getting it exactly right, but that would be nearly impossible to do. Okay. All right. Any questions on this? Does it make sense, regression versus classification? We're going to use those words quite a bit as we go along. Regression is predicting a continuous value. Classification is predicting a category. And there are going to be different models for each: some models used for regression and different models used for classification. All right, let's talk about supervised learning applications. Just to name a few, we have HR operations. Imagine a recruiter tasked with finding the best candidates. Supervised learning can help by rejecting or accepting candidates. This is something that happens quite a bit, even today. It's kind of like how recommendations happen: from a whole pool of applications, this resume should be recommended, this one should not. So there are those kinds of use cases of predicting a category, which would be a classification. Should we accept or reject the candidate? In finance, you see this all the time with things like risk and loan approvals. You can predict the category of whether we should accept or reject the loan application. That would be a classification. What's interesting about classification, by the way, and it says here that we can predict the likelihood of a loan being repaid: we say that classifiers predict a category, but under the hood a lot of them actually predict a probability, and we turn that probability into a category. So we could say the likelihood of her loan being repaid is very low. Let's say it's less than 50% probability.
Then we could label this as reject, right? We could label that as a rejection. If it's greater than 50%, then we could label this as accept. So we can set a threshold there. Truly, our model spits out a probability, but we turn that into a category by saying: if it's less than 50% we should reject, if it's greater we should accept. Okay, so that's something we will see with some of our classification models: they actually produce a probability, and we turn that probability into a category label by doing something simple like this, putting a threshold on it. So in finance, this is used all over the place. Not only loans but also fraud; we talked about fraud or not fraud, and that would be a classification. Predicting sales revenue would be a regression, right? What is the revenue going to be in the next two quarters? That's going to be a regression problem. Emails, spam or not spam: that's going to be a classification. It takes the text input and predicts whether this email is spam or not spam. That's a supervised learning problem, and specifically a classification problem. In manufacturing, supervised learning is used to inspect quality and classify products into different grades. For example, a factory might use a model to check for defects. This is actually something that happens: you look at images of products as they go through the assembly line, and from those images you predict whether each one is high quality, low quality, or medium quality. So this is a classification, right? They're going into different categories of quality. It's much like a manual intervention by some QA or quality control specialist. Okay. But that's a classification. In the maritime industry, supervised learning can be used to predict currents.
So, current levels, and those can be used to forecast supply and demand. Those would be regression models, predicting something kind of like temperature, but in this case tidal levels. We talked about fraud already; that would be a classification. Okay, any questions on these examples? Of course, there are many more. Recommendation is kind of like a supervised learning problem, where you take examples of things that people have viewed or reviewed in the past and use them to predict what they would want to watch in the future. So recommendation is supervised learning, and it's like a classification: you're trying to predict a certain number of categories of shows or movies that you would want to watch. That's something we will study in the future; we'll have a whole lesson dedicated to recommendation as well. All right. When it comes down to the actual models themselves, there are going to be lots of different models that we cover. They differ in their purpose, in what kinds of problems they're used for, and in how they actually train. But at a high level, they're all trying to do the same thing, which is learn some sort of relationship between the input data and the label. That's really what they're trying to do, because they're all supervised: they have those labels, and they're trying to build some relationship there. They just do it differently. What we're going to study is the pros and cons of a lot of these models: when would I use one of them, and when would I use another? We'll try to talk about that as we go along. But keep in mind that they're all trying to learn some relationship between the input features and the output. They're trying to model that relationship.
They just do it in different ways. Okay? So as we go along and learn about new models, we will learn the details, the ins and outs, and those pros and cons. But no matter what, they're all trying to learn that relationship and be able to make predictions on new data. Okay, so here's a list of models that we will cover and work on throughout our sessions. We're not going to do them all in one sitting. The first one, which we'll cover today, is linear regression. Then we'll cover the rest of these mostly in the context of classification. What's interesting is that some of them can actually be used for both regression and classification, as long as you make certain adjustments; they have variations for each. But we're going to start with linear regression today and then work our way through the rest of these models in a separate lesson, lesson four, on classification. So all of these will come in lesson four, and we will do this one in lesson three, in the 3.2 notebook, which is all about linear regression. Yeah? So logistic regression is a classification model. It's kind of strange that its name says regression when it's doing a classification. The reason is that logistic regression computes a probability. It does a regression to predict a number, but that number is actually a probability, so it produces a value between zero and one. Then we take that probability and turn it into a category, like spam or not spam, fraud or not fraud. So logistic regression is kind of special. It's sort of like a regression, but it's predicting a very specific type of value, which is a probability.
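To make the probability-then-threshold idea concrete, here is a minimal sketch. It is not the course notebook's code: the one-feature loan data, the variable names, and the 0.5 threshold are all assumptions made up for illustration. It uses scikit-learn's `LogisticRegression` and its `predict_proba` method to get a probability, then turns that probability into an accept or reject label.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature (say, a score from 0 to 10) and a label,
# 1 = loan repaid, 0 = not repaid. All values are made up for illustration.
X = np.array([[1.0], [2.0], [3.0], [4.0], [6.0], [7.0], [8.0], [9.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns one column per class; column 1 is P(label == 1).
new_applicants = np.array([[2.5], [7.5]])
prob_repaid = model.predict_proba(new_applicants)[:, 1]

# Turn each probability into a category with a 50% threshold.
threshold = 0.5
decisions = np.where(prob_repaid > threshold, "accept", "reject")
print(prob_repaid, decisions)
```

The threshold does not have to be 0.5. Moving it up or down trades one kind of mistake for the other, which is a choice that depends on the problem.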
So for that reason it's primarily a classification algorithm, and we'll study it in lesson four. That's why it's under the umbrella of classification: it produces a probability as its main output, which we can then turn into a category as long as we interpret that probability in the right way, like the probability of spam versus not spam. Okay. So, let me focus on linear regression. I'm not going to go through all of these other use cases because we haven't learned these models yet, and I don't think it's good to read about them until we've covered them. Once we cover them in lesson four, I'll come back and describe these examples to you, and we'll see why they make sense. But for linear regression, which is what we'll cover next, let me talk about that example. A prototypical example would be predicting house prices, which we've seen in that house price data set. If we wanted to estimate the market value of a house, so the price, we could do that using features such as number of bedrooms, square footage, location, and age of the property. Then when a new house comes on the market, we could estimate what the price should be based on those features. So linear regression is a good one for predicting something like a housing price, and we'll actually practice that in the next notebook. There are descriptions of these other models too, but again, we haven't covered them yet, so I don't want to go through those until we get to those models. When we get to them, I'll come back and mention the examples. Can KNN be used for clustering? No. The clustering model is going to be different. It's going to be K-means. K-means is the primary clustering model.
Not K-nearest neighbors. K-nearest neighbors can be used for regression, and it can be used for classification. We'll talk about KNN, which is K-nearest neighbors, in lesson four. Yeah, the names sound really similar, but K-means is a clustering algorithm, and that's different: there are no labels used at all. K-nearest neighbors is a supervised learning algorithm that uses labels. Good. Any other questions so far? Okay. That being said, let's move on to the 3.2 notebook, which will be our first discussion of regression. So we're going into supervised learning and regression. I'll give you a moment to pull up this notebook; you want the 3.2. Our plan is to do regression first, and then we'll talk about classification in lesson four, where we will cover all those other models on that list that you could use for classification. But we're going to talk about linear regression first. All right. We have a big agenda. This is a big notebook, with a lot of material surrounding regression. We're going to start with linear regression and see how we actually perform it and what that model is doing. We've seen the idea a little bit already, so it should be somewhat familiar. Then we'll talk about how to adapt that linear regression idea to what's called nonlinear regression, which uses things like polynomial features. And then a big, big topic for us is going to be evaluating the model. It'll be quite easy to actually build it.
Building the model will be really easy, but evaluating and interpreting it will be a lot of interesting work, because we want to know what the performance of that model is once we have it built. We want to know how good a model it is. Is it worth using, or do we need to retrain it, get new data, or change the model? We're going to talk about how you determine what to do based on that performance. And then there are a couple of things we may not get to today. One is regularization, which is used to boost performance in certain situations, namely when the model performs poorly on test data even though it performs pretty well on training data. In that scenario, you can use offshoots of linear regression that do what's called regularization. And then we'll talk about hyperparameter tuning as a general strategy, which is something you generally do want to do when you're training machine learning models. Again, we may not get to these two today, because there's quite a lot to get to before that, mainly centered around building and evaluating linear regression. Okay. So pretty cool, we'll get to our first model here. Let's start with linear regression. What linear regression is attempting to do, and I want to show you this in this picture, is draw this line, sometimes known as the line of best fit. This is our model. It goes through the data, and it's generally a good predictor. Let's say you give me a new feature value that's right here on the x-axis. Then all I have to do is plug that into my line equation, and I will generate a value that's right here, on that line at that input.
And that's going to be my prediction for what the output variable should be. It's just going to be something on that line. And you can see this line is a decent estimate for this data because it slices through it pretty evenly. So it's a good guess as to what the output should be given any one of these inputs. This line is a good estimator. And so our goal in building a linear regression is to build the equation of this line. The equation of this line is going to be our model. Yes, it's going to look just like that: mx plus b, or mx plus c. It's going to look exactly like that, except that it's going to be more than just mx, because we generally have more than one feature. If you think of x as a feature, it will generally look like a weight times each feature, added up, plus a bias, an intercept. So, yeah, you're exactly right. mx plus b is the right idea. Nonlinear? It can be adapted to nonlinear, yes. If we transform our features in a nonlinear way, we can apply linear regression to the transformed features, and that would be a nonlinear regression. So yes, we can do nonlinear things too. We'll talk about that. Okay. So linear regression, again, is the science (not really an art, but an exact science) of finding the equation of the line that fits through this data. Now, one thing you should be thinking about is why this line is a good predictor. The argument is this: take a look at the distance from these blue points, which are our actual data points. This line is found such that it minimizes the distance from the points to the line. So we want this distance to be, actually, I should draw it that way.
This way. We want this distance to be at a minimum. It would be bad to draw a line all the way out here, because then that's a lot of distance, right? That would be a lot of error, contributed by not being able to predict the points in our data set very well. And this is our training data. That's why we have labels: they guide us in building this line. So our goal is to build that line so that this error, this distance, is as small as possible, across all these distances from the points to the line. We're going to build a model that finds the equation of the line such that our error is minimal. And what is the error? The error is the distance of our data points to the line that we build. So essentially, to train this, we adjust the parameters. I think it's really good that you brought up mx plus c, because basically the m and the c are what we adjust. We adjust those to make this distance as small as possible. Okay. So, where is regression used? We've already seen some examples. Here are some more. Advertising, like predicting sales. Predicting oil production and demand. Those are forecasts, and they're regression problems. Retail, like demand forecasting for inventory. Healthcare, predicting the levels of certain blood markers, or something like that. Real estate, predicting prices based on the things we talked about, like square footage, bedrooms, and bathrooms. So again, whenever we want to predict a numerical output, that's a regression problem. And when that equation is linear, the regression we're talking about is known as a linear regression.
So go back to that picture. It's a linear regression when the equation of the line that we find is a linear equation, meaning it has exactly the form I've been telling you: weight times feature, plus weight times feature, plus weight times feature, and then maybe an intercept term, some bias term. This is a linear equation because all of the features are raised to the first power, and it's a linear combination of the features with those different weights. So this is a linear model because it's what in math we would call a linear equation: everything is to the first power, and it resembles mx plus b. So when we talk about linear regression, that is a regression model, so we're predicting some continuous target, which assumes our model is formed from this kind of linear equation. This is going to be our model for a linear regression. Okay. And when you train a linear regression, your goal is to learn these weights so that you can plug in any set of input feature values and generate a prediction, which is going to be something on that line. We put in all of our features and we end up somewhere on that line. That's the output. Okay. Let me pause there. Any questions on the linear model or why it's called linear regression? Okay. And by the way, in these notes, this bullet point where it says it uses the least squares criterion to estimate the coefficients is exactly what I said earlier about the distance. The distance is based on the square of this quantity: how far away you are from the line is measured by this squared distance here, and here, and here, and here. What we're trying to do is find the least total squared distance, or least squares, which is that minimum distance.
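The least squares criterion can be checked numerically. The sketch below is only an illustration with made-up data points: it computes the sum of squared vertical distances between the points and a candidate line, then uses NumPy's `np.polyfit` (degree 1) to get the least-squares slope and intercept, and compares that line against a deliberately tilted one.

```python
import numpy as np

# Hypothetical data points (x = feature, y = label); values are made up.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sum_squared_error(slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    predictions = slope * x + intercept
    return float(np.sum((y - predictions) ** 2))

# np.polyfit with degree 1 returns the least-squares slope and intercept.
best_slope, best_intercept = np.polyfit(x, y, 1)

best_error = sum_squared_error(best_slope, best_intercept)
tilted_error = sum_squared_error(best_slope + 0.5, best_intercept)
print(best_error, tilted_error)
```

Any other choice of slope and intercept gives a sum of squares at least as large as the best-fit line's, which is what "least squares" means.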
So that's how we find all of these weights: by minimizing. We tune them using our labels. Here's our label, which is the y. We plug in our data and tune the weights to minimize the error. It's an optimization problem, right? We're trying to find the minimum of this quantity, and that gives the best fit line. Okay. So we have linear regression, and we can do a simple linear regression that has only one feature. If there's only one input feature, that is sometimes known as simple regression or simple linear regression. So there is one independent variable, and that is the feature. This equation resembles the exact equation that you just put in there, mx plus b. We're just using different symbols, beta 0 and beta 1, but it's basically that simple line, and the line is generated according to that equation. Here's what it looks like. This is the best fit line through all of these blue dots. This is something we're going to be able to build pretty easily in scikit-learn. This line will be y = beta 0 + beta 1 times x1: some weight beta 1 times the only feature we have, x1. In this case, we would be predicting sales. Sales would be the label that we're trying to predict, and the feature that we're putting in is, I think, TV expenses. Yep, TV expenses, which is on the x-axis. So there's one feature, which is TV expense, and on this graph this would be our model. We only have one feature, and we have these two weights.
We have an intercept, beta 0, and one weight, beta 1, which gets applied to that one feature. Our model will have a certain value for beta 0 and a certain value for beta 1. Those are what get learned during model training, and they get learned by what's called a least squares algorithm that tries to minimize that distance. It tweaks beta 0 and beta 1 to minimize the distance from this line to all of these points. Imagine taking a line, moving it around, and changing its slope, its angle, to find the best fit, the one that reduces that error the most. That's kind of what we're doing. Can I explain? Yeah. So sales is in dollars and TV expense... actually, I think it's the other way around. I think sales is a quantity, so this is the number of sales that we have, and TV expense is, I think, in dollars. So it's how much money we put into the product, and then how many sales we had of that product. What this graph is showing is that the blue points are our actual data points. Imagine we had a data frame that has the true values. It has the TV expenses, with a value like 120, and then the sales could be like 700 units, let's say. That's just our data set, as it would be in a data frame. Then we have another row where TV expense is 50 and sales could be 400, let's say. And on and on. So this is our data, and this data is plotted in blue. These are the blue points here, and the red points are our model.
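Here is a rough sketch of how beta 0 and beta 1 could be learned with scikit-learn. The real CSV is not available here, so the data frame is a stand-in: only the rows 120 to 700 and 50 to 400 come from the example above, and the other rows are invented. `LinearRegression`, `intercept_`, and `coef_` are real scikit-learn names; everything else is illustrative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the TV/sales data. Only the first two rows
# (120 -> 700 and 50 -> 400) come from the lecture; the rest are made up.
data = pd.DataFrame({
    "TV": [120, 50, 200, 80, 150],
    "sales": [700, 400, 1000, 520, 810],
})

X = data[["TV"]]   # features: a DataFrame with one column
y = data["sales"]  # label: a Series

model = LinearRegression()
model.fit(X, y)

beta_0 = model.intercept_  # learned intercept
beta_1 = model.coef_[0]    # learned weight for the TV feature

# Points on the fitted line for a few chosen TV values (the "red" line).
x_line = pd.DataFrame({"TV": [0, 100, 200]})
y_line = model.predict(x_line)
print(beta_0, beta_1, y_line)
```

Each predicted value is just `beta_0 + beta_1 * TV`, so the predictions for any inputs all fall on one straight line.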
So we built a linear regression model, where we put in some made-up x values and generate predictions, and that gives this linear regression line. That line is derived from this data. It gets learned from these supervised examples. Does that make sense? The line of best fit is learned from that data, and the actual data is in blue. You can see we're trying to build this such that this distance is at a minimum. It's an optimal fit that balances out these distances. So it's just building that relationship between the input and output: when the expenses are higher, we seem to have more sales. What's perpendicular, the distance? Is that what you mean, the distance from the real points to the line? For least squares, that distance is measured vertically, straight up or down from each real point to the line, because it's the difference between the actual value and the predicted value at the same x. Okay. All right. Now, more generally, do we usually have one feature? No. Generally we expand this to the case where we have more than one feature, like what we see in the housing data, where we predict a price but have many different inputs like bedrooms, bathrooms, square footage, etc. So more broadly, instead of simple linear regression, we have what's known as multiple linear regression, which means we have multiple variables, or multiple features. This is exactly the equation I've been talking about. We just extend that one feature into many features, plus an intercept term, which is sometimes known as the bias. The intercept term orients the line so it starts out in the right place. And this is the equation that we would be building with the model.
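As a sketch of the multiple-feature case, the example below fits a model on a small housing-style table and then rebuilds one prediction by hand as intercept plus the sum of weight times feature. The column names and every number are assumptions invented for illustration, not the course's housing data set.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical housing data; every value here is made up for illustration.
houses = pd.DataFrame({
    "bedrooms": [2, 3, 3, 4, 5, 4],
    "bathrooms": [1, 2, 1, 3, 3, 2],
    "sqft": [850, 1400, 1200, 2100, 2600, 1900],
    "price": [150_000, 240_000, 205_000, 360_000, 450_000, 320_000],
})

X = houses[["bedrooms", "bathrooms", "sqft"]]
y = houses["price"]

model = LinearRegression().fit(X, y)

# One learned weight per feature, plus a single intercept.
weights = model.coef_
intercept = model.intercept_

# Reproduce the model's prediction by hand: intercept + sum(weight * feature).
new_house = pd.DataFrame({"bedrooms": [3], "bathrooms": [2], "sqft": [1500]})
by_hand = intercept + np.dot(weights, new_house.iloc[0].to_numpy())
by_model = model.predict(new_house)[0]
print(by_hand, by_model)
```

The two numbers agree, which shows that the fitted model really is just the linear equation with learned weights.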
This is our model, essentially. This is the equation we would be learning. The intercept is like a constant? Yeah. If all of the features were zero, this is what our result would be. If this was zero, this was zero, and this was zero, it would reduce to this as the prediction. So yes, it's like a constant. In geometry the intercept is really important because it sets where your line should start. These other values are like the slope. They set the tilt: should it be tilted like this, or should it be more sloped? But the intercept sets where it should start vertically: should it start all the way up here, or more down here? That's what the intercept tells us. Okay, so this is the situation. This is going to be the linear regression model that we build most of the time. These are all going to be features: this is some feature x1, this is some feature x2, etc. What gets learned during training are these coefficients. All of these coefficients, including beta 0, get learned from our data. And how do they get trained? By reducing that distance. We get the line of best fit by tweaking those betas until we reach a minimum distance. There's an algorithm behind that, which scikit-learn will run for us to find the best fit, so we don't need to do it manually. But that's the process: tweaking those weights to end up with the line of best fit. In higher dimensions, instead of a line, you get what's called a plane, which kind of looks like this.
So the best fit is actually this plane, which cuts through all these points just like that in higher dimensions. Instead of a line, in three dimensions you get a plane like this, but it's just a more general line of best fit. It's still the same idea. We're still trying to come up with the best coefficients to minimize the distance from our points to the fit, although in higher dimensions it's no longer a line. It's a plane like this. So you're trying to minimize this distance from here down to the plane, and from here up to the plane. I want you to keep in mind what we're trying to do before we go into the code, because the code's going to make it seem really, really simple, and that's because scikit-learn is great and that's what it does. But we should realize that there's something really complex going on, which is, again, finding the values of these weights that minimize the distance from this line to the data points that we have. There's an algorithm there that keeps making adjustments to the weights based on those distances. It uses those distances as a guide, tweaking the weights to find the ones that result in the lowest amount of distance. So we keep making tweaks, and eventually we converge to the set of weights that gives us the best fitting line. Luckily, that gets abstracted away for us by scikit-learn. There'll be a function that we use in scikit-learn when we build the model that will go ahead and find the best weights for us, and then we have our optimal model. Then we can just plug in different values of these features and generate a prediction, which is going to be this result. Right?
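The "keep making tweaks until we converge" picture can be sketched as a small gradient descent loop. This is an illustration of the idea only, with made-up data and a hand-picked learning rate and step count. As far as I know, scikit-learn's `LinearRegression` computes the least-squares solution directly rather than running a loop like this, so treat the loop as intuition, not as the library's implementation.

```python
import numpy as np

# Hypothetical one-feature data; values are made up.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, intercept = 0.0, 0.0  # start from an arbitrary line
learning_rate = 0.02         # assumed step size, chosen by hand

for step in range(5000):
    error = (slope * x + intercept) - y  # signed vertical distances
    # Gradients of the mean squared error with respect to each weight.
    grad_slope = 2 * np.mean(error * x)
    grad_intercept = 2 * np.mean(error)
    # Tweak each weight a little in the direction that reduces the error.
    slope -= learning_rate * grad_slope
    intercept -= learning_rate * grad_intercept

# Direct least-squares solution for comparison.
direct_slope, direct_intercept = np.polyfit(x, y, 1)
print(slope, intercept, direct_slope, direct_intercept)
```

After enough small tweaks the loop lands on essentially the same slope and intercept as the direct solution, which is the "convergence" described above.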
So that's what we're ultimately trying to do: train the model, which finds all those optimal weights, and then predict with it, which means plugging in different feature values to generate a prediction. Okay. Let's see how that happens. It's actually going to be super easy with scikit-learn. In this scenario, we're going to import pandas, because we're going to load our data with it. Of course we need some data to work with, so we're going to load this CSV. I was not actually able to find this CSV for this example, but that's okay, because we'll do other examples where we'll work with the data. If you happen to have it, great. I didn't see it in my files. So you'll just have to take my word for it that these are the TV and sales columns from this data set, as an example. Now, to see how the model is fit, here is what we're going to do, and this is going to be a very standard process for us for building a model. The first step is splitting the features away from the label. That's the first step that we will always take. If you take a look at this code, it's taking all rows but only the first column. So it's extracting all the features from the data frame, which here is just the first column, the TV column. And our target variable, which is our label, is the second column, the sales column. So let me call that out: the first step is always to split apart features from labels. We put all the features into a data frame called X, and we put all of our labels into y, which is technically a series but sort of like a data frame. So that's just the sales values.
Now you're going to see why we do that. It's because we need our features and labels split apart to pass them into the model building function. It expects our independent variables, our features, to be separated from our answers, the labels that guide the model building. That's the first thing you have to do: separate those. So this code will separate those out into a capital X and a lowercase y. That's pretty standard notation in the industry: whenever you split apart your features, you usually put them into a data frame called capital X, and you use a lowercase y for your labels. So X represents the features and y represents the label column, whatever that is. In this case it's sales, because we're going to be predicting sales using the TV column, the TV expense quantity. Yeah? So the assignment is that we are splitting apart our data. When we first read in the data, it is a data frame that has two columns, TV and sales. Oh, perfect, thank you, Tim. So if we look at this data, it only has those two columns. What we're doing is splitting our independent variable, our features, from the label. X will contain our features and y will contain our label. Does that make sense? We're splitting this data apart. We're grabbing only the first column to be our features, and we're grabbing the second column, sales, to be our label, because we're going to build a model to predict sales given the TV expense input. So the first thing we have to do is split apart the features and the label. Okay, that's the first step we usually will take.
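The feature and label split can be sketched like this. The notebook's exact code is not shown in the transcript, so this is one plausible version using pandas `iloc`; the inline data frame is a made-up stand-in for the two-column TV and sales CSV.

```python
import pandas as pd

# Hypothetical stand-in for the two-column TV/sales CSV; values are made up.
data = pd.DataFrame({
    "TV": [120, 50, 200, 80, 150],
    "sales": [700, 400, 1000, 520, 810],
})

# All rows, first column only. Slicing with 0:1 keeps X as a DataFrame.
X = data.iloc[:, 0:1]
# All rows, second column (index 1). A single integer gives a Series.
y = data.iloc[:, 1]

print(type(X).__name__, X.shape)
print(type(y).__name__, y.shape)
```

Keeping `X` two-dimensional matters because scikit-learn models expect the features as a table of rows and columns, even when there is only one column.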
And the reason we have to do that, just to reiterate, is that our model expects the features to be separate from the label. We will pass those in separately. X is TV. It's the first column, because we're using iloc. And y is what we're predicting: we are predicting sales given the TV expense value, which is why we take the second column, the index one column. To load it, you just call read_csv and pass in the URL. So, exactly the code that was up earlier from Tim: you define the URL, and then data = pd.read_csv(url). So we split our data into X and y here. All right. Now, there's one other step that we're going to take, and it's a very critical one. We're going to see it over and over again, just like splitting into X and y. This next step is what's called a train test split. Let me show you what the train test split does. It takes the data that we have, our X and our y, and splits it into a percentage that will be used to train the model and a percentage that will be used to test it. Now, why would we want to do that? It's mainly so we can do evaluation. We build the model over here, and then we test it on data that it has not seen before. So we reserve a percentage of the data for testing. Usually this is somewhere between 20 and 30% of the original data. That means the majority of the X and y, generally 70 to 80%, is used for training. So the industry standard for the test set is anywhere between 20 and 30%. A lot of people like to use 30%, some people like to use 20%.
Anything in that range is acceptable. I think we will generally favor 30% for testing. But the point is that we don't want to mix those together. We want them separated so that we can have a fair evaluation. We want to train our model on this data and then see how well it performs on this data that it has never seen before. To get data it has never seen before, we take our X and our y and split them using a function called train test split. Scikit-learn has this function: we pass in our X and our y, along with a percentage like 30% that we want to split out into a test set, and the remaining 70% will be used for training the model. Okay. Let me redraw that. What we get out of the train test split is an X and a y each for training and for test. So we get an X train and a y train, the training features and training labels. And then we get test features to plug into our model, and test answers, or test labels, to do evaluation. What we should be able to do is build the model over here and then apply it to this data, meaning we take these test features, plug them into our model, see what answers we get, and compare those answers to the test labels. That's how we generate an evaluation. Okay? Now you may be wondering why we do any of that. What's the purpose? Evaluating on this test data gives us a good sense of whether our model will generalize to new examples. Right?
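A minimal sketch of the split and the evaluation it enables is below. `train_test_split`, `LinearRegression`, and `mean_squared_error` are real scikit-learn functions; the ten-row data frame, the `random_state` value, and the choice of mean squared error as the metric are assumptions for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical TV/sales data with 10 rows; values are made up.
data = pd.DataFrame({
    "TV": [120, 50, 200, 80, 150, 30, 170, 60, 110, 90],
    "sales": [700, 400, 1000, 520, 810, 310, 900, 430, 650, 560],
})
X = data[["TV"]]
y = data["sales"]

# Hold out 30% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train only on the training rows.
model = LinearRegression().fit(X_train, y_train)

# Plug the unseen test features into the model and compare with the test labels.
test_predictions = model.predict(X_test)
mse = mean_squared_error(y_test, test_predictions)
print(X_train.shape, X_test.shape, mse)
```

The model never sees `X_test` or `y_test` during `fit`, so the error computed at the end is an estimate of how it would do on new examples.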
If it performs pretty well on data it's never seen before, that's a good indicator that it's going to perform pretty well when we use it on brand new examples in the future. That's why we do this evaluation on data that it has not seen before. It's going to see the training data, because we train the model on that data. But the model will never be exposed to the test data until we do the evaluation and generate some metrics to see how well it is performing and whether it has a good chance of generalizing to never-before-seen examples. That is what we want, because we're going to use this model in the real world, on new examples that it hasn't seen before, and we want it to perform well. So this is our test, our evaluation. Okay. I'll show you what it looks like in the code in a moment, but conceptually, any questions on the train test split idea? It's a very important idea: we use part of the data to train the model and another part of it to evaluate. By the way, this has a term in machine learning. A single split like this is usually called holdout validation, and it's the simplest member of the cross validation family: we use one data set to train the model and then we cross over to another data set, the testing set, to validate it. There are actually many ways to do cross validation, and that's something we'll study. In the more complex versions, you take your data and divide it into many sections, train on most of them, evaluate on the one left out, and then rotate so that each section takes a turn as the evaluation set. We're going to study that. But a single train test split is the simplest way to do it here. Okay.
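The "divide into many sections and rotate" idea can be sketched with scikit-learn's KFold and cross_val_score. This is only an illustration under stated assumptions: the data below is synthetic (a made-up stand-in for TV spend versus sales, not the course data set), and the choice of five sections and of R squared as the score is arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumed values, not the course data set).
rng = np.random.default_rng(0)
X = rng.uniform(0, 300, size=(100, 1))            # one feature, e.g. TV spend
y = 7.0 + 0.05 * X[:, 0] + rng.normal(0, 2, 100)  # noisy linear label, e.g. sales

# Divide the rows into 5 sections; each section takes one turn as the evaluation set
# while the model is trained on the other four.
folds = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X, y, cv=folds, scoring="r2")

print(scores)         # one R squared value per section
print(scores.mean())  # average over the whole rotation
```

Averaging over the rotation gives an estimate that depends less on which rows happened to land in a single test set.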
So let me show you what you get when you use train test split. We're going to import from sklearn, from the model selection module. We haven't used this before; this is our first time using it. From model selection we're going to import this train test split function, and we're going to use it on our X and Y and set a test size of 30%, which converted to a decimal is 0.3. So that means we're reserving 30% for the test set. You can also set a random state. That's completely optional. The random state is for reproducibility, because what the train test split is going to do is shuffle the data and then split it apart into the 70/30. So, yes, the seed. Exactly. It's like a seed. The shuffle is random, but with that random state I'm able to reproduce that randomness, so every time I run this notebook I get the same result. You can choose any number to be your random state. The 42 isn't important. You could choose zero, you could choose one, you could choose any non-negative integer. 42 is kind of a popular convention; you'd have to look up why. There's nothing really special about 42. You could choose zero, or 123, or 15, or anything you want. It's really so that your shuffling is consistent: every time you run this notebook, you get the same shuffle result, so I'm always going to get the same rows in these splits. Hitchhiker's. There it is.
I knew it was from something. Yeah. So 42 is just used ubiquitously as a tribute to The Hitchhiker's Guide to the Galaxy, but there's nothing that special about 42. It's not going to change our result in any meaningful way. It's just that this train test split is going to shuffle our data and split it apart into 70/30, and you want to set this to something so that every time we run this notebook we get a consistent shuffle, and so the data in these sets is consistent. That's all. Okay. But do you guys see how we pass in our X and our Y and we generate four different data quantities here? We generate training features, test features, training labels, and test labels, because we are generating data for two different sets, a training set and a test set. So we have training features and training labels, and then test features and test labels. Okay, that's why it's so important to split apart our data into the X and the Y. We need those split apart in order for this part to work. By the way, these two steps we will do for pretty much any model we build: we'll split into X and Y and then do the train test split in order to generate the data that we will use for building our model. Okay. So this data here is going to be what we actually use to guide the training of our model. So it's definitely supervised, right? Linear regression will use those labels. Okay, so we haven't built the model yet. We're just getting our data split apart and ready for the training. That'll be coming up in a moment. We started with our data frame, we split it apart into an X and a Y, and we split that into train and test sets. Then we can go ahead and pass those in to our model training, which we'll do in a moment.
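The two preparation steps described above might look roughly like this in code. It is a minimal sketch, not the notebook's actual cell: the data frame is built from synthetic numbers because the course URL is not given in the transcript, and the column names TV and sales and the variable names are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data frame (assumed columns and values).
rng = np.random.default_rng(0)
tv = rng.uniform(0, 300, 200)
data = pd.DataFrame({"TV": tv, "sales": 7.0 + 0.05 * tv + rng.normal(0, 2, 200)})

# Step 1: split the data frame into features (X) and label (y) by column position.
X = data.iloc[:, [0]]   # first column, kept two-dimensional for scikit-learn
y = data.iloc[:, 1]     # second column (index 1) is what we predict

# Step 2: shuffle and split into 70% training and 30% test rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

The return order is worth memorizing: both feature sets come first, then both label sets.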
That's a good question. What you could do is, should we import numpy? Let's see. We did. Okay. You could check np.mean on the X train and see how it compares to np.mean on X. So you could see what the average of this feature is in the training split compared to the average of the original. They may not match perfectly, because we are taking a reduced data set. I don't think there's a one-size-fits-all validation we can do, because we're taking a random shuffle and keeping 70% of the data, so we're not guaranteed to maintain the same statistics. We can see if they're close. Does that make sense? We're not guaranteed to get the same stats, or really the same distribution, because we're taking a slice of it. Is it good practice? Yes, it is. 30% is very common, and anything between 20 and 30% is fine. So 0.2, 0.25, 0.3, any of those are acceptable. It's really up to you. I mostly see 30%, and I think 0.3 is good practice to use for sure. I did explain random state. Random state is so that you get consistent shuffling. You can set this to any integer that you want; it doesn't really matter. You can set it to 100, to 10, to 15. What this split will do is shuffle the rows first and then split them apart into the train and test sets. So you set the random state so that the next time you run this you get the same consistent shuffling. That's the only thing it helps you with. The split is randomized, but when you set a random state, if you run it again you'll get the same shuffling.
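Both points from this answer can be checked in a few lines: the training split's average is close to, but not exactly, the original average, and repeating the split with the same random state reproduces the same rows. The data is synthetic and the variable names are assumptions made for this sketch.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumed values).
rng = np.random.default_rng(1)
X = rng.uniform(0, 300, size=(200, 1))
y = 7.0 + 0.05 * X[:, 0] + rng.normal(0, 2, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The statistics of a 70% sample are close to the original, not identical.
print(np.mean(X), np.mean(X_train))

# Same random_state: the shuffle is reproduced, so the same rows come back.
X_train_again, _, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
same_split = np.array_equal(X_train, X_train_again)

# Different random_state: a different shuffle, so different rows.
X_train_other, _, _, _ = train_test_split(X, y, test_size=0.3, random_state=0)
different_split = not np.array_equal(X_train, X_train_other)

print(same_split, different_split)
```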
The shuffling matters because it dictates what ends up in these sets. Okay. All right. So let's build the model, and let me show you how easy this is going to be. This is really how it's going to be for every scikit-learn model: training will look basically the same, which is what makes it really nice. The first thing we have to do is import our model. From the linear model module within sklearn we're going to import the linear regression, and we're going to create an instance of it here. Okay, and look how easy this is going to be. Nearly all sklearn models use the fit function to train. No matter which one we use, like the decision tree or the logistic regression that we use for classification, coming up in lesson four, they're all going to look the same: you run .fit, which is scikit-learn's generic function for training your model. So this will execute the training once we run this code. And again, what the linear regression training is going to do is that least squares procedure to try to find the right weights: the weights that minimize the squared distance from the line it's trying to build to the data. What I want you to notice is what we put into fit. See how we put in the training data: the training features and the training labels. Now this is supervised, so of course we put in the labels here and of course we put in our features here.
So we're putting all of our examples from our training split into this .fit, which is going to train the model so that we can use it for prediction. Okay, it's really fast. If I run this, it gets trained pretty much instantly. And you can see in this little box that we now have a linear regression, and this information says that it has been fitted. So it's now ready to be used. That's it. We've trained our model. That's how easy that was. We called fit. Now, what we should realize is there's a lot of work going on behind the scenes of this fit: it runs the least squares algorithm, finds those weights, and creates that line of best fit. But scikit-learn is abstracting it away for us, and all we have to do when we're using this code is call fit. Really easy. And there we go, we've trained our linear regression model. By the way, if you want to see what the coefficients are, you can extract them: take your linear regression and read coef with an underscore at the end, coef_. This gives us the trained weights, also known as the coefficients. If you run this, you can see we have a single coefficient, the one on our only feature. And we can take a look at our intercept, which is intercept_. Obviously, if our model has many features, it's going to have more values in that coefficient array, but the intercept is always just a single value. Here the intercept is 7.23 and the coefficient is 0.046, so the fitted line is predicted sales equals 7.23 plus 0.046 times TV. So that's the weight that gets learned.
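As a rough sketch of the training step just described: fit takes the training features and training labels, and afterwards the learned weights can be read from coef_ and intercept_. The data here is synthetic, generated from an intercept of 7.0 and a slope of 0.05 (values chosen for this sketch, not the course data), so the printed numbers will be near those rather than the 7.23 and 0.046 quoted in the lecture.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a known intercept (7.0) and slope (0.05).
rng = np.random.default_rng(0)
X = rng.uniform(0, 300, size=(200, 1))
y = 7.0 + 0.05 * X[:, 0] + rng.normal(0, 2, 200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create the model object, then train it on the training features and labels only.
linear_regression = LinearRegression()
linear_regression.fit(X_train, y_train)

print(linear_regression.coef_)       # array with one weight per feature
print(linear_regression.intercept_)  # a single number
```

The trailing underscore is scikit-learn's convention for attributes that only exist after fit has been called.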
Is there a size limit? No, not really. You can use as much data as you want. The only practical limit is the amount of data that can fit in memory. Okay. All right. Were you guys able to run the linear regression fit? Okay, perfect. Great. So we have a model and we can use it to predict, and that's what we're going to do next. If we go down here, we have a function that builds a scatter plot of our original test data. Then we take our training features and plot them against our sales predictions. This is, by the way, how you use a scikit-learn model to predict. You have fit to train it, and look at the function you use to predict: it's literally just called predict. That's how easy it is. You pass your features into predict and it generates a prediction for every row, so every row in this data frame ends up with a prediction from our model. So what we're going to do is plot our training features against the predicted sales to see how good of a fit that really was, to see the regression fit. Okay. And there's the regression fit. We have all of our test data plotted as the green scatter, and the blue line is the model's predictions on the training features that we built our model on. So that's a pretty decent fit. But the thing I want you to see is this prediction. We were able to generate predictions on the training features by running our predict function with our model.
Now, this model has been trained. We've already fit it and now we're using it to predict. We're predicting the sales and plotting that on the y-axis. The predicted sales are our blue line. So this is our line of best fit; these are our model predictions. You can see it's a pretty decent line of best fit. Of course, there's some error here; it's not perfect, but it does a decent job. Okay. So look how easy that was. Just to recap: to fit our model it was the linear regression's .fit, and of course we're going to do more examples, so no worries on that. We're going to see this many times throughout this notebook. We have .fit to train it, and then we have .predict, where we pass in our features and that generates a predicted output. What this is actually doing is computing this quantity. We could do either. We could swap it out; it's not a big deal to use the training set. We could plot X test against the linear regression's predict on X test. It's the same line: the input features are different, so we generate different outputs, but the coefficients are the same. So yeah, you could do either one. Honestly, this is probably better. I see what you're saying. This is probably better because then the line of best fit is drawn through the same data as the scatter. So it probably makes the most sense to do predict on the test set. Agreed on that. But you could do either one.
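To make "it's the same line, just different inputs" concrete, the sketch below calls predict on both splits and checks that each prediction is simply the intercept plus the coefficient times the feature. Plotting is left out so the example depends only on numpy and scikit-learn; the data is synthetic and the variable names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumed values).
rng = np.random.default_rng(0)
X = rng.uniform(0, 300, size=(200, 1))
y = 7.0 + 0.05 * X[:, 0] + rng.normal(0, 2, 200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

linear_regression = LinearRegression().fit(X_train, y_train)

# predict returns one value per input row, whichever split the rows come from.
train_predictions = linear_regression.predict(X_train)
test_predictions = linear_regression.predict(X_test)

# Every prediction sits on the fitted line: intercept + coefficient * feature.
by_hand = linear_regression.intercept_ + linear_regression.coef_[0] * X_test[:, 0]
print(np.allclose(test_predictions, by_hand))
print(test_predictions[:5])
```

For a plot, either set of predictions traces the same straight line; using the same split for the scatter and the line just keeps the picture consistent.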
Yeah, I think that would be the most I think that makes the most sense is for it to be on the same one just to validate. So like we could do we could do training here and then train and train just to see how that data lines up. Really what we're trying to do is have our scattered data and then our line of best fit on the same plot. That's all we're trying to do, right? So yeah, I think I think they should be the same. I think that makes sense. These values or which values do you want to see? Yeah, we could uh we could generate those if we just do um let's go down here. So the the line values um are going to be uh the prediction. So, um the the uh test predictions equals um test predictions equals linear regression.predict X test. And then we could uh we could print out our test predictions. Yeah. So, we can see what those actual values are on our uh on the test set. Yeah. Um, we will do that. Yeah. So, you thought we were checking how well our data was trained. We will do that. Yes. We haven't learned how to evaluate this yet. We're going to talk about that coming up next. Yeah. We will do that. We just haven't learned how to do proper evaluation of a regression model. But yeah, it's something we're going to talk about for sure and see how to do in our code. Okay. All right. Any other uh questions on this example? Again, big takeaways. fit to train it and then predict to use it predict on the features to use the model and make predictions with it. So here is an example we we made all the predictions. This these are all the values that are on that line. These are all our predictions and notice they this is a truly regression right? These are all floating point values. Um, so this is definitely a regression, right? Okay. Uh, that's a good question. 
I'm not sure if there's a verbose option. Not really. You can view the source code if you really want to see how it's done. I can tell you that linear regression is generally done in one of two ways. Either you use a formula to solve the optimization problem of minimizing the squared distance from the points to the line, or you use something called gradient descent, which is how a lot of these models do it: they go through a number of iterations where they update the weights according to the gradient of the error function. The error function in this case is the squared distance from the line to the points, so we can compute the gradient of that and do gradient descent. If you really want to look into it, I would do some research on linear regression gradient descent to see how that's done. It's a pretty simple procedure. The notion is that you want to minimize the loss, or the error. In this case the loss is the squared distance. As a formula, it's the sum over your examples of the difference between your label and your model, which is beta 0 plus beta 1 x1 plus beta 2 x2 and so on, with that difference squared. So this is the squared distance here and you're minimizing it, which is a calculus problem. I'm getting far into the weeds of this, but as a function of the weights this is like a parabola, and you work your way (no, no, you're good, it's a good question) down to the minimum of it. Does that make sense? You're working your way down and you do that through a descent process, a descent iteration.
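The two routes to the least squares weights can be compared directly in plain numpy. This is an illustrative sketch, not scikit-learn's source code (its documentation describes LinearRegression as ordinary least squares solved directly rather than by iteration). The data is synthetic, the feature is scaled to the range 0 to 1 so that a fixed step size converges, and the learning rate and iteration count are arbitrary choices.

```python
import numpy as np

# Synthetic data: y = 7 + 4x plus noise, with x scaled to [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = 7.0 + 4.0 * x + rng.normal(0, 0.5, 200)

# Design matrix: a column of ones for beta_0 and the feature for beta_1.
A = np.column_stack([np.ones_like(x), x])

# Route 1: solve the least squares problem directly (the "formula" route).
beta_closed, *_ = np.linalg.lstsq(A, y, rcond=None)

# Route 2: gradient descent on the mean squared error.
beta_gd = np.zeros(2)
learning_rate = 0.5
for _ in range(5000):
    residual = A @ beta_gd - y                   # prediction minus label
    gradient = 2.0 / len(y) * (A.T @ residual)   # gradient of mean squared error
    beta_gd -= learning_rate * gradient          # step downhill

print(beta_closed)
print(beta_gd)
```

Both routes land on the same minimum because the squared error is a bowl-shaped function of the weights with a single lowest point.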
So that's how these are found, but you don't see that happening in the background. If you look at the source code, I guarantee you it's either going to be this, or they're going to use a matrix formula to solve an equation that basically involves the derivative of this set equal to zero, and you find the minimum. Either way, you're finding the minimum of this. Okay. But I don't think scikit-learn has a verbose flag for this. Maybe there's one you can look for, but not that I've seen. All right. So I have an important concept to talk about next, which is called overfitting and underfitting. It's a really important concept that's related to the training and test data we just split apart to do evaluation. Essentially, the issue with machine learning is that it's not perfect and it can struggle in different ways, and the two ways that it primarily struggles are overfitting and underfitting. Overfitting is a situation where the model basically memorizes the training data so well that it fails to generalize to new examples. What we see with overfitting is this exact sign here: really good performance on the training data. When we do that train test split, we see a really good accuracy or really low error on the training data, but the model does not perform anywhere near that on the test data split. What that means is that the model is overfitting to the training data. It's basically memorizing it, and it's not able to generalize very well. Now, why does that happen? It's usually because the model is way too complex for the data. That means you generally need to do something to reduce the complexity: either use a simpler model or use some type of technique to mitigate overfitting. And we're going to study some of those techniques coming up in this notebook.
Uh we might not get to it today, but we're going to study particularly what can we do to prevent overfitting because overfitting is the more common issue with machine learning models. They tend to do so well at learning from data that they pick up on small details and patterns in the training examples that they're exposed to. They don't do a great job at generalizing to new examples. they can struggle with that. So that's overfitting is struggling to generalize to new examples, but you do really well on your training data. So it appears like you have a good model, but it it's not able to go and make predictions on test data very well, which means we would not want to use that model in the real world, right? Because it's not able to generalize outside of what it's already seen. And that's not a good thing if we're trying to use it for real world examples, right? So overfitting is a real issue. Um you see it all the time. I've seen it many many times in the real world, real industry uh work that I've done. Overfitting is a is a challenge for a lot of machine learning models. And so we need some techniques to overcome overfitting. And we're going to study some of those uh coming up shortly. Um, one of the things that we can do, one of the one of the things that we can do to detect overfitting is exactly what we just did, which is you split apart your data into training and testing so that you have a chance to do an evaluation to see if you're even overfitting in the first place. You want to see that performance be consistent from train to test, right? You want to see consistency. What you don't want to see is performance that drops off on the test data. It's much worse. You don't want to see that. That means that your model is overfit uh to your training data and it's not going to perform well in the real world. Okay. So, we're going to have a couple ways to uh overcome that. Talk about that. 
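Detecting overfitting by comparing training error with test error might look like the sketch below. To force the effect, it swaps in a decision tree regressor with no depth limit, which is not a model from this part of the lecture; it is used here only because it can memorize noisy training rows. The data is synthetic and all names are assumptions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: a linear trend plus noise (assumed values).
rng = np.random.default_rng(0)
X = rng.uniform(0, 300, size=(200, 1))
y = 7.0 + 0.05 * X[:, 0] + rng.normal(0, 2, 200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# A very flexible model: with no depth limit the tree can memorize every training row.
flexible_model = DecisionTreeRegressor(random_state=0)
flexible_model.fit(X_train, y_train)

train_mse = mean_squared_error(y_train, flexible_model.predict(X_train))
test_mse = mean_squared_error(y_test, flexible_model.predict(X_test))

# Near-zero training error next to a much larger test error is the overfitting sign.
print(train_mse, test_mse)
```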
Um now, the opposite can actually happen as well, which is called underfitting. And underfitting refers to the fact that a model is too simple and it actually just performs poorly across the board. So if we see poor performance on the training and testing data, that's a good signal that the model's underfit and that means it's too simple usually and you should try using something more complex. Um, so the best way to combat underfitting is to use a more complex model. And as we go through and learn about the models, we're going to learn about which ones are simple and which ones are complex. So we're going to have a scale of kind of complexity. And if you're underfitting, you want to bump up to the to a more complex model. If you're if you're overfitting, one way of combating that is to actually go down to something more simple. Go the opposite way to something simpler. So we need to learn right now we've only learned linear regression but we will learn other models you know in the future and we'll we'll talk about uh their complexity and how they're related to each other. Okay, but these are two issues we see just to draw that out again is if we have a train test split where we have 7030 split let's say and we perform really well over here but we go to apply that model over here and it fails it's accuracy drops off significantly more error that's that's definitely overfitting which is not good right and then underfitting is just not performing well in either case so even on the training data itself self your your accuracy is not very good. So you're not really learning effectively. You're underfitting your model. So that's that's um underfitting case. Okay. All right. Now the issue is that it can be very difficult to balance these two and get it correct. That's what makes machine learning a little bit challenging is getting this balance correct of simplicity and complexity. 
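Underfitting, and the fix of moving up to a more complex model, can be sketched the same way. Here a straight line is fit to data that actually follows a curve, so it does poorly on both splits, and a degree-2 polynomial regression then does well on both. The curved synthetic data, the degree, and the helper function name are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic curved data: y = x^2 plus noise (assumed values).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, 200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

def train_and_test_mse(model):
    """Fit on the training split and return (training MSE, test MSE)."""
    model.fit(X_train, y_train)
    return (
        mean_squared_error(y_train, model.predict(X_train)),
        mean_squared_error(y_test, model.predict(X_test)),
    )

# Too simple: a straight line cannot follow the curve, so both errors are high.
simple_train_mse, simple_test_mse = train_and_test_mse(LinearRegression())

# More complex: squared features let the same linear model fit the curve.
richer_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
richer_train_mse, richer_test_mse = train_and_test_mse(richer_model)

print(simple_train_mse, simple_test_mse)
print(richer_train_mse, richer_test_mse)
```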
So you don't want to be overly complex that you overfit, but you don't want to be overly simple that you underfit and you're not able to learn effectively. So there's a bit of a tradeoff there. And this trade-off is typically known in the community as bias variance trade-off. Um in which case, uh it's basically like a complexity simplicity trade-off. That's another word for that. Um, and so, uh, it's it's thought that, um, if you, uh, if you have very, um, if you have a situation where you're able to fit the training data very well, you risk not being able to generalize. In other words, you risk overfitting, and it's hard to um, it's hard to combat that in a way. Um and um on the reverse side, if you have something really simple, um you risk not learning enough. Even if you're trying to combat that overfitting, you risk not learning enough and your model just doesn't perform as well as it could. So, there's a bit of a trade-off there of trying to find the right balance between something complex enough to learn, but something not overly complex that it's going to not generalize to new data. That's the challenge. Um, like I said, we are going to have techniques to overcome this. So luckily there are things to basically overcome this trade-off and um and help us along the way so that we don't overfit. They basically prevent overfitting um and allow us to use complex enough models um that that won't be overfit. This is in the um this was in our uh lesson 3.2 notebook. So you want to pull that one back up. We were working on Monday. Um, and just to recap this a little bit, remember we were building a linear regression, I wanted to recap some of the steps we took there, um, that we will be doing over and over again. And really the same kind of steps, uh, that we do here, we'll do in a lot of our model building. Pretty much all of our model building um, that we do, whether it's regression or classification, doesn't really matter. 
um we'll still be doing a lot of these steps which are um remember first we split apart our data into kind of a features and a label uh x and y and the reason that's important is because um the model training uses the features and the label um to help train the model right they use those separately um so we want to split those apart whenever we can and so we have usually uh it's a good practice to call your features capital X and your labels lowercase Y. And what we do with that is remember we immediately split that into what we called a training in a test set. And the picture we had for that was something like this where we had about 70% of the data we used to train the model against and then the other 30% of the data we use to test the model against. Meaning that we build a model over here and we apply it to this set over here um to make predictions. And then the that's where the supervised learning really comes into play, right? is on this test set. We already have the answers. We already have the label and so we can apply our model to this to the features over here. Predict uh what the the label should be and compare that. We can get a a metric, right, that compares how close we are in our prediction to the actual values. Um and that was some of our performance metrics. So I'll recap some of those that kind of measure that distance away from our predictions to what the actual label is. Um but remember we had this train test split function which helps us split apart our features and our labels into these uh four sets of data. So we have our training features, our testing features and then our training labels and our testing labels. So we have all of those and um really these two guys are going to be used to train the model. 
That's why they're called underscore train: they're going to be used to train that model. Then we're going to predict on this set of features and compare those predictions to this set of labels, the ones on the test set. And you notice here our test size is set to 30%. That's a pretty standard number. Anywhere between 20 and 30% is pretty standard. We'll typically use 0.3, but it could be 0.2. Anywhere in between is fine. Okay, so we had that. Hopefully we remember that from Monday. So we had a train and a test set. And then building the model was actually really easy. Once you have those train and test sets, we just import our model object: from the linear model module of scikit-learn (the package is sklearn) we import the linear regression model, then we call .fit on it and pass in our features and our labels. Again, this is where that supervised learning is really coming into play, because we're passing in these labels. That's really what makes this work. We need those labels to help guide the model to make those updates. If you guys remember, the model is something that looks like this: a bunch of different coefficients times the features, however many we have. These labels are really taking the place of this, and they're helping us make the correct updates to these coefficients, or as we sometimes call them, weights: these B0, B1, B2. We find out what the optimal values are to get the best fit, the line of best fit. When we call this fit, that's really what it's doing in the background: finding all those coefficients to end up with the line of best fit that has the lowest amount of error. Okay, so hopefully that makes sense. That's just a fit to train our models.
And that's really going to be the case for pretty much every single model that we train with scikit-learn. It's pretty much going to be a fit where we pass in our training features and our training labels. Okay, so we had that, and this was the visualization of it, where we had our test points scattered and our line of best fit is the one that goes through there with that minimal error. That's the whole goal. Pretty decent predictor. Okay. And then we talked about overfitting and underfitting. Just to recap, overfitting is the concept of our model basically memorizing our training data. It performs really well on that training set, but it is not able to generalize outside of that, so it performs poorly on the test set, or on data that it's never seen before. That's overfitting. The reason it overfits is generally that the model is too complex and needs to be simplified a bit. One of the things we're going to do today is see a couple of ways we can alter the linear regression model to prevent overfitting. So there are going to be ways to handle this, and we're going to explore some of those today. Underfitting is kind of the reverse of that. Remember, it's where the model is not learning enough, so the performance is poor even on the training data, and it's not good on the test data either. That is a sign that the model is probably too simple and maybe we should use something more complex: go from a linear regression to maybe a polynomial regression, or maybe use an entirely different model altogether. If we're underfitting and our performance is poor, it's a good signal we should try something else. Okay. So we talked about those, and one of the things we also talked about was evaluation. If you guys remember, we had different metrics that we could compute to get a gauge of how well our model is actually performing.
One of those was MSE, the mean squared error. We did this example in class on Monday, where we generated the mean squared error as one of our metrics. We can see what it is on the training set and on the test set by passing our training predictions and training labels, and our test predictions and test labels, into the mean_squared_error function, which computes the MSE. That's a helpful function from the scikit-learn metrics module, and we'll be using it quite a bit for evaluation, especially of regression. Mean squared error is probably the most common performance metric for regression. If you remember, what it's really doing is measuring distances: it's the average of the squared distances from our actual points to the predictions, and the predictions are all on this line. So it's measuring how much error we have on average. The idea is that the closer to zero, the better; a value close to zero means the distance from our predictions to our points is pretty low, which is desirable. So a low MSE is what we're looking for. If one model has a lower MSE than another on the same data, it's the better-performing model because it has less error. And then we also looked at the R squared score, sometimes written R2.
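To make the "average squared distance" idea concrete, here is a small sketch with made-up labels and predictions (the numbers are assumptions, not the class data). It computes the MSE by hand and with scikit-learn's mean_squared_error.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up labels and predictions, purely for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 12.0])

# Errors are 1, 0, -1, -3; squared they are 1, 0, 1, 9; the mean is 2.75
mse_manual = np.mean((y_true - y_pred) ** 2)

# scikit-learn takes the true labels first, then the predictions
mse_sklearn = mean_squared_error(y_true, y_pred)

print(mse_manual, mse_sklearn)
```

Notice how the single error of 3 contributes 9 of the 11 total squared units, which is why MSE reacts strongly to a few large misses.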
This is another metric we could use. It measures how much of the variability in the output our model is capturing. R squared typically ranges from 0 to 1, and closer to one is better (it can even go negative when a model does worse than just predicting the average). A score near one means that as the output changes, our predictions follow along with those same changes, so they're pretty close. So we have those kinds of metrics. On this data, they would show that this model was underfitting, remember, because this MSE was bad and this MSE was bad. We should think of these in the units of our labels, especially if we take the square root. That's the RMSE, another metric we had, and it's in the exact units of our labels. In this example, the units were sales units, from the sales versus TV data. The MSE says we're about 11 squared sales units off on average, so if we take the square root, we're somewhere between three and four units off, and the same goes for the other set. Because both of these are still not close to zero, this would be underfit. The R squared shows that as well: it's decent, but it's not that close to one. So performance is poor on both the training and test sets. That's the key indicator of underfitting: it's poor on both. Yeah, exactly, high MSE correlates to underfitting. Yes, and what's key is that it's high MSE on both the training and the test sets. If you have a high MSE on your test set but a low MSE on your training set, that's overfitting, where the model isn't generalizing from the training set to test data that it hasn't seen before. That's overfitting.
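The square-root step and the R squared score can be sketched as follows. The value 11 comes from the discussion above; the small arrays are made-up numbers for illustration only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# An MSE of about 11 squared units corresponds to an RMSE between 3 and 4 units
mse_example = 11.0
rmse_example = np.sqrt(mse_example)

# Made-up labels and predictions, purely for illustration
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0, 5.5])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)            # same units as the labels
r2 = r2_score(y_true, y_pred)  # closer to 1 is better

# Always predicting the average label gives an R squared of exactly 0
baseline = np.full_like(y_true, y_true.mean())
r2_baseline = r2_score(y_true, baseline)

print(rmse_example, mse, rmse, r2, r2_baseline)
```

Comparing these numbers on the training set and the test set is what separates the two failure modes: poor on both suggests underfitting, good on training but poor on test suggests overfitting.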
So the key is high MSE on both sets. All right, so we talked about that. We did polynomial regression last time: fitting a curve by transforming the features into polynomial features and then doing linear regression on those. You remember from Monday that we took our features and transformed them with PolynomialFeatures from scikit-learn. We can go up to whatever degree we want. We put in four here, but there's nothing special about four; this is just testing it out. We generate the polynomial features, fit a linear regression on them, and we get a slightly better model. This curved line fits the data a little better than just a straight line. We could see that with the MSE: if we evaluated it, it would be lower than the MSE of the straight line. We would just have to pass the test predictions and test labels, and the training predictions and training labels, into the mean squared error function. It wouldn't be hard to do. And then finally, where we left off was our performance metrics. We talked about mean squared error, which is the average squared distance from the labels to our predictions. If we take the square root of that, it's basically measuring the same thing, but the square root is more interpretable because it's in the same units as our label. Mean absolute error is the average of the absolute value of the distance. So it's not a squared distance like in the Euclidean distance formula; it's an absolute value. That makes it a little less sensitive to outliers, because they don't get magnified as much.
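The polynomial regression recap, together with the MSE, RMSE and MAE metrics just described, can be sketched like this. The curved data is synthetic, and include_bias=False is a choice made here for the sketch, not something taken from the class notebook; degree 4 matches the value mentioned above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic curved data (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(scale=0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Baseline: a straight line
line = LinearRegression().fit(X_train, y_train)
mse_line = mean_squared_error(y_test, line.predict(X_test))

# Polynomial regression: transform the features, then fit a linear regression
poly = PolynomialFeatures(degree=4, include_bias=False)
X_train_poly = poly.fit_transform(X_train)  # learn the transform on training data
X_test_poly = poly.transform(X_test)        # reuse the same transform on test data
curve = LinearRegression().fit(X_train_poly, y_train)
pred = curve.predict(X_test_poly)

mse_curve = mean_squared_error(y_test, pred)
rmse_curve = np.sqrt(mse_curve)                # same units as the labels
mae_curve = mean_absolute_error(y_test, pred)  # less sensitive to outliers

print(mse_line, mse_curve, rmse_curve, mae_curve)
```

On data with real curvature like this, the test MSE of the curve should come out well below that of the straight line.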
But it's not typically used as much as mean squared error is with regression. We talked about that last time: the squared distance is actually what's used to train the model, so it's a more natural fit as a performance metric. And then we had R squared, which we just talked about. Closer to zero is worse and closer to one is better. A score of one would mean the model explains all the variability in the labels, so the predictions track the labels very closely. All right, so that's where we left off. We're going to pick up from there with cross validation. We've actually already seen one method of cross validation, so we'll recap that, then talk about cross validation in general and look at some more sophisticated techniques. But before I do that, any questions about anything we've covered to this point in the recap, or anything from Monday? All right, so let's talk about cross validation. The term refers to a technique for evaluating performance. It divides our data into training and test sets, which we've already seen; we train a model on the training set and evaluate it on the test set. That's where we get the name cross validation: we're crossing our model over from one batch of data used to train it to another set of data used to validate its predictions. There are different ways to do this, so cross validation is a bit of an umbrella term. We've already seen one way, which I'm going to scroll down to, known as hold-out cross validation. That's what we've been doing so far.
So this is just um generating a train and a test set train test um split. Um that's the that's what's known as the hold out cross validation method. Um and and this is exactly what we've been doing so far, which is you split your data into some type of split, usually 7030 um of a train and test and then you um train your model on this section of data and then apply it to this to evaluate performance. Right? So that's that's what's known as the hold out method. Um it is uh you know relatively simple. It's pretty fast to do. Um, but there are more robust ways to try to divide up our data a little bit uh more evenly. Instead of just having one split, we can actually do many splits, which is the idea of um the next kind of cross validation I'll cover. But hold out method is one that we've already studied. It's the most basic type of cross validation you can have. Um, so hold out. This is the most basic and we we've already been we've already been uh working with this type. Okay. So we've we've already seen hold out method. Let me uh explain to you a more sophisticated method, a little bit more advanced of a cross validation um which is known as Kfold cross validation. So this is um going to be a little bit more advanced of a technique but this is the idea of kfold is that you take your data set and you split it into k number of what are called splits or folds. So you take your data and you let's say it was let's say k equals 5. So we have five splits here. Okay. So let's say k equals 5. we have five splits. So what we're going to do is we're going to we're going to train our model on K minus one of those folds. So if K was five, we had five splits. We're going to take our model and train it on four out of five of those uh splits. So let's say it's these four. We train it on these four. Okay. And then what we do is the one split that's left over, we will we will test our model against that split. So we'll test here. Okay. 
Now, this sounds very similar to the hold out method where we're doing a train test split, but it's a little bit this kful cross validation a little bit more sophisticated because we repeat this process that I just mentioned over and over for all combinations of the splits. So then what we'll do, this is just one trial that we'll do it again, but this time we will pick um four different splits. So, this time we might pick, let me do blue. This time we might pick this one, this one, um, this one, and this one. And then those four we will train our data on. And then we will test against this one. Okay? And we'll do we'll repeat this repeat for all combos of the folds. Okay. So we'll repeat that. So essentially what we're doing is rotating through. Every time we rotate through one of the folds is going to be left out as a test set. Now this is a little bit more robust than just a train test split, right? because we are exposing our model to more of the data in in doing this, right? Because we're going to split it evenly into five or 10 splits. Those are pretty common um number of folds to use. 10 or five. Um those are the ones I've most commonly seen. Um but we're going to by rotating through which folds are being used for training, which ones being left out. um we are exposing our our model to more of the data this way than just doing a single train test split. Right? So now what do we do with with the results is every time we do this we we generate um an MSE let's say or some type of performance metric. So let's say we generate an MSE from this guy. We generate an MSE from this version and we generate an MSE for all combos. each combo we generate MSE and then what we do is we average the metrics or the in this case uh if we use MSE we would average those together. 
So every time we do a fold combination and we keep four of them for training, one for test and we rotate through all those combinations, we are going to generate an MSE for every combination, then we're just going to average those MSC's to get a final. So the final MSE of cross val of this kfold. So the final metric is just the average of the uh performance on all of the fold combinations. Okay. So our final MSE, we just average all those MSEs from all of our combinations. Okay. Now, what's the advantage to doing this? It's way more robust of a estimate of the of the performance of the model because we're exposing it to all basically all of our data, right? We're getting a sense of how it performs across all those different folds. Um rather than just doing a single train test split, which is a bit it's basic, it works, but it's a bit basic. Um so this is more robust estimate of the performance. Now, what's the drawback to doing this is that it's more intensive. So, if you have a lot of data, this is going to be pretty expensive to do because you're going to have to especially you have a high number of folds, right? You're going to have to divide your data into k number of folds and you're going to have to do this over and over again. Um, and if it's a large data set, it might take your model a long time to train. It's going to be a little bit more uh computationally intense than if we just did a train test split. Okay, we just did a single like 7030 split. We only do that once. We only train the model once, right? We train it on the 70, apply it to the 30% test data, and evaluate performance that way. Um, so we're only really using the model and training the model once, but in this kfold, we're going to do it um, you know, k number of times essentially or I should say one for every combination that we have to work through of of all the folds. Okay. All right. Does that make sense? Any any questions on kf fold cross validation? 
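The K-fold procedure described above (train on K minus one folds, test on the one left out, rotate, then average) can be written out by hand as a sketch. The data is synthetic and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=100)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_mses = []
tested_rows = []
for train_idx, test_idx in kfold.split(X):
    model = LinearRegression()             # a fresh model for every rotation
    model.fit(X[train_idx], y[train_idx])  # train on the other K-1 folds
    preds = model.predict(X[test_idx])     # test on the fold left out
    fold_mses.append(mean_squared_error(y[test_idx], preds))
    tested_rows.extend(test_idx.tolist())

# The final metric is the average over all K rotations
final_mse = np.mean(fold_mses)
print(fold_mses, final_mse)
```

With K folds there are exactly K rotations, and every row ends up in the test fold exactly once, which is why the averaged score reflects the whole data set.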
So K is an important number here. It's how many folds, how many splits, you have. A typical value for K is five or 10, so five folds or 10 folds. Those are pretty standard from what I've seen. But does the concept make sense, or are there any questions on it? You split the data into K folds and always leave one out. Train on the rest, evaluate on the one that was left out, and then rotate through. You do that for every combination and average all those metrics. And by the way, there's an easy function in scikit-learn that will do this for us, so managing all these combinations will be really easy. It's built in, so we don't have to do this by hand. It'll handle all these combinations of folds for us, and computing the average metric will be really easy, so we don't have to worry about that. We're going to see an example of K-fold cross validation shortly. This is a really widely used technique. You may be wondering what the purpose of doing this ultimately is. It's to get a sense of whether our model is going to perform well on new data. That's really what we want to know: is the model going to perform well when I start to use it on data it's never seen before? K-fold is a decent indicator of that because we're varying which data it sees across many different folds. So it's a good proxy for exposing it to different data each time and seeing how it performs. As we work our way through the folds, there's always going to be one fold left out.
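A minimal sketch of letting scikit-learn manage the folds, assuming cross_val_score is the built-in helper being referred to. The data is synthetic. Note that scikit-learn's "neg_mean_squared_error" scoring returns negated MSE values, so the sign is flipped back before averaging.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=100)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# One score per fold; scikit-learn trains and evaluates on each rotation
scores = cross_val_score(
    LinearRegression(), X, y, cv=kfold, scoring="neg_mean_squared_error"
)

fold_mses = -scores  # flip the sign to get ordinary MSE values
print(fold_mses, fold_mses.mean())
```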
We're going to change which fold gets left out each time. And um that's sort of mimicking the idea of we're going to apply our model to new data and see how it performs. And it's it's new data every fold. um how we know which model is best suits for which scenario because we have Yeah, that's a good question. Um, so my we're going to learn this as we go along because we haven't covered all the models yet, but generally the best advice I can give on that is you you generally want to start as simple as you can get and then if it's not performing well, then work your way up to something more complex. So we are going to have models that are simpler. We're going to have models that are more complex. The rule of thumb is to start with the most simple model that works. So you're usually going to have the same ones that you're going to try in the beginning. And linear regression is a very simple model. It's usually the first one you want to try for regression because it's the simplest. Um, and for classification, we're going to have a similar like logistic regression is the simplest kind of classification model we could have. So usually want to start with that and then if it underfits like if we see it's producing a lot of error then we work our way up to a more sophisticated model. So um that's the way we that's the way it should usually go is simple to complex it based on their performance. So we evaluate it and then we can repeat the process. If it's not performing well we can try something different that's more complex if it's underfitting. Uh this is a good question. Does a model reset after training each K minus one fold? Um yeah, it's essentially like a blank model every time uh every fold. So um we imagine like you have a brand you have a fresh model every um k minus one combination. Yes. And the reason the reason it has to be that way is because you don't want the other folds influencing the model that like on on the next combination. 
You don't want the previous combination to influence the results on the next one, right? Um you want it to be a fresh evaluation on every combination of folds. Okay. All right. So, let me describe to you a variation on what we just um talked about with the K-fold. So, there's another cross validation known as stratified k-fold. And um this is the same exact procedure as k-fold except that when we this is used for classification. Um so when we do classification uh we want to make sure that the different categories are going to be um split amongst those folds in a proportional way. So we don't what we don't want to happen is um when we split apart the data. So, let's say we have let's say we're predicting um spam not spam. What we don't want to have happen when we do our splits is we don't want to have all of the spams end up in one fold and then every other fold has no spam, no spam, no spam, no spam, right? That's not very good. Um because if we if we train against all these guys, we have no shot at predicting spam when they've never seen spam before. So stratify kffold is is used in classification and it's to um it's to make our splits ensure that they have basically a balanced number of categories for each split. Um so that we don't end up with certain splits with way more spams than not spams. Um so we we do what's called stratifying where we make sure the proportions are balanced across each uh split. So this is only really useful in classification, not really necessary in regression because we're predicting a value. But if we were predicting a category, like in classification like fraud, not fraud, we don't want to do the split and have every single fraud example um by bad luck in our shuffling and split end up in one split and every other um every other split has no examples of fraud. Right? So we want to stratify this to spread out those um frauds against all the other splits. Um so uh again um scikitlearn will take care of that for you. 
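The stratification idea can be checked with a small sketch. The labels below are made up (90 "not spam" as 0 and 10 "spam" as 1) and the feature matrix is a placeholder, since only the labels matter for how StratifiedKFold splits.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up imbalanced labels: 90 "not spam" (0) and 10 "spam" (1)
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((100, 1))  # placeholder features; the split only looks at y

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

test_fold_counts = []
for train_idx, test_idx in skf.split(X, y):
    # Count how many of each class landed in the held-out fold
    counts = np.bincount(y[test_idx], minlength=2)
    test_fold_counts.append(counts.tolist())

print(test_fold_counts)
```

Each held-out fold keeps the same 9-to-1 ratio as the full data, so no fold ends up with all of the spam examples or none of them.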
Um but if you're doing classification and you have an imbalanced data set um you you really want to make sure you stratify k-fold. um imbalanced meaning that you have a a um different number. Like if you're doing fraud, not fraud, you have way more not frauds than frauds. Um when where that category is imbalanced, you want to make sure it's balanced across all your splits. Um so this is this is useful in classification only, not really regression, which is what we're talking about right now. Um but it's just a variation on this that ensures when we do those folds um the data is distributed evenly amongst those folds as much as we can. The labels are I should say. Okay. So that's stratified kfold. It's the same same procedure once we have our splits. It's the same where we do k minus one of them. We train test on that last fold um and then rotate through all the folds and and average all the metrics. the same exact procedure. It's just the splitting itself um is going to be balanced in a stratified kfold. Okay. So, hold out we've already talked about um is just doing a single train test split. We've talked about that. One more variation that is a bit of an extreme version of K-fold. So it's actually the same process as K-fold but it's an extreme version is if you set K equal to the number of data points. So you basically are um this is a really really extreme kfold where you um basically are training on all the data. Um so you're training on all the data except one point and then you test against that one point. Um now why would you ever do this? Um it's mainly so for this reason here. It's to um maximize the amount of training data that your model gets exposed to because instead of just doing instead of just doing five splits um which would be like you know these four folds are going to be used and then we um test against one fold. Um we're essentially going to use 99% of the data, right? one point is going to be left out. 99% of the data gets used to train. 
Um, and then we're always going to leave out one point. And and the issue is we're actually going to do that over and over and over again and rotate that one point to cover the whole data set. So we're going to train on 99%, leave one that one point out, and then rotate through every combination of points until we've left out every single point and then average all those together. Um so this is a this is an extreme kffold. Again the number of folds is actually equal to the number of data points in this case. So we have every point is its own fold and we train on everything but one test on that one. This gets you the maximum size of your training data because you're basically going to have every point but one used in the training. This gets you the maximum size. However, it gets you the maximum uh expense. especially for large data sets. This is going to be usually you're not going to use this. Um especially for large data sets because it's just too extreme. It's going to take you a really long time to work through every single point being left out. Um it's just going to take a while to do. So for that reason, the leave one out um that that's why it's called leave one out because it's you're leaving one out every single time. Um is rarely used. I I don't really see it used that often, but it is an extreme version of K-fold cross validation. Okay. But rarely ever actually used. I think the the ones that get used the most are definitely the hold out method with just a regular train test split. Um and then uh the other one that gets used quite a bit is is K-fold or stratified K-fold if you're if you're doing classification, but certainly K-fold in the in a regression case. Okay. All right. Um, we're going to do an example with these guys. So, we'll do that next. Um, with with the different cross validation techniques. Um, but any questions on what they are doing conceptually before we actually do the code example? Okay, very good. All right, so let's see some examples. 
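Leave-one-out can be sketched with the same scikit-learn tools as K-fold. The data is synthetic and deliberately tiny, because the number of model fits equals the number of rows. MSE is used as the metric here because R squared is not meaningful on a single held-out point.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Small synthetic data set (an assumption for illustration)
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(20, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=20)

loo = LeaveOneOut()
n_fits = loo.get_n_splits(X)  # one fold per data point

# Each score is the (negated) squared error on the single held-out point
scores = cross_val_score(
    LinearRegression(), X, y, cv=loo, scoring="neg_mean_squared_error"
)
loo_mse = -scores.mean()

print(n_fits, loo_mse)
```

With 20 rows this trains 20 models; with 20,000 rows it would train 20,000, which is the cost that makes leave-one-out rare in practice.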
Um let's go into our code and build a model and do the different cross validation techniques on it. Um you're going to see it's actually going to be really easy to do and we it sounds complex like doing the kfold and leaving one out and testing it sounds kind of complex but I promise you scikitlearn makes it really easy to do. Um and so uh we won't need to do too much besides just use the right uh tools from scikitlearn. Uh so we're going to we're going to see that. Um so here we have some imports. The um primary uh thing that's a little bit new for us is going to be these um different kinds of cross validation techniques. So we have our kfold, we have our stratified kfold, leave one out um which are those different cross validation techniques. Um these are going to be used in combination with this cross val score which is going to keep track of the different um metrics and then average them uh while we do one of these um cross validation techniques. So this guy gets used in combination with one of these to um as as we're going to see in the code uh to average those metrics. um doing the different folds, right? Perform doing performance against the different folds. Okay. And then of course we need a model using linear regression. That's that's the one we've studied so far. Um and then we have just a regular metrics if we want to compute those. Um using maybe just hold out, right? And hold out um which which is just a regular train test split. Um we could use these guys to evaluate performance. But in a more sophisticated K-fold style of cross validation, we're going to use this to evaluate the the performance. Okay, let's see. So, we're going to be working with this housing with ocean proximity data. Um, you guys should have this one. Uh, so you guys should have this one. So, if you want to follow along and run it yourself, um, you can load that one in. Um, I want to make sure that I have it. Let me pull that one in. So, it should be this guy. 
I'm going to load that in so I can make sure I run it with you guys. Um, let me run this. Do you guys have that data? The housing with ocean proximity? It's another it's another housing data set. Um, but it it's a little bit different than the ones we've seen before. It has a a special feature for how close it is to the ocean at different locations. So, it looks kind of like this. If we load it in and do our head, which is usually what we do, right, we can see um we can see that it's got these features. So, it's got uh uh bedrooms, total rooms, um it's got uh median age. Now, this is this is looks a little strange for total rooms and um uh bedrooms and population, etc., but it's um it's it's got those uh it's got those because it's representing an entire neighborhood. So, it's an entire neighborhood and we're looking at this. Um this is actually going to be our label is this median house value for the entire neighborhood. So what's that median value uh in the neighborhood and this is the total number of bedrooms, total number of rooms, um population, households. So how many houses are there? Um median income and of course these are scaled. So these are um likely times you know uh thousands um but um that's our data. We could describe it. So we can see the average age average median age um which sounds a little um weird but that's it's because again this is the median of data within a neighborhood. Um so the average of those is about 28 or 29. Um we have u total bedrooms. The we can look at the men. There's some data that only has one. So it's likely only one house in there. Um which is what this represents. There's only one house. So there there is some neighborhood that only has one house. Um, and we see the median, um, we see the minimum, uh, median house values there. And then the maximum down here, um, is a pretty big number. 6,000 households is the largest that we have in any any one of these neighborhoods. Okay. 
So, just a little bit of description of the data. Then we can run .info(). Let me ask you guys, were you able to load this and run it? If you're following along, were you able to load it and take a look at .head()? Okay, great. So we're able to load that and look at head. Perfect. And then we run describe, which gives us the usual statistical description, so we can see some interesting stats. What do you guys notice about the info? Anything interesting there? Is there any missing data? Any features that have missing data? Can we see object? Yeah, object type usually means string. When we read the data into pandas, an object column is usually just strings. So that makes sense: we have mostly numerical features, but then we have this ocean proximity, which is a string. Yeah, total bedrooms has nulls. That's right, because you can see here that its non-null count does not equal the number of rows that we have. This is the number of rows; there's about 20,000. That's a good-size data set. We're definitely missing some data here. We could count exactly how much we're missing by running isna().sum(), and we see that about 200 rows are missing a total bedrooms value. And then one thing I wanted to look at: yes, this is a string. Remember what we can do with those? Ocean proximity is a categorical string feature, so we can take a look at its value counts, which is usually a good idea to see what possible values the feature can take. What are we calling this? Housing data. So, housing data, ocean proximity, value counts. Here are the different values it can take. There are some neighborhoods that are less than 1 hour from the ocean.
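The inspection steps mentioned above (info, counting nulls, value counts) can be sketched on a tiny stand-in table. The real file is not reproduced here, so the column names, category strings and numbers below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the housing data; names and values are hypothetical
housing_data = pd.DataFrame({
    "total_rooms": [880, 7099, 1467, 1274, 1627, 919],
    "total_bedrooms": [129, np.nan, 190, 235, np.nan, 213],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND",
                        "<1H OCEAN", "<1H OCEAN", "<1H OCEAN"],
    "median_house_value": [452600, 358500, 352100, 341300, 342200, 269700],
})

housing_data.info()  # non-null counts and column types

# Number of missing values per column
missing = housing_data.isna().sum()
print(missing)

# How often each category of the string feature appears
counts = housing_data["ocean_proximity"].value_counts()
print(counts)
```

A column whose non-null count in info() is lower than the number of rows is the signal to run isna().sum() and see exactly how much is missing.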
There are some that are inland, some that are near the ocean, some that are near a bay. There are even five that are on an island. So these are the different values of ocean proximity. Remember, you can always do that: if you see a string feature, take a look at what its categories are. It looks like most are less than 1 hour from the ocean, and the rest are fairly evenly distributed, apart from the very few islands. As you can imagine, this feature is probably going to be important for determining the value. Okay, so we need to deal with these nulls if we're going to build a model. This is all our typical data prep. What do you guys think we should do with the nulls? What do you think is a good strategy for total bedrooms? Keep in mind, we have 20,000 rows and about 200 of them are null. So what would be a good strategy in that case? Average? We can't ignore the whole column, so something needs to go there. We probably don't want to make it zero, because that would indicate there are no bedrooms, and yet we still have a bunch of total rooms. So zero doesn't make sense. I think average is a decent idea. Now, in this example, what we're actually going to do is drop the rows altogether. Why are we doing that? Because we have so much data and only 200 rows are null. So we're just going to drop the rows. Now, that's a choice.
Um, that's a choice, right? Is that we could fill in with the average like you guys are suggesting. What we're actually going to do is just drop the rows. it it makes up less it makes up about 1% of the whole data. So it's not that much of it is missing. We can drop those rows. So that's actually what we're going to do here is we remove all the roles with the NLES by doing drop NA. So this just drops them. So those rows are cut out. Um it's arguable that we could replace it's arguable that we could just replace it with something. And I think you guys have good thoughts, which is the average, a default. Um, assume total bedrooms. We could we could try that. Yeah. Assign a value based on comparable home value. Yes, you could do that, too. That's a good strategy is to look at the other rows that are similar to it and fill in a value. That's absolutely fair. Um, in this example, we're actually just going to drop those rows, but I think that's totally um totally valid. This is a choice. We could fill NA with different values such as the average total bedrooms um derive a value etc. So we could derive something which I think Brent you have a good suggestion that's a good suggestion. Um we could derive something like that uh and fill in the blank and that's I think that's totally valid. Um, we could take the average of the um bedrooms. Uh, I meant total rooms here. Sorry, total rooms. Um, we could fill in we could fill it in with the total rooms for that category um or for that row. Um, many options. In this case, we're actually just going to drop those rows because they make up such a small percentage relative to the 20,000 rows that we have. It's about 1%, right? 200 rows is about 1% of 20,000. So, we're just going to drop them. But that's a choice. We don't have to drop them. We could fill in with something. Um, and if we did that, we would use fill NA rather than drop NA, right? Uh after dropping the rows, how many? 
After we drop the rows, all our other rows are still intact. If we look at this now, we have slightly fewer entries: this many rather than this many, because we dropped those 200 or so. But they're all filled in? Yes, all the other columns are still filled in. We're cutting out the whole row. If you think about our data set, we have all these rows and all these columns, and if there's a null here, we get rid of that whole row, so all the other rows stay intact. We can drop them because we have a good sample size? Yes, that's exactly right, Ronald. We have 20,000 rows and only 200 have missing values, so that's totally fine. dropna removes all rows that have any null? Yes, that's true. It will drop any row where there's a null, no matter which column it's in. The index? Yeah, the index is not getting reset, that's true. You can always reset it with reset_index if you want to. It's optional, since we're not going to use the index for anything important. I don't really want to do that here, so I'm going to undo it. Okay. So now, importantly, there should be no missing data in this new data frame where we've dropped the nulls. This is good. The reason we had to do this is that if we try to build a linear regression with nulls in there, how do you multiply a coefficient by a null? We can't really do that, right?
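The dropna, fillna, and reset_index options discussed above can be sketched on a tiny stand-in table. The column names and values here are made up for illustration and are not the course notebook's data; filling with the column mean is just one of the strategies suggested in the discussion.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the housing data (values are invented for illustration).
df = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0, 1627.0],
    "total_bedrooms": [129.0, np.nan, 190.0, 235.0, np.nan],
    "median_house_value": [452600.0, 358500.0, 352100.0, 341300.0, 342200.0],
})

# Option 1: drop every row that has a null in any column.
dropped = df.dropna()

# The original index labels survive the drop; reset_index renumbers them.
dropped_reset = dropped.reset_index(drop=True)

# Option 2: keep every row and fill the gaps with the column average.
filled = df.fillna({"total_bedrooms": df["total_bedrooms"].mean()})

print(len(df), len(dropped))           # row counts before and after dropping
print(dropped.isnull().sum().sum())    # total nulls left after dropna
print(filled.isnull().sum().sum())     # total nulls left after fillna
```

Dropping loses rows but invents nothing, while filling keeps every row at the cost of an estimated value, which is the trade-off described in the discussion.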
We can't really do that. So we need to get rid of the nulls for linear regression to have a chance to work, both to train it and to use it. All right, any questions so far? We haven't done any modeling yet, and we haven't set up any cross validation yet. We're just doing data preparation before we get to the modeling. We've dropped the nulls and checked the result. We're going to do one more prep step, which is to change that ocean proximity feature into something numerical, because again, how do you build a model where you're inserting a string into beta 1, beta 2, beta 3 times the features? You can't do that with a string. So what we're going to do, and I'm going to get rid of this cell because I don't think we need it, is run the get_dummies function, which is our usual way to get a one-hot encoding. We now have data that looks like this. By the way, this prefix is op, which is short for ocean proximity. So we have op less than 1 hour from the ocean, op inland, op island, op near bay, op near ocean. These first five rows are near the bay, so they have a one there and a zero in the other spots. This one-hot encodes that feature into numerical values. Were you guys able to run the get_dummies one? Yeah, that's a great question: how did it know to use ocean proximity? It's because that is the only string feature we have.
That's the only one we have. get_dummies looks for any non-numeric columns, the ones with object type, which are strings, and automatically one-hot encodes however many there are. We could have named ocean proximity explicitly here, but since it's the only such feature, get_dummies applies to it anyway when we pass the whole data frame. Next we split the data into an X and a y. X is always what holds our features, and y is what we're trying to predict, which is the label. To separate those out, we assign X to be our data frame minus the median house value column. This is not permanently dropping the column, because we're not dropping in place. It returns a copy of the data frame with the median house value column left out, and we assign that copy to X. Then we take the actual median house value column from the original data and assign it to y. Those are our labels, what we're trying to predict. That's always how it is: X is our features, y is our labels. Hopefully that makes sense. The first line gets rid of the label column so that everything else is our features, and the second line assigns the label column to y. All right.
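A minimal sketch of the one-hot encoding and the X / y split described above. The small table is invented for illustration, the column names (median_income, ocean_proximity, median_house_value) are assumed to resemble the notebook's, and the op prefix follows the one mentioned in the lecture.

```python
import pandas as pd

# Invented rows standing in for the cleaned housing data.
housing = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8, 2.1],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "<1H OCEAN", "ISLAND"],
    "median_house_value": [452600.0, 358500.0, 352100.0, 341300.0, 342200.0],
})

# One-hot encode the string column; each category becomes its own 0/1 column.
encoded = pd.get_dummies(
    housing, columns=["ocean_proximity"], prefix="op", dtype=int
)

# drop() returns a copy without the label column; `encoded` itself is unchanged.
X = encoded.drop(columns="median_house_value")
y = encoded["median_house_value"]

print(list(X.columns))
print("median_house_value" in encoded.columns)  # the original still has the label
```

Naming the column explicitly with `columns=` is optional when it is the only string feature, but it makes the intent visible and protects against other object columns being encoded by accident.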
Then we pass X and y into our train_test_split function, and this generates the hold-out set. If we want to do hold-out validation, this is how: we split the data into X_train, X_test, y_train, and y_test using train_test_split. This is what we did last time, where we have one set for training and one set for testing. It's a pretty standard train/test split. We pass in X, we pass in y, we use a 30% test size, which is pretty standard, and a random state so that we get consistent shuffling if we run this multiple times. So now X_train is 70% of the roughly 20,000 rows, and X_test is the other 30%, only about 6,000 rows, which is what its shape shows. What is X? X is our features. The most efficient way of building it is to take our data and drop the median house value column, because that's our label column. The rest of the data is our features. So X is every column that is not the label column, and y is the label column from our original data, the median house value. We ultimately decide which column that is, but X is everything that is not our dependent variable, the thing we're predicting. So we remove what we're trying to predict from X. X should be everything else.
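The hold-out split described above can be sketched as follows. The data is synthetic and the random_state value of 42 is an assumption, since the transcript does not say which seed the notebook uses.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and labels standing in for the housing X and y.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=1000)

# 30% of the rows are held out for testing; the seed fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Running the split again with the same seed gives the same rows.
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)
print(np.array_equal(X_test, X_test_again))
```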
That's always how it's going to be. X is always all of the independent variables that we're using to predict, and since we're predicting the median house value, we need to remove it from X. So X is the whole data frame minus the one column with the dependent variable. Removing the dependent variable and keeping all the independent ones? Exactly right. Think about it in terms of the model. Let's go back to the features. We're building a model to predict this value, so we make sure X is everything but this. This column is y. That's our label, our dependent variable. Everything else belongs to X, including all of these. We choose this one to be y because we're building a model to predict it. All right. So we use X and y to do our train/test split, and we have our training features and test features and our training labels and test labels. Pretty standard. Okay, here's what's new. If we want to do k-fold cross validation, we create a KFold object. We already imported KFold from scikit-learn. We specify how many folds we want with the n_splits parameter, and in this case we're going to do 10 folds. The typical number of folds I've seen and worked with in my career is five or 10. We're also setting a random state because we're going to do shuffling.
To produce those folds, the KFold object shuffles the data first and then splits it into folds: five in my drawing, 10 in our code. It manages creating these even splits for us. I know I didn't draw them even, but it shuffles the data, assigns the rows to the different folds, and then we use those folds to do our training. We're going to execute the cross validation using this KFold object. We also initialize our model, because of course we need a model to train on the folds. In this case we're using linear regression, which is the model we've been studying so far. Now look how easy it is to execute cross validation. All we need to do is use the cross_val_score function from scikit-learn. Our model goes first, so that's the linear regression object. Then our data, so our training features and our training labels. Let me skip over the scoring parameter for a second and come back to it. The cv parameter is the cross validation strategy, and that's where our KFold object goes: the kf we defined up here. Then n_jobs allows us to parallelize this. If we set it to -1, it uses all available processors and trains the different fold combinations in parallel, which speeds it up. So you want to set this to -1 if you can. Now let me describe the scoring.
The scoring parameter is where we put our metric. You can put in mean absolute error or mean squared error, which are the two we'll consider here, and in scikit-learn they're named neg_mean_absolute_error and neg_mean_squared_error. The reason the name has a negative in front of it is that for an error metric, lower is better, but cross validation treats a higher score as better. So we take the metric and negate it. On the number line, the best score is the negative value closest to zero: -1 is better than -10. Something with a small error lands near zero, and something with a larger error lands further out on the negative side. We mainly need this to keep track of the score on each individual fold, so at the end we can see which fold combination performed best. It's the one with the highest value, meaning the negative closest to zero. Is that just a convention? Yeah. It's because the cross validation tools look to maximize the score, so whatever has the highest score is considered the best performance, but we're using a metric where lower is better. So we take the negative of it, and something that is more negative is going to be worse.
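A minimal sketch of the KFold and cross_val_score call described above, on synthetic data rather than the housing set. The seed and variable names (kf, kfold_scores) are assumptions for illustration. n_jobs is left at its default here so the sketch runs anywhere; passing n_jobs=-1 would ask scikit-learn to use all available processors.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic training data standing in for X_train and y_train.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
y_train = X_train @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=500)

# Ten shuffled folds; the seed makes the shuffle repeatable.
kf = KFold(n_splits=10, shuffle=True, random_state=42)
model = LinearRegression()

# Each score is the NEGATED mean absolute error on one held-out fold,
# so higher (closer to zero) means a smaller error.
kfold_scores = cross_val_score(
    model, X_train, y_train, scoring="neg_mean_absolute_error", cv=kf
)

# Flip the sign back to report an ordinary, positive average MAE.
mean_mae = np.mean(np.abs(kfold_scores))

print(kfold_scores.shape)
print(mean_mae)
```

Swapping the scoring string to "neg_mean_squared_error" switches the metric without changing anything else in the call.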
Yeah, that's the reason. Something that's down this way is going to be worse. Okay, so it runs. If we print out our k-fold scores, we should get 10 different scores, and you can see they're all negative because we're taking the negative of the mean absolute error. What we want to do is take the absolute value of these scores and average them to get our overall performance. This is the score on the first fold combination, this is the score on the second, this is the third, and on and on, and these are the mean absolute errors. To compute the average, and by the way we don't need this import because we're using the NumPy average, we take the absolute value of the scores and look at the average MAE. Now I want you to think about this average performance. This is our average MAE across all of our fold combinations, so it's an indicator of our performance under cross validation. What are the units of the original median house value? They're in the hundreds of thousands of dollars. So this is not a very good error. It's kind of high: on average we're about 49,000 dollars away from the median house value in absolute terms. That's not very good. So this model is not performing that well, and we can see that by comparing this error to our actual data.
This is right around 50,000, and our median house values are in the hundreds of thousands. So on average we're 50,000 off when we make a prediction. That's a significant amount of error relative to the units our data is in, so this score is not very good, and we see that from the cross validation. Look how easy the cross validation is. Again, we just call cross_val_score. We put in our model, our data, and our cross validation strategy, which is k-fold, and we generate these metrics across all the fold combinations. The function takes care of rotating through them: 10 different combinations, where each time a different one of the 10 folds is left out for evaluation. It manages that for us using this training data and generates these scores. So that's k-fold. It's not hard to do. All you have to do is use cross_val_score, and we could easily change this to mean squared error. We just happen to be using the absolute error here. Were you guys able to get this to run, the k-fold scores? It produces an array of 10 different scores, which should make sense because we're splitting our data into 10 folds and leaving one out to do our evaluation on. The fold that gets left out each time is what produces each score, so 10 different folds get left out as we rotate through all the combinations.
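To make the fold rotation concrete, here is a sketch that does by hand what cross_val_score is described as managing: train on nine folds, score on the one left out, and repeat for each fold. The data is synthetic and the names are illustrative; this is a conceptual equivalent, not scikit-learn's internal code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.5, -0.5, 2.0]) + rng.normal(scale=0.3, size=300)

kf = KFold(n_splits=10, shuffle=True, random_state=42)

manual_mae = []
for train_idx, val_idx in kf.split(X):
    # Fit on the nine folds that are kept in...
    fold_model = LinearRegression().fit(X[train_idx], y[train_idx])
    # ...and score on the one fold that is left out.
    val_pred = fold_model.predict(X[val_idx])
    manual_mae.append(mean_absolute_error(y[val_idx], val_pred))
manual_mae = np.array(manual_mae)

# The library call returns the same numbers with the sign flipped.
library_scores = cross_val_score(
    LinearRegression(), X, y, scoring="neg_mean_absolute_error", cv=kf
)

print(np.allclose(manual_mae, -library_scores))
```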
We average these scores and we get about 50,000 in error on average. What do you think would be acceptable if we're a real estate agent predicting these prices? Yeah, close to zero would be great. That'd be fantastic. Closer to zero would be better. The average house value is about 206,000, so 50,000 is a decent percentage of that, roughly a quarter. You can compute it as a percentage. 20,000 would be about 10% error. And yes, that's true, 10,000 would be about 5% error, which would be much better. The smaller the better, of course. But I would say an acceptable percentage of error is probably 20%, which would be about 40,000 or less. Usually when you build models, 80% accuracy is considered decent, so I'd say 40,000 or less would be kind of ideal. Does that answer the question? It's a good question: what value is acceptable? I think less than 40,000, right around 20% error. All right, so that's k-fold. Let's do a regular hold-out now. This is just using our training and test data, doing model.fit, and calculating an MSE on the test data. The hold-out strategy is less robust, but it's a lot quicker to do and easier to set up. It's just a regular train/test split. Are we going to rebuild the model? No, not necessarily. There are some things we could do, though, and one thing we did not do was scale our features.
Remember I said that's a pretty important thing to do, to scale our features. We did not do that, so that would be an enhancement, and now that I'm thinking about it, I do think we'll do that later. One of the things we can do is scale these features using something like a min-max scaler or a standard scaler. That can help us make better predictions, so we'll see if we can get better. Usually you want to scale the data, and that's something we didn't do in our preparation step. We did a lot of the things we should do: we removed nulls and we one-hot encoded the ocean proximity feature. Those are good to do. But we didn't scale any of the features. It can have an effect. Often when we scale, the model learns a little bit better, because then the ages aren't drastically different in scale from total bedrooms or income. We usually want the features to be in a similar range. I think we'll scale them coming up in a bit, and it should help the model. We've talked about that before: scaling is usually a good idea when you're prepping your data for modeling. No, you want to scale your test data as well. You're going to do both. That's actually a good point you bring up: any transformation you do on your training data to build your model, you should also do on your test set so you get an apples-to-apples comparison. You should always do the same transformations. Would scaling the data impact k-fold? Yeah, it could make it better.
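The point about applying the same transformation to training and test data can be sketched with scikit-learn's StandardScaler. The three synthetic columns only mimic the very different ranges of age, bedrooms, and income. Fitting the scaler on the training rows alone and then reusing it on the test rows is a common practice assumed here; the lecture only states that both sets must receive the same transformation.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (illustrative ranges only).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(1, 52, size=200),      # age-like column
    rng.uniform(2, 6000, size=200),    # total-bedrooms-like column
    rng.uniform(0.5, 15, size=200),    # income-like column
])
X_train, X_test = train_test_split(X, test_size=0.3, random_state=42)

scaler = StandardScaler()
# Learn each column's mean and standard deviation from the training rows...
X_train_scaled = scaler.fit_transform(X_train)
# ...then apply exactly the same shift and scale to the test rows.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(6))
print(X_train_scaled.std(axis=0).round(6))
```

After scaling, every training column has mean 0 and standard deviation 1, so no single feature dominates simply because of its units.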
It should impact it. We should get a better model, so when we run the different folds we'll get better scores. And yes, that's a good point: if other people are going to use our model, and we built it on the assumption that the input is scaled, then they also have to scale their data when they use it. I mean, it's not really a burden, and I'll show you why. There's something that automates the scaling for them, so they don't have to do it manually. It happens automatically when they use the model. It's called a pipeline. That part will be automated, so it won't be heavy on the user. In theory it is extra work, but scikit-learn has a really helpful tool to make it easy, and I'm going to show us that later in the notebook. No, the data is not for a single house. It's for something like a neighborhood. There's a certain number of households in the neighborhood, and we're predicting the median house value of that neighborhood. There's a median income, a population, a general proximity to the ocean, a latitude and longitude, and a median age for that neighborhood. So it's not just a single house. Okay, let's go back. This was the hold-out strategy, which is a lot simpler. This is just model.fit, right?
This is just model.fit on the training data. Then we predict on the test features to generate test predictions, and we compute the error between the test predictions and the test labels. That's our mean_squared_error function, to compute the MSE. Let's see what the MSE is. It's right here. Now we can take the square root of it, so let's do np.sqrt of the test MSE, and we get about 67,000. That's pretty high. Look at the difference: when we just do a train/test split, we get a worse number. The single split is not as robust, because the model is only evaluated on one held-out set rather than rotated across the folds. Keep in mind, though, that this is a root mean squared error, which is never smaller than the mean absolute error on the same predictions, so part of the gap from the 50,000 comes from the metric itself. And yes, we can scale here. There's a really easy tool that will scale it automatically, and I'll show it later in this notebook. All right, just to recap. This is fitting the model. This is making the predictions with model.predict. And this is calculating the mean squared error, which compares our test labels to our test predictions and averages the squared distance between them. Then we can also compute the R squared, and we see that it's not very good. 0.65 is not a great model, because closer to one would be better. We knew that from the cross_val_score.
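The hold-out evaluation recapped above (fit, predict, MSE, square root, R squared) can be sketched like this. The data is synthetic, so the numbers will not match the 67,000 and 0.65 quoted in the lecture. The MAE is computed alongside only to illustrate that RMSE and MAE on the same predictions are different quantities.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([3.0, -1.0, 2.0]) + rng.normal(scale=2.0, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)            # train on the training rows only
test_pred = model.predict(X_test)      # one prediction per test row

test_mse = mean_squared_error(y_test, test_pred)   # average squared error
test_rmse = np.sqrt(test_mse)                      # back in the label's units
test_mae = mean_absolute_error(y_test, test_pred)  # average absolute error
test_r2 = r2_score(y_test, test_pred)              # 1.0 would be a perfect fit

print(test_rmse, test_mae, test_r2)
```

Because squaring weights large misses more heavily, the RMSE is at least as large as the MAE for any set of predictions, so the two should be compared with like, not with each other.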
But this is just doing a hold-out, where we do a train and test split. It's a little bit simpler, but it's not quite as robust as the cross validation. It works, though. We can use hold-out to quickly evaluate a model and see if we need to make any adjustments, and it's a little quicker to run. Any questions on it? Does it make sense what we're doing here? model.fit to train, then predict to get our predictions. This is pretty standard. We predict on the test features, so we're passing all of our features into the model to generate a prediction for every row. That's something I want to point out because it may be a little confusing: this is a data frame, so we're passing in a bunch of rows of features with columns, data that looks like this, and we're making a prediction for every row. This row generates a prediction, this row generates a prediction, and on and on. So we end up with a collection of predictions, one for each row, and we compare those to the labels we have for those rows in our data set. That's truly supervised learning: we have the labeled examples, and we compare them to what our model predicts to get our performance. All right, let's try the other one just so you can see it, leave-one-out. Leave-one-out cross validation works the same way: we pass the leave-one-out strategy into cross_val_score. Here we don't need to specify how many folds there are, because we know how many there are going to be. It's going to be the number of data points, right?
Which is going to be quite large because there are 20,000 rows. So this is going to be extremely intensive: we train on all the examples but one, leave that one example out as our validation, and repeat that for every one of the 20,000 examples. We can do it, though, just to see how it works. We create the leave-one-out object, then call cross_val_score with our model, our data, and the same scoring we had before, but this time cv is our leave-one-out object instead of our KFold object. Then we run it and compute our average across all the folds. This is going to be a much bigger array, a 20,000-element array, and we compute the average across it. So let's do that. It's going to take a moment. If you run it, you'll notice it takes a while, because it runs across all 20,000 examples, leaving one out each time to test against. It's quite intensive. You can see it's still running. Hopefully it makes sense why it's taking so long: instead of doing 10 folds, it's putting every data point but one in the training set and then iterating through all 20,000 points. Let's see, our memory use is up a little. Still running. That's okay, I'll let it run and come back when it's finished. Yeah, exactly, this is giving us a performance evaluation. It's the average error across all of our folds. This is the extreme case where the number of folds equals the number of points. Right?
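A minimal sketch of leave-one-out cross validation as described above, using only 60 synthetic rows so that it finishes quickly. On the 20,000-row data set from the lecture the same call would fit the model once per row, which is why it takes so long.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Deliberately small synthetic data: one model fit per row.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = X @ np.array([4.0, -3.0]) + rng.normal(scale=1.0, size=60)

loo = LeaveOneOut()   # no n_splits argument: the fold count is the row count
loo_scores = cross_val_score(
    LinearRegression(), X, y, scoring="neg_mean_absolute_error", cv=loo
)

# One score per row; with a single held-out point, each is that point's
# negated absolute error.
loo_mean_mae = np.mean(np.abs(loo_scores))

print(loo.get_n_splits(X), loo_scores.shape)
print(loo_mean_mae)
```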
So it's an extreme case, but yes, it's just like k-fold. It gives us that performance estimate. Okay, it's about the same: still around 50,000. Not much difference. But look how much longer it took. That took about 2 minutes to run, and the other one was pretty much instant. So you definitely don't want to run this too often. It's generally preferable to do k-fold if you're going to do cross validation, or just the regular hold-out train/test split. Those are generally better choices than leave-one-out, which takes too long and results in about the same score as k-fold. Any questions about the cross validation we just did? Okay, good. And as it says here, stratified k-fold is usually used for classification. We're not doing classification yet. That's going to be in lesson four, so we don't need to worry about it. For regression, regular k-fold is preferred, because we don't need to worry about distributing categories among our folds. And as we see, the error is kind of high. There are some things we can do to improve that. This signals that the performance is bad, and we probably need a more complex model. One thing we can try before a more complex model is scaling, and I'm going to show us how to do that in a nice streamlined fashion coming up. Outside of that, if we still had bad performance, we would likely need a more advanced model, and we'll learn about those in the next lesson. What's great is some of those advanced models can actually be used for regression.
They have variations that can be used for both classification and regression, which is pretty cool, and I'll point those out when we get to them. Okay, what I want to talk about now is a way we can combat overfitting. Remember, overfitting is the case where we see good performance on the training data but it doesn't generalize to the test data, so we get poor performance there. That drop-off signals overfitting. One way of combating it is something called regularization, which we're going to talk about next. The key idea in regularization is to change the way we train. We modify our training error function, sometimes called the objective function or loss function, by adding a penalty on excessive complexity. The way we penalize is by making sure the size of the coefficients doesn't grow too much, which should mitigate overfitting. Remember, in linear regression what we're learning are the coefficients: beta 0, beta 1, beta 2, and on up to beta n. We learn all of those by minimizing the regression error function. That's how it trains, and we talked about that on Monday. So we're going to penalize these coefficients for growing too big and keep them small, so that no one coefficient has a dominant effect on the model. This should help with overfitting and complexity, because all the coefficients are encouraged to be smaller, and this has the effect of making the model simpler.
Making the model simpler is what these regularization techniques are essentially trying to achieve: remove complexity and make the coefficients smaller so that the model can generalize a bit better and avoid overfitting. Preventing overfitting is what we want to do. So there's going to be a penalty, and I'll show you where that penalty gets added and what it looks like. To control the level of that penalty, we are going to introduce another parameter to our model, called alpha. Alpha scales the penalty. If alpha is really high, that imposes a stronger penalty on the coefficients, which will make the model a lot simpler. So the higher the alpha, the simpler the model we get, and the risk with that is that we underfit. If alpha is too big, we may underfit the training data because it makes the model way too simple. Again, I'll show you what this means mathematically in a moment. On the other hand, a lower alpha gives a lower penalty. It's a weaker penalty term, and that leads to a model that is a bit more complex, which could risk some level of overfitting. So there's still the risk of overfitting if you have a low alpha, and of course if alpha goes all the way to zero there's no penalty at all, so you're back to your original linear regression, which could risk a lot of overfitting. So you generally want to pick alpha carefully. And we're actually going to see what the best way to pick alpha is: I'm going to show how to use some tuning techniques to pick what alpha should be. But a pretty industry-standard alpha that most people default to is alpha equal to one. Alpha equal to one signals that there should be some penalty; it's a standard penalty.
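To make the effect of alpha concrete, here is a small sketch that fits the same data with increasing penalty strength. It assumes scikit-learn's `Ridge` as one concrete penalized model and a synthetic dataset from `make_regression`; neither comes from the lecture notebook, and the alpha values are arbitrary illustration points rather than recommendations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic stand-in data (an assumption; not the dataset used in the lecture).
X, y = make_regression(n_samples=60, n_features=10, noise=15.0, random_state=0)

# Baseline: plain linear regression, i.e. no penalty at all.
ols = LinearRegression().fit(X, y)
ols_norm = np.linalg.norm(ols.coef_)
print(f"no penalty     |coef|={ols_norm:8.2f}  train R^2={ols.score(X, y):.3f}")

alphas = [0.01, 1.0, 100.0, 10000.0]  # arbitrary illustration values
norms, train_scores = [], []
for alpha in alphas:
    ridge = Ridge(alpha=alpha).fit(X, y)
    norms.append(np.linalg.norm(ridge.coef_))   # overall size of the coefficients
    train_scores.append(ridge.score(X, y))      # R^2 on the training data
    print(f"alpha={alpha:<8g} |coef|={norms[-1]:8.2f}  train R^2={train_scores[-1]:.3f}")
```

As alpha grows, the coefficient norm shrinks and the training fit gets worse, which is the underfitting risk described above; as alpha approaches zero, the result approaches plain linear regression.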
We don't want it to be too high, and as a starting point we don't want it to be too low, like a small fraction. But a penalty of one is usually good enough to start with. Okay, I'm going to show you where that comes into play in a moment. The whole purpose of doing this is to mitigate overfitting, and adding this penalty is called regularization. So going beyond regular linear regression and adding this extra penalty to the training process, to penalize large weights, large coefficients, is known as regularization. And there are two common penalties that are added, so there are actually two different variations on the penalty. We're going to study both of them. If you take linear regression and add one particular type of penalty, it's known as lasso. If you add another type of penalty, it's known as ridge regression. We're going to study both of those
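For reference, the two penalties are conventionally written as below. This is the standard textbook formulation, not necessarily the notation the lecture goes on to use. The symbols $m$ (number of training samples), $x_{ij}$ (feature $j$ of sample $i$), and $\hat{y}_i = \beta_0 + \sum_{j=1}^{n} \beta_j x_{ij}$ (the model's prediction) are assumptions introduced here; $\beta_0, \dots, \beta_n$ and $\alpha$ are the coefficients and penalty strength described above.

```latex
\begin{aligned}
\text{Linear regression:}\quad
  & \min_{\beta}\ \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2 \\
\text{Ridge (L2 penalty):}\quad
  & \min_{\beta}\ \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2
    + \alpha \sum_{j=1}^{n} \beta_j^{2} \\
\text{Lasso (L1 penalty):}\quad
  & \min_{\beta}\ \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2
    + \alpha \sum_{j=1}^{n} \lvert \beta_j \rvert
\end{aligned}
```

In both cases the intercept $\beta_0$ is usually left out of the penalty, and setting $\alpha = 0$ recovers plain linear regression. Exact scaling constants in front of the error term vary between libraries, so the same numeric alpha does not mean the same penalty strength everywhere.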

Original Description

🔥Microsoft AI Engineer Program - https://www.simplilearn.com/ai-engineer-course?utm_campaign=Y0gpVkBxm1M&utm_medium=Lives&utm_source=Youtube 🔥Partnership is with E&ICT of IIT Kanpur - Professional Certificate Course in Generative AI and Machine Learning - https://www.simplilearn.com/iitk-professional-certificate-course-ai-machine-learning?utm_campaign=Y0gpVkBxm1M&utm_medium=Lives&utm_source=Youtube This video on Machine Learning With Python Full Course 2026 by Simplilearn will help you learn machine learning using Python from beginner to advanced level and understand how to build predictive models from data. The course begins with an introduction to machine learning and explains how algorithms learn patterns from datasets. You will learn the fundamentals of Python along with important libraries such as NumPy, Pandas, Matplotlib, and Scikit-learn. The tutorial covers key concepts like supervised learning, unsupervised learning, and model evaluation techniques. You will understand popular algorithms such as linear regression, logistic regression, decision trees, and clustering methods. The course also explains data preprocessing, feature engineering, and data visualization techniques. You will learn how to train models, test performance, and improve accuracy. The tutorial also includes real-world use cases of machine learning in business and technology. By the end of this machine learning tutorial for beginners, you will clearly understand how to build, evaluate, and deploy machine learning models using Python. Following are the topics covered in this machine learning with python full course 2026: 00:00:00 - Introduction to Machine Learning With Python Full Course 2026 00:03:00 - Machine learning foundations 00:12:41 - Types of machine learning 01:03:26 - Python libraries for Machine learning 01:12:11 - Supervised Machine learning 01:35:49 - Regression notebook 02:04:09 - Implementing linear regression 02:44:36 - Model fit and evaluation 03:08:12 - Cross-vali
