Machine Learning With Python Full Course | Machine Learning Tutorial For Beginners | Simplilearn

Simplilearn · Beginner ·🔢 Mathematical Foundations ·11mo ago

Key Takeaways

This video teaches machine learning concepts with Python, including supervised and unsupervised learning, neural networks, and deep learning techniques

Full Transcript

Hello everyone and welcome to our machine learning with Python full course. Have you ever wondered how Google finishes your sentences, how Netflix knows exactly what shows you love, or how chatbots seem to understand you so well? That's the magic of machine learning, and in this course you will learn how to make it happen using Python. Now you might be wondering, why Python? Well, Python is the go-to language for machine learning for a few really great reasons. First off, Python's simple syntax means you don't have to get lost in complicated code, making it perfect for beginners. Plus, Python comes with a ton of powerful libraries like TensorFlow, Keras, scikit-learn, and pandas that make building machine learning models a breeze. So whether you're analyzing data, building models, or visualizing results, Python has everything you need to get the job done efficiently. Knowing machine learning with Python will be an essential skill in industries like healthcare, finance, marketing, and many more. Companies are looking for people who can use Python to build smart systems and create AI-driven solutions. And the best part is the career opportunities are huge. In India, machine learning professionals can earn around 10 lakh to 25 lakh per annum, and in the US, salaries can easily top $120,000. Now, this course is designed to take you from a beginner to someone who can confidently build machine learning models with Python. We'll start by breaking down what machine learning is and dive into some key techniques and concepts you will use in real-world projects. So, let's get started. Before we commence, if you're interested in launching a high-growth career in artificial intelligence and machine learning, this program might be the best thing you'll ever come across. The professional certificate in AI and machine learning offered by Purdue University Online in collaboration with Simplilearn and IBM isn't just another course. It's a complete career-transforming experience.
Ranked the number one online AI and ML certification by Career Karma, this program is designed to help you master the most in-demand skills in AI, automation, ChatGPT, GenAI, LLMs, deep learning, agentic frameworks, and so much more. So whether you're just starting out or looking to upskill, you'll get hands-on with 15 plus durable projects. Explore tools like Hugging Face, TensorFlow, and Midjourney, or even build LLM-based applications. So what are you waiting for? Hurry up and enroll now. You can find a link below.

>> Hello and welcome to machine learning tutorial part one. This is part one of a machine learning series put on by Simplilearn. My name is Richard Kirschner. I'm with the Simplilearn team. That's www.simplilearn.com. Get certified, get ahead. What's in it for you today? Well, we'll start off with a brief explanation of why machine learning and what is machine learning. Then we'll get into a few of the types of machine learning and the machine learning algorithms: linear regression, decision trees, and the support vector machine. Finally, we'll do a use case where we're going to classify whether a recipe is for a cupcake or a muffin using the SVM, or support vector machine. Sounds like a delicious way to explore machine learning. So, why machine learning? Why do we even care about having these computers come up and be able to do all these new things for us? Well, because machines can now drive your car for you. It's still very much in the infant stage, but it's just exploding, as we see with Google's Waymo, and then Uber had their program, which unfortunately crashed. They know that this is huge. This is going to be the huge industry that changes our whole transportation infrastructure. Machine learning is now used to detect over 50 eye diseases. Do you know how amazing that is, to have a computer that double-checks for the doctor for things they might miss? That's just huge in the health industry.
Pretty soon, and they actually do already have this in some areas, maybe not for eyes but for other diseases, they're using the camera on your phone to help pre-diagnose before you go in and see the doctor. And because the machine can now unlock your phone with your face. I mean, that's just cool, having it be able to identify your face or your voice and turn stuff on and off for you depending on where you are and what you need. Talk about the ultimate automation in the world we live in. And as we dig in deeper, we have a nice example from Facebook. As you can see here, they have a Facebook post about Halloween: "Comment yes if you want it. Order here." Nobody likes spam posts on Facebook that annoy them into interacting with likes, shares, comments, and other actions. I remember the original ones were all "if you don't click on here, you will have bad luck" or some kind of fear factor. Well, this is a huge thing in social media when people are getting spammed. This tactic, known as engagement bait, takes advantage of Facebook's news feed algorithm by boosting engagement in order to get greater reach. To eliminate engagement bait, the company reviewed and categorized hundreds of thousands of posts to train a machine learning model that detects different types of engagement bait. So in this case we're using Facebook, but this is of course across all the different social media.
They have different tools they're building. Here the Facebook scroll GIF gets replaced, kind of like a virus coming in: it notices that there's a certain setup with Facebook and it's able to replace it. And they have vote baiting, react baiting, share baiting. These are kind of general titles, but there certainly are a lot of ways of baiting you to go in there and click on something. So all this data was fed into the machine, and then the new post comes up that takes over part of the Facebook setup, and that's what you're looking at. You're looking at this new post that, like a virus, has replaced that. So what Facebook does to eliminate this is it starts scanning for keywords and phrases like this and checks the click-through rate. It starts looking for people who are clicking through without even looking at it, or clicking through something that normally would not be clicked through. Once Facebook has scanned for these keywords and phrases, it is able to identify the spam coming in, and this makes your life easier, so you're not getting spammed. It's like walking through an airport, where in a lot of countries you have hundreds of people trying to sell you a timeshare: "Come join us. Sign up for this." This eliminates that annoyance. So now you can just enjoy your Facebook and your cat pictures. Or maybe it's your family pictures. Mine is family. Certainly people like their cat pictures too. Another good example is Google's DeepMind project AlphaGo. A computer program that plays the board game Go has defeated the world's number one Go player, and I hope I say his name right, Ke Jie. The ultimate Go challenge, game three of three, was on May 27th, 2017. So that was just last year that this happened. And what makes this so important is that, you know, Go is just a game.
So it's not like you're driving a car or something in our real world, but they are using games to learn how to get the machine learning program to learn. They want it to learn how to learn. And that is a huge step. A lot of this is still in its infant stage as far as development goes, as we saw with what happened to the Uber cars I referred to earlier. They lost their whole division because they jumped ahead too fast. So it's still in an infant stage, but boy, is this the beginning of an amazing world that is automated in ways we can't even imagine. We've looked at a lot of examples of machine learning, so let's see if we can give a little more of a concrete definition. What is machine learning? Machine learning is the science of making computers learn and act like humans by feeding them data and information without being explicitly programmed. And we see here a nice little diagram where we have our ordinary system, your computer. Nowadays you can even run a lot of this stuff on a cell phone because cell phones have advanced so much. Then with artificial intelligence and machine learning, it takes the data, it learns from what happened before, and then it predicts what's going to come next. And then really the biggest part right now in machine learning is that it improves on that. How do we find a new solution? So we go from descriptive, where it's learning about stuff and understanding how it fits together, to predictive, where it predicts what's going to happen, to prescriptive, where it comes up with a new solution. And when we're working on machine learning, there are a number of different diagrams that people have posted for what steps to go through. A lot of it might be very domain-specific. So if you're working on photo identification versus language versus medical or physics, some of these steps are switched around a little bit or new things are put in. They're very specific to the domain. This is kind of a very general diagram.
First, you want to define your objective. It's very important to know what it is you're wanting to predict. Then you're going to be collecting the data. Once you've defined an objective, you need to collect the data that matches. You spend a lot of time in data science collecting data, and on the next step, preparing the data. You've got to make sure that your data is clean going in. There's the old saying: bad data in, bad answer out. Then once we've cleaned all this stuff coming in, you're going to select the algorithm. Which algorithm are you going to use? You're going to train that algorithm. In this case, I think we're going to be working with SVM, the support vector machine. Then you have to test the model. Does this model work? Is this a valid model for what we're doing? Once you've tested it, you want to run your prediction, or your choice, or whatever output it's going to come up with. And once everything is set and you've done lots of testing, you want to go ahead and deploy the model. And remember I said domain-specific. This is very general as far as the scope of doing something. With a lot of models, you get halfway through and you realize that your data is missing something and you have to go collect new data, because you've run a test someplace along the line and you're saying, "Hey, I'm not really getting the answers I need." So there are a lot of things that are domain-specific that become part of this model. This is a very general model, but it's a very good model to start with. And we do have some basic divisions of what machine learning does that are important to know. For instance, do you want to predict a category? Well, if you're categorizing things, that's classification. For instance, whether the stock price will increase or decrease. So in other words, I'm looking for a yes/no answer. Is it going up or is it going down?
And in that case, we'd actually ask, is it going up? True. If it's not going up, it's false, meaning it's going down. This way, it's a yes or no, a 0 or 1. Do you want to predict a quantity? That's regression. So remember, we just did classification. Now we're looking at regression. These are the two major divisions in what the data is doing. For instance, predicting the age of a person based on height, weight, health, and other factors. Based on these different factors, you might guess how old a person is. And then there are a lot of domain-specific things, like do you want to detect an anomaly? That's anomaly detection. This is actually very popular right now. For instance, you want to detect money withdrawal anomalies. You want to know when someone's making a withdrawal that might not be from their own account. We've brought this up because this is really big right now. If you're predicting whether to buy a stock or not, you want to be able to know whether what's going on in the stock market is an anomaly, in which case you use a different prediction model because something else is going on and you've got to pull in new information, or whether this is just the norm and I'm going to get my normal return on my money invested. So, being able to detect anomalies is very big in data science these days. Another question that comes up, on what we call untrained data, is do you want to discover structure in unexplored data? That's called clustering. For instance, finding groups of customers with similar behavior, given a large database of customer data containing their demographics and past buying records. In this case, we might notice that anybody who's wearing a certain set of shoes goes shopping at certain stores, or whatever it is, and is going to make certain purchases. Having that information helps us to market or group people together, so we can explore that group and find out what it is we want to market to them.
That's if you're in the marketing world, and it might also work in just about any arena. You might want to group people together based on their different areas, investments, and financial background when deciding whether you're going to give them a loan or not. Before you even start looking at whether they're a valid customer for the bank, you might want to look at all these different areas and group them together based on unknown data. So you don't know what the data is going to tell you, but you want to cluster together the people who belong together. Let's take a quick detour for quiz time. Oh, my favorite. We're going to have a couple of questions here under quiz time, and we'll be posting the answers in part two of this tutorial. So let's go ahead and take a look at these quiz time questions. Hopefully you'll get them all right, and it'll get you thinking about how to process data and what's going on. Can you tell what's happening in the following cases? Of course, you're sitting there with your cup of coffee, your checkbox, and your pen, trying to figure out your next step in your data science analysis. So, the first one, A, is grouping documents into different categories based on the topic and content of each document. Very big these days. You have legal documents, maybe it's sports documents, maybe you're analyzing newspaper postings, but certainly having that automated is a huge thing in today's world. B, identifying handwritten digits in images correctly. So we want to know what they're writing out by hand, whether it's an A or a capital A, B, C, or which handwritten digit it is. C, behavior of a website indicating that the site is not working as designed. D, predicting the salary of an individual based on his or her years of experience, as in an HR hiring setup. So stay tuned for part two.
We'll go ahead and answer these questions when we get to part two of this tutorial, or you can simply write at the bottom and send a note to Simplilearn and they'll follow up with you on it. Back to our regular content. Now, these last few bring us into the next topic, which is another way of dividing our types of machine learning: supervised, unsupervised, and reinforcement learning. Supervised learning is a method used to enable machines to classify or predict objects, problems, or situations based on labeled data fed to the machine. In here you see we have a jumble of data with circles, triangles, and squares. Then we label them: what's a circle, what's a triangle, what's a square. We have our model training, and it trains on that. So we know the answer. That's very important when you're doing supervised learning: you already know the answer for a lot of your information coming in. So you have a huge group of data coming in, and then you have new data coming in. We've trained our model, and the model now knows the difference between a circle, a square, and a triangle. Now that we've trained it, we can send in, in this case, a square and a circle, and it predicts that the top one's a square and the next one's a circle. And you can see how this could be used to predict whether someone's going to default on a loan. I was talking about banks earlier. There's also supervised learning on the stock market, whether you're going to make money or not. That's always important. And if you are looking to make a fortune in the stock market, keep in mind it is very difficult to get all the data correct on the stock market. It fluctuates in ways that are really hard to predict, so it's quite a roller coaster ride. If you're running machine learning on the stock market, you start realizing you really have to dig for new data. So we have supervised learning. And if you have supervised, we need unsupervised learning.
In unsupervised learning, the machine learning model finds the hidden pattern in unlabeled data. So in this case, instead of telling it what a circle is, what a triangle is, and what a square is, it goes in there, looks at them, and for whatever reason groups them together. Maybe it'll group them by the number of corners. It notices that a number of them have three corners, a number of them have four corners, and a number of them have no corners, and it's able to filter those through and group them together. We talked about that earlier with looking at a group of people who are out shopping. We want to group them together to find out what they have in common. And of course, once you understand what people have in common, maybe you have five of them who are customers at your store, and they have a lot in common with five others who are not customers at your store. How do you market to those five who aren't customers at your store yet? They fit the demographics of who's going to shop there, and you'd like them to shop at your store, not the one next door. Of course, this is a simplified version. You can very easily see the difference between a triangle and a circle, which might not be so easy in marketing. Reinforcement learning. Reinforcement learning is an important type of machine learning where an agent learns how to behave in an environment by performing actions and seeing the results. And we have here, in this case, a baby. It's actually great that they used an infant for this slide, because reinforcement learning is very much in its infant stages. But it's also probably the biggest machine learning demand out there right now or in the future. What's going to be coming up over the next few years is reinforcement learning and how to make that work for us. And you can see here where we have our action. The action in this one is that the baby goes into the fire.
Hopefully it's just a little candle, not a giant fire pit like it looks like here. Then the baby comes out, and the new state is that the baby is sad and crying because they got burned on the fire. And then maybe they take another action. The baby's called the agent because it's the one taking the actions. In this case, they didn't go into the fire. They went a different direction, and now the baby's happy and laughing and playing. Reinforcement learning is very easy to understand because that's one of the ways we learn as humans. You burn yourself on the stove, so you don't do that anymore. Don't touch the stove. In the big picture, having a machine learning program or an AI be able to do this is huge, because now we're starting to learn how to learn. That's a big jump in the world of computers and machine learning. And we're going to go back over supervised versus unsupervised learning. Understanding this is huge because it's going to come up in any project you're working on. In supervised learning, we have labeled data. We have direct feedback, so someone's already gone in there and said, "Yes, that's a triangle. No, that's not a triangle." And then you predict an outcome. So you have a nice prediction: this new set of data is coming in and we know what it's going to be. Then with unsupervised learning, it's not labeled, so we really don't know what it is. There's no feedback, so we're not telling it whether it's right or wrong. We're not telling it whether it's a triangle or a square. We're not telling it to go left or right. All we're doing is finding hidden structure in the data, grouping the data together to find out what connects to what. And then you can use these together. So imagine you have an image and you're not sure what you're looking for. You go in and you have the unstructured data.
You find all these things that are connected together, and then somebody looks at those and labels them. Now you can take that labeled data and program something to predict what's in the picture. So you can see how they go back and forth, and you can start connecting all these different tools together to make a bigger picture. There are many interesting machine learning algorithms. Let's have a look at a few of them. Hopefully this gives you a little flavor of what's out there, and these are some of the most important ones currently being used. We'll take a look at linear regression, the decision tree, and the support vector machine. Let's start with a closer look at linear regression. Linear regression is perhaps one of the most well-known and well-understood algorithms in statistics and machine learning. Linear regression is a linear model, that is, a model that assumes a linear relationship between the input variables x and the single output variable y. And you'll recognize this if you remember your algebra classes: y = mx + c. Imagine we are predicting distance traveled, y, from speed, x. Our linear regression model representation for this problem would be y = m * x + c, or distance = m * speed + c, where m is the coefficient and c is the y-intercept. And we're going to look at two different variations of this. First, we're going to start with time held constant. You can see we have a bicyclist. He's got safety gear on, thank goodness. Speed equals 10 m/s, and over a certain amount of time his distance equals 36 km. We have a second bicyclist who's going twice the speed, or 20 m/s. And you can guess that if he's going twice the speed and time is a constant, then he's going to go twice the distance. That's easy to compute: 36 * 2, you get 72 kilometers. And if you had the question of how far somebody would go at three times that speed, or 30 m/s, you can easily compute the distance in your head.
We can do that without needing a computer, but we want to do this with more complicated data, so it's kind of nice to compare the two. Let's take a look at what that looks like in a graph. In a linear regression model, we have our distance against the speed, and we have m equal to the positive slope of the line. We'll notice that the line has a positive slope: as speed increases, distance also increases. Hence, the variables have a positive relationship. So with the speed of the person as x and the distance traveled in a fixed interval of time as y, we have y = mx + c. And we could very easily compute, either by following the line or just by knowing it's 3 * 10 m/s, that this third bicyclist has traveled 108 km. One of the key definitions here is positive relationship. The slope of the line is positive: as speed increases, so does distance. Let's take a look at our second example, where we hold distance constant. We have speed equals 10 m/s. He has a certain distance to go, and it takes him 100 seconds to travel that distance. And we have our second bicyclist who's still doing 20 m/s. Since he's going twice the speed, we can guess he'll cover the distance in half the time, 50 seconds. And of course, you could probably guess the third one: 100 divided by 3, since he's going three times the speed, which is about 33.33 seconds. Let's put that into a linear regression model, or a graph. If the distance is assumed to be constant, let's see the relationship between speed and time. As speed goes up, the amount of time needed to go that same distance goes down. So now m equals the negative slope of the line. As the speed increases, time decreases. Hence, the variables have a negative relationship. Again, there's our definition: positive relationship and negative relationship, depending on the slope of the line. And a simple formula like this works even with a significant amount of data.
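The two bicyclist examples can be checked with a few lines of Python. This is only a sketch: the fixed time of 3600 seconds and the fixed distance of 1000 meters are not stated in the transcript and are assumed here because they reproduce the quoted figures (10 m/s giving 36 km, and 10 m/s taking 100 seconds). The function names are made up for illustration.

```python
# Assumed constants, chosen only because they match the figures quoted above.
FIXED_TIME_S = 3600       # 10 m/s for 3600 s is 36 km
FIXED_DISTANCE_M = 1000   # 1000 m at 10 m/s takes 100 s


def distance_km(speed_m_s, time_s=FIXED_TIME_S):
    """Distance covered at a given speed when time is held constant."""
    return speed_m_s * time_s / 1000


def travel_time_s(speed_m_s, distance_m=FIXED_DISTANCE_M):
    """Time needed at a given speed when distance is held constant."""
    return distance_m / speed_m_s


for speed in (10, 20, 30):
    print(speed, "m/s ->", distance_km(speed), "km,",
          round(travel_time_s(speed), 2), "s")
```

With time fixed, distance grows with speed (positive relationship); with distance fixed, time falls as speed grows (negative relationship).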
Let's see the mathematical implementation of linear regression, and we'll take this data. Suppose we have this data set with x and y, where x = 1, 2, 3, 4, 5, a standard series, and the y values are 3, 2, 2, 4, 3. When we plot these points on a graph, you can see there's kind of a nice scattering, and you could probably eyeball a line through the middle of it. But we're going to calculate that exact line for linear regression. The first thing we do is find the mean of x. Remember, the mean is basically the average. So we add 5 + 4 + 3 + 2 + 1 and divide by five, and that simply comes out as three. Then we'll do the same for y. We add up all those numbers and divide by five, and we end up with a mean value of y equal to 2.8. So the x mean is the average of the x values, and the y mean is the average of the y values. When we plot that, you'll see that we can put y = 2.8 and x = 3 on our graph. We gave it a different color so you can pick it out, with the dashed lines on it. And it's important to note that when we do the linear regression, the regression line should go through that dot. Now, let's find our regression equation to find the best-fit line. Remember, we take our y = mx + c, so we're looking for m and c. To find this equation for our data, we need to find our slope m and our intercept c. We have y = mx + c, where m = sum of (x - x mean) * (y - y mean), over the sum of (x - x mean) squared. That's how we get the slope of the line. And we can easily do that by creating some columns here. We have x and y. Computers are really good at iterating through data, so we can easily compute this and fill in a table of data.
In our table you can easily see that if we have our x value of one, and if you remember the x mean is three, then 1 - 3 = -2, 2 - 3 = -1, and so on. We can easily fill in the columns for x - x mean and y - y mean, and from those we can compute (x - x mean)^2 and (x - x mean) * (y - y mean). You can guess that the next step is to sum the columns for the answers we need. We get a total of 10 for (x - x mean)^2 and a total of 2 for (x - x mean) * (y - y mean), and we plug those in. We get 2/10, which equals 0.2. So now we know the slope of our line equals 0.2, and we can calculate the value of c. That's the next step: we need to know where the line crosses the y-axis. If you remember, I mentioned earlier that the linear regression line has to pass through the mean value, the one we showed earlier. We can flip back up to that graph, and you can see right here our mean value, which is x = 3 and y = 2.8. Since we know that value, we can simply plug it into our formula, y = 0.2x + c. We plug that in and get 2.8 = 0.2 * 3 + c, and you can just solve for c. So now we know that our intercept equals 2.2. Once we have all that, we can go ahead and plot our regression line, y = 0.2 * x + 2.2. Then from this equation, we can compute new values. So let's predict the values of y using x = 1, 2, 3, 4, 5 and plot the points. Remember, 1, 2, 3, 4, 5 were our original x values. So now we're going to see what the line thinks the y values are, not what they actually are. We plug those in and get the predicted y, designated y of p. You can see that x = 1 gives 2.4, x = 2 gives 2.6, and so on. So we have our predicted y values, what we think y is going to be when we plug those numbers in. And when we plot the predicted values along with the actual values, we can see the difference. This is one of the things that's very important with linear regression, or any of these models: understanding the error.
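The hand calculation above can be reproduced in plain Python. This is a minimal sketch, not the course's own code: it assumes the y values are 3, 2, 2, 4, 3 (that reading matches the stated mean of 2.8 and the column totals of 2 and 10), and the helper name `fit_line` is made up for illustration. It also computes the per-point errors and their sum of squares as one way to put a number on the gap between actual and predicted values.

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 2, 2, 4, 3]  # assumed reading of the data; gives a y mean of 2.8


def fit_line(xs, ys):
    """Least-squares slope m and intercept c for y = m*x + c."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Sum of (x - x_mean) * (y - y_mean) and sum of (x - x_mean)^2.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    m = s_xy / s_xx
    # The fitted line passes through (x_mean, y_mean), so solve for c there.
    c = y_mean - m * x_mean
    return m, c


m, c = fit_line(xs, ys)
predicted = [m * x + c for x in xs]
errors = [y - y_p for y, y_p in zip(ys, predicted)]
sum_squared_errors = sum(e ** 2 for e in errors)

print("slope:", round(m, 3), "intercept:", round(c, 3))
print("predicted:", [round(p, 2) for p in predicted])
print("errors:", [round(e, 2) for e in errors])
print("sum of squared errors:", round(sum_squared_errors, 3))
```

Rounding is used only for display, since floating-point arithmetic leaves tiny residues on values like 0.2 and 2.2.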
And so we can calculate the error on all of our different values. You can see over here we plotted x, y, and y predicted, and we draw a little line so you can see what the error looks like between the different points. Our goal is to reduce this error. We want to minimize that error value in our linear regression model. There are lots of ways to measure the distance between the line and the data points, like sum of squared errors, sum of absolute errors, root mean square error, etc. We keep moving this line through the data points to make sure the best-fit line has the least squared distance between the data points and the regression line. So to recap, with a very simple linear regression model, we first figure out the formula of our line through the middle, and then we slowly adjust the line to minimize the error. Keep in mind this is a very simple formula. Even though the math stays very much the same, it gets much more complex as we add in different dimensions. This is only two dimensions, y = mx + c, but you can take that out to x, z, y, j, q, all the different features in there, and plot a linear regression model on all of those using the different formulas to minimize the error. Let's go ahead and take a look at decision trees, a very different way to solve problems from the linear regression model. A decision tree is a tree-shaped algorithm used to determine a course of action. Each branch of the tree represents a possible decision, occurrence, or reaction. We have data which tells us if it is a good day to play golf. If we were to open this data up in a general spreadsheet, you can see we have the outlook (rainy, overcast, or sunny), the temperature (hot, mild, or cool), humidity, windy, and did I like to play golf that day, yes or no. So we're taking a census. And certainly, I wouldn't want a computer telling me when I should go play golf or not.
But you could imagine if, the night before, you're trying to plan your day and it comes up and says, "Tomorrow would be a good day for golf for you in the morning and not a good day in the afternoon," or something like that. This becomes very beneficial, and we see it in a lot of applications coming out now, where it gives you suggestions and lets you know what would fit for you for the next day, or the next purchase, or the next mail-out. In this case, the question is whether tomorrow is a good day for playing golf based on the weather coming in. So let's determine if you should play golf when the day is sunny and windy. We found out the forecast tomorrow is going to be sunny and windy. Suppose we draw our tree like this. We start with humidity. If you have a normal humidity, you're going to go play golf. If the humidity is really high, then we look at the outlook, and whether the outlook is sunny, overcast, or rainy is going to change what you choose to do. If you know that it's very high humidity and it's sunny, you're probably not going to play golf, because you're going to be out there miserable, fighting off the mosquitoes that are joining you to play golf. If it's rainy, you probably don't want to play in the rain. But if it's slightly overcast and you get just the right shade, that's a good day to play golf and be outside on the green. Now, in this example, you can probably make your own tree pretty easily, since it's a very simple set of data going in. But the question is, how do you know what to split on? Where do you split your data? What if this is much more complicated data, where it's not something that you would particularly understand, like studying cancer?
They take about 36 measurements of the cancerous cells, and each one of those measurements represents how bulbous it is, how extended it is, how sharp the edges are, something that as a human we would have no understanding of. So how do we decide how to split that data up, and is that the right decision tree? So that's a question that's going to come up: is this the right decision tree? For that we should calculate entropy and information gain. Two important vocabulary words there are entropy and information gain. Entropy is a measure of randomness or impurity in the data set. Entropy should be low. So we want the chaos to be as low as possible. We don't want to look at it and be confused by the images or what's going on there with mixed data. And the information gain is a measure of the decrease in entropy after the data set is split, also known as entropy reduction. Information gain should be high. So we want the information that we get out of the split to be as high as possible. Let's take a look at entropy from the mathematical side. In this case, we're going to denote entropy as I(p, n), where p is the number of days you're going to play a game of golf and n is the number of days you're not going to play the game of golf. Now, you don't really have to memorize these formulas. There's a few of them out there depending on what you're working with. But it's important to note where this formula is coming from, so when you see it, you're not lost when you're running your programming, unless you're building your own decision tree code in the back. And we simply have I(p, n) = -(p / (p + n)) * log2(p / (p + n)) - (n / (p + n)) * log2(n / (p + n)). But let's break that down and see what it actually looks like when we're computing that from the computer script side. Entropy of a target class of the data set is the whole entropy. So we have entropy play golf and we look at this.
If we go back to the data, you can simply count how many yeses and nos there are in our complete data set for playing golf days. In our complete set we find we have nine days we did play golf and five days we did not play golf. If you add those together, 9 + 5 is 14. And so our I equals I(9, 5), which gives us 9 over 14 and 5 over 14. Those are the values that we plug into that formula. And you can go 9 over 14 = 0.64 and 5 over 14 = 0.36. And when you do the whole equation, you get -0.64 * log2(0.64) - 0.36 * log2(0.36). And we get a set value. We get 0.94. So we now have a full entropy value for the whole set of data that we're working with. And we want to make that entropy go down. And just like we calculated the entropy out for the whole set, we can also calculate entropy for playing golf and the outlook. Is it going to be overcast or rainy or sunny? And so we look at the entropy. We have P(sunny) times I(3, 2). And that just comes out of how many sunny days yes and how many sunny days no, over the total for sunny, which is five. We'll divide that five by the 14 later on. Plus P(overcast) times I(4, 0), plus P(rainy) times I(2, 3). And then when you do the whole setup, we have 5 over 14, remember I said there was a total of five, so 5 over 14 * I(3, 2) + 4 over 14 * I(4, 0) + 5 over 14 * I(2, 3). And so we can now compute the entropy of just the part that has to do with the forecast, and we get 0.693. Similarly, we can calculate the entropy of other predictors like temperature, humidity and wind. And so we look at the gain for outlook. How much are we going to gain? That is entropy play golf minus entropy play golf given outlook. And we can take the original 0.94 for the whole set minus the 0.693 entropy of the outlook split, and we end up with a gain of 0.247. So this is our information gain. Remember we define entropy and we define information gain. The higher the information gain, the lower the entropy, the better.
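The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the counts are the ones read out in the walkthrough (14 days split nine and five overall, and 3/2, 4/0, 2/3 within sunny, overcast and rainy), while the helper names `entropy` and `split_entropy` are made up for illustration.

```python
from math import log2


def entropy(p, n):
    """I(p, n) for two class counts; a count of zero contributes 0 to the sum."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:
            frac = count / total
            result -= frac * log2(frac)
    return result


def split_entropy(branches):
    """Weighted average entropy of the branches produced by a split."""
    total = sum(p + n for p, n in branches.values())
    return sum((p + n) / total * entropy(p, n) for p, n in branches.values())


# Counts from the walkthrough. Entropy is symmetric, so swapping the two
# counts in any pair does not change the result.
whole = (9, 5)
outlook = {"sunny": (3, 2), "overcast": (4, 0), "rainy": (2, 3)}

e_whole = entropy(*whole)
e_outlook = split_entropy(outlook)
gain_outlook = e_whole - e_outlook

print(f"entropy of whole set:     {e_whole:.3f}")
print(f"entropy after outlook:    {e_outlook:.3f}")
print(f"information gain outlook: {gain_outlook:.3f}")
```

The overcast branch has entropy zero because all four overcast days share one answer, which is why splitting on outlook removes so much uncertainty.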
The information gain of the other three attributes can be calculated in the same way. So we have our gain for temperature equals 0.029. We have our gain for humidity equals 0.152. And our gain for a windy day equals 0.048. And if you do a quick comparison, you'll see the 0.247 for outlook is the greatest gain of information. So that's the split we want. Now let's build the decision tree. So, we have the outlook. Is it going to be sunny, overcast, or rainy? That's our first split because that gives us the most information gain. And we can continue down the nodes of the tree: we choose the attribute with the largest information gain as the root node and then continue to split each subnode on the largest information gain that we can compute. And although it's a little bit of a tongue twister to say all that, you can see that it's a very easy to view visual model. We have our outlook. We split it three different directions. If the outlook is overcast, we're going to play. And then we can split those further down if we want. So if the outlook is sunny, we then look at whether it's also windy. If it's windy, we're not going to play. If it's not windy, we'll play. So, we can easily build a nice decision tree to guess what we would like to do tomorrow and give us a nice recommendation for the day. So, we want to know if it's a good day to play golf when it's sunny and windy. Remember the original question that came out: tomorrow's weather report is sunny and windy. You can see by going down the tree, we go outlook sunny, then windy. We're not going to play golf tomorrow. So, our little smartwatch pops up and says, "I'm sorry, tomorrow's not a good day for golf. It's going to be sunny and windy." And if you're a huge golf fan, you might go, "Uh-oh, it's not a good day to play golf." We can go in and watch a golf game at home.
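To make the "pick the largest gain, then walk the tree" step concrete, here is a small sketch. The gain values are the ones quoted in the walkthrough; the function `play_golf` is a hypothetical hand-written version of the branches described there, and since the rainy branch is not spelled out in the transcript, the sketch refuses to guess it.

```python
# Information gains quoted in the walkthrough.
gains = {"outlook": 0.247, "temperature": 0.029, "humidity": 0.152, "windy": 0.048}

# The root node is the attribute with the largest information gain.
root = max(gains, key=gains.get)
print("split first on:", root)


def play_golf(outlook, windy):
    """Follow the branches described above (hypothetical helper).

    Only the overcast and sunny branches are described in the transcript,
    so any other outlook raises an error instead of inventing a rule.
    """
    if outlook == "overcast":
        return True
    if outlook == "sunny":
        return not windy
    raise ValueError(f"branch for outlook={outlook!r} is not described")


# Tomorrow's forecast from the example: sunny and windy.
print("play golf tomorrow?", play_golf("sunny", windy=True))
```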
So, we'll sit in front of the TV instead of being out playing golf in the wind. Now that we've looked at our decision tree, let's look at the third one of the algorithms we're investigating: support vector machine. Support vector machine is a widely used classification algorithm. The idea of support vector machine is simple. The algorithm creates a separation line which divides the classes in the best possible manner, for example, dog or cat, disease or no disease. Suppose we have labeled sample data which tells the height and weight of males and females. A new data point arrives and we want to know whether it's going to be a male or a female. So we start by drawing a line. We draw decision lines. But if we consider decision line one, then we will classify the individual as a male. And if we consider decision line two, then it'll be a female. So you can see this person kind of lies in the middle of the two groups. So it's a little confusing trying to figure out which line they should be under. We need to know which line divides the classes correctly. But how? The goal is to choose a hyperplane, and that is one of the key words they use when we talk about support vector machines: choose a hyperplane with the greatest possible margin between the decision line and the nearest point within the training set. So you can see here we have our support vectors. We have the two nearest points to it and we draw a line between those two points, and the distance margin is the distance between the hyperplane and the nearest data point from either set. So we actually have a value, and it should be an equal distance between the two points that we're comparing it to. When we draw the hyperplanes, we observe that line one has the maximum distance margin. So we'll classify the new data point correctly. And our result on this one is going to be that the new data point is male.
One of the reasons we call it a hyperplane versus a line is that a lot of times we're not looking at just weight and height. We might be looking at 36 different features or dimensions. And so when we cut it with a hyperplane, it's more of a three-dimensional cut in the data, or a multi-dimensional one, that cuts the data a certain way, and each plane continues to cut it down until we get the best fit or match. Let's understand this with the help of an example problem statement. You always start with a problem statement when you're going to put some code together. We're going to do some coding now: classifying muffin and cupcake recipes using support vector machines. So, the cupcake versus the muffin. Let's have a look at our data set. And we have the different recipes here. We have a muffin recipe that has so much flour. I'm not sure what measurement 55 is in, maybe it's ounces, but it has a certain amount of flour, a certain amount of milk, sugar, butter, egg, baking powder, vanilla, and salt. And so based on these measurements, we want to guess whether we're making a muffin or a cupcake. And you can see in this one, we don't have just two features. We don't just have height and weight as we did before between the male and female. In here, we have a number of features. In fact, in this we're looking at eight different features to guess whether it's a muffin or a cupcake. What's the difference between a muffin and a cupcake? Turns out muffins have more flour while cupcakes have more butter and sugar. So basically the cupcake's a little bit more of a dessert, where the muffin's a little bit more of a fancy bread. But how do we do that in Python? How do we code that to go through recipes and figure out what the recipe is? And I really just want to say cupcakes versus muffins like some big professional wrestling thing. Before we start in on our cupcakes versus muffins, we are going to be working in Python. There's many versions of Python, many different editors.
That is one of the strengths and weaknesses of Python: it just has so much stuff attached to it. It's one of the more popular data science programming languages you can use. In this case, we're going to go ahead and use Anaconda and Jupyter Notebook. The Anaconda Navigator has all kinds of fun tools. Once you're in the Anaconda Navigator, you can change environments. I actually have a number of environments on here. We'll be using the Python 3.6 environment. So, this is in Python version 3.6, although it doesn't matter too much which version you use. I usually try to stay with 3.x because it's current, unless you have a project that's very specifically in version 2.x. 2.7, I think, is usually what most people use in version two. And then once we're in our Jupyter Notebook editor, I can go up and create a new file and we'll just jump in here. In this case, we're doing SVM muffin versus cupcake. And then let's start with our packages for data analysis. There's a few very standard packages we almost always use. We import numpy, that's for numerical Python. They usually denote it as np; that's very common. And then we're going to import pandas as pd. NumPy deals with number arrays. There's a lot of cool things you can do with the NumPy setup, as far as multiplying all the values in a NumPy array. Pandas, I can't remember if we're actually using it in this data set. I think we do, as an import. It makes a nice data frame. And the difference between a data frame and a NumPy array is that a data frame is more like your Excel spreadsheet. You have columns, you have indexes. So you have different ways of referencing it and easily viewing it, and there's additional features you can run on a data frame. And pandas kind of sits on NumPy, so you need them both in there. And then finally, we're working with the support vector machine. So from sklearn, we're going to use the sklearn module.
Import svm, the support vector machine. And then as a data scientist, you should always try to visualize your data. Some data obviously is too complicated or doesn't make any sense to the human. But if it's possible, it's good to take a second look at it so that you can actually see what you're doing. Now, for that, we're going to use two packages. We're going to import matplotlib.pyplot as plt. Again, very common. And we're going to import seaborn as sns. And we'll go ahead and set the font scale in sns right in our import line. That's what this semicolon followed by another statement is: we're going to set the sns font scale. And these are great because seaborn sits on top of matplotlib just like pandas sits on NumPy. So it adds a lot more features and uses and control. We're obviously not going to get into matplotlib and seaborn. It'd be its own tutorial. We're really just focusing on the SVM, the support vector machine from sklearn. And since we're in Jupyter Notebook, we have to add a special line in here for our matplotlib. And that's your percent sign, matplotlib inline. Now, if you're doing this in just a straight code project, a lot of times I use something like Notepad++ and I'll run it from there. You don't have to have that line in there because the plot will just pop up as its own window on your computer, depending on how your computer's set up. Because we're running this in the Jupyter Notebook as a browser setup, this tells it to display all of our graphics right below on the page. So that's what that line is for. I remember the first time I ran this, I didn't know that, and I had to go look that up years ago. It's quite a headache. So matplotlib inline is just because we're running this on the web setup. And we can go ahead and run this to make sure all our modules are in. They're all imported, which is great. If you don't have them installed, you'll need to go ahead and use pip, or however you do it.
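Put together, the import cell described above would look roughly like this. It is a sketch, not the exact cell from the video: the font scale value of 1.2 is an assumption (the transcript does not say which value was used), and the notebook magic is shown as a comment so the snippet also runs as a plain script.

```python
# Packages for data analysis
import numpy as np
import pandas as pd
from sklearn import svm

# Packages for visualization
import matplotlib.pyplot as plt
import seaborn as sns; sns.set(font_scale=1.2)  # 1.2 is an assumed value

# In a Jupyter notebook, also run the following magic so plots are drawn
# right below the cell. It is not valid in a plain .py script:
# %matplotlib inline
```

If any of these imports fail, the package is not installed in the active environment; `pip install numpy pandas scikit-learn matplotlib seaborn` is one common way to add them.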
There's a lot of other installers out there, although pip is the most common. And you have to make sure these are all installed on your Python setup. The next step, of course, is we've got to look at the data. You can't run a model for predicting data if you don't have actual data. So, to do that, let me go ahead and open this up and take a look. And we have our cupcakes versus muffins, and it's a CSV file, CSV meaning that it's comma-separated values, and it's going to open it up in a nice spreadsheet for me. And you can see up here we have the type: muffin, muffin, muffin, cupcake, cupcake, cupcake. And then it's broken up into flour, milk, sugar, butter, egg, baking powder, vanilla and salt. So what we can do is go ahead and look at this data in our Python as well. Let us create a variable, recipes, equals, and we're going to use our pandas module's read_csv. Remember, it's comma-separated values. And the file name happened to be cupcakes versus muffins. Oops, I got double brackets there. Do it this way. There we go, cupcakes versus muffins. Because the place I saved this particular Python program is in the same folder, we can get by with just the file name. But remember, if you're storing it in a different location, you have to also put down the full path on there. And then, because we're in pandas, you can actually do this in line, but let me do the full print. You can just type in recipes.head() in the Jupyter Notebook. But if you're running code in a different script, you'd need to go ahead and type out the whole print of recipes.head(). And pandas knows that's going to show the first five rows of data. And if we flip back on over to the spreadsheet where we opened up our CSV file, you can see where it starts on line two. This one calls it zero. And then lines 2, 3, 4, 5, 6 are going to match. Go ahead and close that out because we don't need that anymore. And it always starts at zero.
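A minimal sketch of that loading step follows. Because the actual CSV file is not included here, the example reads a few made-up rows from an in-memory string; the column names follow the layout described above, but the capitalization and every number are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: made-up rows, not the video's data.
csv_text = """Type,Flour,Milk,Sugar,Butter,Egg,Baking Powder,Vanilla,Salt
Muffin,55,28,3,7,5,2,0,0
Muffin,47,24,12,6,9,1,0,0
Muffin,50,25,8,6,5,2,1,0
Cupcake,39,0,26,19,14,1,1,0
Cupcake,42,21,16,10,8,3,0,0
Cupcake,34,17,20,20,5,2,1,0
"""

# With a real file in the same folder you would pass its name instead,
# e.g. pd.read_csv("your_file.csv"); a file elsewhere needs its full path.
recipes = pd.read_csv(io.StringIO(csv_text))

# The top row becomes the column labels and rows are indexed from 0.
print(recipes.head())
```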
And it automatically indexes it, since we didn't tell it to use an index in here. So that's the index number on the left-hand side. And it automatically took the top row as labels. So using pandas to read a CSV is just really slick and fast. One of the reasons we love our pandas, not just because they're cute and cuddly teddy bears. And let's go ahead and plot our data. And I'm not going to plot all of it. I'm just going to plot the sugar and flour. Now, obviously, you can see where plots get really complicated if we have tons of different features. And so, you'll break them up and maybe look at just two of them at a time to see how they connect. And to plot them, we're going to go ahead and use seaborn. So, that's our sns. And the command for that is sns.lmplot. And then the two different variables I'm going to plot are flour and sugar. Data equals recipes. The hue equals type. And this is a lot of fun because it knows that this is pandas coming in. So this is one of the powerful things about pandas mixed with seaborn for doing graphing. And then we're going to use a palette, Set1. There's a lot of different sets in there. You can go look them up for seaborn. We set fit_reg equals False, so we're not really trying to fit anything. And there's a scatter_kws. A lot of these settings you can look up in seaborn. Half of these you could probably leave off when you run them. Somebody played with this and found out that these were the best settings for doing a seaborn plot. And let's go ahead and run that. And because it does it in line, it just puts it right on the page. And you can see right here that just based on sugar and flour alone, there's a definite split. And we use these models because you can actually look at it and say, "Hey, if I drew a line right between the middle of the blue dots and the red dots, we'd be able to do an SVM and a hyperplane right there in the middle." Then the next step is to format or pre-process our data.
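The plotting call described above might look like the sketch below. The data frame is a made-up stand-in for the recipes table, the column names `Flour`, `Sugar` and `Type` are assumed spellings, and the marker size passed through `scatter_kws` is an assumption since the transcript does not give its value. The `Agg` backend line is only there so the sketch also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter

import pandas as pd
import seaborn as sns

# Made-up rows for illustration; not the data used in the video.
recipes = pd.DataFrame({
    "Type": ["Muffin", "Muffin", "Muffin", "Cupcake", "Cupcake", "Cupcake"],
    "Flour": [55, 47, 50, 39, 42, 34],
    "Sugar": [3, 12, 8, 26, 16, 20],
})

# Scatter plot of sugar against flour, colored by recipe type.
# fit_reg=False turns off the regression line that lmplot draws by default.
grid = sns.lmplot(
    x="Flour",
    y="Sugar",
    data=recipes,
    hue="Type",
    palette="Set1",
    fit_reg=False,
    scatter_kws={"s": 70},  # marker size, an assumed value
)
```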
And we're going to break that up into two parts. We need a type label. And remember, we're going to decide whether it's a muffin or a cupcake. Well, a computer doesn't know muffin or cupcake. It knows zero and one. So, what we're going to do is create a type label. And for this we'll create a NumPy array with np.where, and this is where we can do some logic. We take our recipes from our pandas data frame, and wherever type equals muffin, it's going to be zero. And then if it doesn't equal muffin, which is cupcakes, it's going to be one. So we create our type label. This is the answer. So when we're doing our training model, remember we have to have training data. This is what we're going to train it with: that it's zero or one, it's a muffin or it's not. And then we're going to create our recipe features. And if you remember correctly from right up here, the first column is type. So we really don't need the type column because that's our muffin or cupcake. And in pandas, we can easily sort that out. We take our recipes.columns.values. That's built into pandas, converting them to values. So it's just the column titles going across the top, and we don't want the first one. So what we do is, since it always starts at zero, we want one colon, so from one till the end. And then we want to go ahead and make this a list. And this converts it to a list of strings. And then we can go ahead and just take a look and see what we're looking at for the features. Make sure it looks right. Let me go ahead and run that. And I forgot the s on recipes. So, we'll go ahead and add the s in there and then run that. And we can see we have flour, milk, sugar, butter, egg, baking powder, vanilla, and salt. And that matches what we have up here. We printed out everything but the type. So, we have our features and we have our label. Now, the recipe features is just the titles of the columns. We actually need the ingredients. And at this point, we have a couple options.
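The labeling and column-selection steps just described can be sketched as follows. The small data frame is made up for illustration, the spellings `Type` and `Muffin` are assumed, and the variable names `type_label` and `recipe_features` follow the spoken description.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; not the data used in the video.
recipes = pd.DataFrame({
    "Type": ["Muffin", "Muffin", "Cupcake", "Cupcake"],
    "Flour": [55, 47, 39, 34],
    "Milk": [28, 24, 0, 17],
    "Sugar": [3, 12, 26, 20],
})

# 0 wherever the type is muffin, 1 otherwise (cupcake).
type_label = np.where(recipes["Type"] == "Muffin", 0, 1)

# Column titles without the first column (the type), as a list of strings.
recipe_features = recipes.columns.values[1:].tolist()

print(type_label)
print(recipe_features)
```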
One, we could run it over all the ingredients. And when you're doing this, usually you do, but for our example, we want to limit it so you can easily see what's going on, because if we did all the ingredients, we'd have, you know, seven or eight different dimensions built into the hyperplane. We only want to look at two so you can see what the SVM is doing. And so we'll take our recipes and we'll do just flour and sugar. Again, you can replace that with your recipe features and do all of them, but we're going to do just flour and sugar. And we're going to convert that to values. We don't need to make a list out of it because these aren't string values. These are actual numbers on there. And we can go ahead and just print ingredients. And you can see what that looks like. And so we have just the values of flour and sugar, just the two sets of points. And just for fun, let's go ahead and take this over here and use our recipe features. And so if we decided to use all the recipe features, you'll see that it makes a nice set of columns of different data. So it just strips out all the labels and everything. We have just the values. But because we want to be able to view this easily in a plot later on, we'll go ahead and take that and just do flour and sugar. And we'll run that. And you'll see it's just the two columns. So the next step is to go ahead and fit our model. We'll go ahead and just call it model. And it's an SVM. We're using a class called SVC. In this case, we're going to go ahead and set the kernel equals linear. So, it's using a specific setup on there. And if we go to the reference on their website for the SVM, you'll see that there's about eight of them here. Three of them are for regression. Three are for classification. The SVC, support vector classification, is probably one of the most commonly used. And then there's also one for detecting outliers and another one that has to do with something a little bit more specific on the model.
But SVC and SVR are the two most commonly used, standing for support vector classifier and support vector regression. Remember, regression gives an actual value, a float value or whatever you're trying to work on. And SVC is a classifier. So it's a yes, no, true, false. And for this we want to know 0 or 1, muffin or cupcake. We go ahead and create our model. And once we have our model created, we're going to do model.fit. And this is very common, especially in sklearn. All their models are followed with the fit command. And what we put into the fit, what we're training it with, is the ingredients, which in this case we limited to just flour and sugar, and the type label: is it a muffin or cupcake? Now, in more complicated data science series, you'd want to split your data into training data and test data, which we won't get into today. And they even do something where they split it into thirds, where you switch between which third is used for training and which for test. There's all kinds of things that go into that. It gets very complicated when you get to the higher end. Not overly complicated, just an extra step, which we're not going to do today because this is a very simple set of data. And let's go ahead and run this. And now we have our model fit. And I got an error here. So let me fix that real quick. It's capital SVC. It turns out I did it lowercase. Support vector classifier. There we go. Let's go ahead and run that. And you'll see it comes up with all this information that it prints out automatically. These are the defaults of the model. You notice that we changed the kernel to linear. And there's our kernel linear on the printout. And there's other different settings you can mess with. We're going to just leave that alone for right now. For this, we don't really need to mess with any of those. So, next we're going to dig a little bit into our newly trained model. And we're going to do this so we can show you on a graph.
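As a sketch of the fit step: the flour and sugar values below are made up for illustration (the real recipes file is not reproduced here), with 0 standing for muffin and 1 for cupcake as described above. The calls `svm.SVC(kernel="linear")` and `model.fit` are the scikit-learn API the walkthrough names.

```python
import numpy as np
from sklearn import svm

# Made-up [flour, sugar] pairs for illustration; not the video's data.
ingredients = np.array([
    [55, 3], [47, 12], [50, 8],     # muffins
    [39, 26], [42, 16], [34, 20],   # cupcakes
])
type_label = np.array([0, 0, 0, 1, 1, 1])  # 0 = muffin, 1 = cupcake

# Note the capital letters: svm.SVC, not svm.svc.
model = svm.SVC(kernel="linear")
model.fit(ingredients, type_label)

# Predictions on the training rows, just to confirm the fit worked.
print(model.predict(ingredients))
```

With a data set this small there is no train/test split, matching the walkthrough; on real projects you would normally hold out test data before trusting the model.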
And let's go ahead and get the separating hyperplane. We're going to use a w for our variable on here, and we're going to do model.coef_[0]. So what the heck is that? Again, we're digging into the model. So we've already got a prediction and a train. This is the math behind it that we're looking at right now. And so the w is going to represent two different coefficients. And if you remember, we had y = mx + c. So these coefficients are connected to that, but in more dimensions it's a plane. We don't want to spend too much time on this because you can get lost in the confusion of the math. So if you're a math whiz, this is great. You can go through here and you'll see that we have a equals minus w[0] over w[1]. Remember, there's two different values there. And that's basically the slope that we're generating. And then we're going to build an xx. What is xx? We're going to set it up as a NumPy array. There's our np.linspace. So we're creating a line of values between 30 and 60. So it just creates a set of numbers for x. And then if you remember correctly, we have our formula y equals the slope times x plus the intercept. Well, to make this work, we can do this as y equals the slope times each value in that array. That's the neat thing about NumPy. So, when I do a * xx, where xx is a whole NumPy array of values, it multiplies a across all of them. And then it takes those same values and we subtract the model intercept. We had mx plus c, so that'd be the c from the formula y equals mx plus c. And that's where all these numbers come from. It's a little bit confusing because it's digging out of these different arrays. And then what we want to do is take this and go ahead and plot it. So we plot the parallels to the separating hyperplane that pass through the support vectors. And so we're going to create b equals model.support_vectors_ at index 0, pulling our support vectors out there. Here's our y, which we now know is a set of data.
And we're going to create y down = a * xx + (b[1] - a * b[0]). And then b, from model.support_vectors_, is going to be set to a new value, the minus 1 index. And y up = a * xx + (b[1] - a * b[0]). And we can go ahead and just run this to load these variables up. If you want to understand a little bit more of what's going on, you can see if we print y. Let me just run that. You can see it's an array. This is a line, in this case between 30 and 60, so there's going to be one value in here for every point in xx. And it's the same thing with y up and y down, and we'll plot those in just a minute on a graph so you can see what those look like. Just go ahead and delete that out of here and run that. So, it loads up the variables. Nice clean slate. I'm just going to copy this from before. Remember this? Our sns, our seaborn plot, lmplot, flour, sugar. And I'll just go and run that real quick so you can remember what that looks like. It's just a straight graph on there. And then one of the neat things is, because seaborn sits on top of pyplot, we can use pyplot for the line going through. And that is simply plt.plot. And that's our xx and y, our two corresponding values, x and y. And then somebody played with this to figure out that the line width equals 2 and the color black would look nice. So let's go ahead and run this whole thing with the pyplot on there. And you can see when we do this, it's just doing flour and sugar on here, with the corresponding line between the sugar and the flour and the muffin versus cupcake. And then we generated the support vector lines, the y down and y up. So let's take a look and see what that looks like. So we'll do our plot. And again, this is all against xx, our x value, but this time we have y down.
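The line and margin calculations described above can be sketched like this. The training values are made up for illustration, and the variable names `yy`, `yy_down` and `yy_up` are assumed spellings of the spoken "y", "y down" and "y up". One detail the spoken version glosses over: in this sketch the intercept is divided by `w[1]`, which is what puts the line exactly where the model's decision value is zero.

```python
import numpy as np
from sklearn import svm

# Made-up [flour, sugar] pairs for illustration; not the video's data.
ingredients = np.array([
    [55, 3], [47, 12], [50, 8],     # muffins
    [39, 26], [42, 16], [34, 20],   # cupcakes
])
type_label = np.array([0, 0, 0, 1, 1, 1])
model = svm.SVC(kernel="linear").fit(ingredients, type_label)

# Separating line: w[0]*x + w[1]*y + intercept = 0, solved for y.
w = model.coef_[0]
a = -w[0] / w[1]                      # slope
xx = np.linspace(30, 60)              # 50 evenly spaced x values by default
yy = a * xx - model.intercept_[0] / w[1]

# Parallel lines through a support vector on each side of the boundary.
b = model.support_vectors_[0]
yy_down = a * xx + (b[1] - a * b[0])
b = model.support_vectors_[-1]
yy_up = a * xx + (b[1] - a * b[0])
```

These three arrays can then be drawn with `plt.plot(xx, yy, linewidth=2, color="black")` for the boundary and `plt.plot(xx, yy_down, "k--")` and `plt.plot(xx, yy_up, "k--")` for the dashed margin lines.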
And let's do something a little fun with this. We can put in 'k--'. That just tells it to make it a black dashed line. And if we're going to do the down one, we also want to do the up one. So here's our y up. And when we run that, it adds both sets of lines. And so here's our support. And this is what you expect. You expect these two lines to go through the nearest data points. So the dashed lines go through the nearest muffin and the nearest cupcake when it's plotting it. And then your SVM goes right down the middle. So it gives a nice split in our data. And you can see how easy it is to see, based just on sugar and flour, which one's a muffin or a cupcake. Let's go ahead and create a function to predict muffin or cupcake. I've got my recipes I pulled off the internet and I want to see the difference between a muffin and a cupcake. And so we need a function to push that through. And we create a function with def. And let's call it muffin or cupcake. And remember, we're just doing flour and sugar today. We're not doing all the ingredients. And that actually is a pretty good split. You really don't need all the ingredients when you know the flour and sugar. And let's go ahead and do an if else statement. So if model.predict of flour and sugar equals zero. So we take our model and we run a predict. It's very common in sklearn where you have a predict. You put the data in and it's going to return a value. In this case, if it equals zero, then print "You're looking at a muffin recipe." Else, if it's not zero, that means it's one, and you're looking at a cupcake recipe. That's pretty straightforward for a function, or def for definition. def is how you do that in Python. And of course, if you're going to create a function, you should run something in it. And so, let's run muffin or cupcake. And we're going to send it the values 50 and 20. A muffin or a cupcake? I don't know what it is. And let's run this and just see what it gives us. It says, "Oh, it's a muffin.
You're looking at a muffin recipe." So, it very easily predicts whether we're looking at a muffin or a cupcake recipe. Let's plot this on the graph so we can see what that actually looks like. And I'm just going to copy and paste it from where we're plotting all the points in there. So, this is nothing different than we did before. If I run it, you'll see it has all the points and the lines on there. And what we want to do is add another point. And we'll do plt.plot. And if you remember correctly, for our test we did 50 and 20. And then somebody went in here and decided we'll do 'yo' for yellow, or it's kind of an orangish yellow color that's going to come up. Marker size nine. Those are settings you can play with. Somebody else played with them to come up with the right setup so it looks good. And you can see there it is graphed. Clearly a muffin. In this case, in cupcakes versus muffins, the muffin has won. And if you'd like to do your own muffin cupcake contender series, you certainly can send a note down below and the team at Simplilearn will send you over the data they used for the muffin and cupcake. And that's true of any of the data. We didn't actually run a plot on it earlier, but we had men versus women. You can also request that information to run it on your own setup, so you can test that out. So to go back over our setup for our support vector machine code: we did a predict with 40 parts flour, 20 parts sugar. I think it was different than the one we did for whether it's a muffin or a cupcake. Hence, we have built a classifier using SVM which is able to classify whether a recipe is for a cupcake or a muffin. Which wraps up our cupcake versus muffin. So the key takeaways: what is machine learning? We discussed that with some of the different aspects of machine learning on there. We went into types of machine learning. If you remember, we have supervised, unsupervised and reinforcement learning.
We discussed the regression line or best fit, and we covered building a decision tree and what the logic is behind that. And finally we did classification using SVM, support vector machine, and we did the code in there.

Today we are diving into machine learning, the technology behind things like Netflix recommendations, Siri, and even the face unlock on your phone. Machine learning helps devices get smarter by learning from data and predicting what we might like or need. And here's why machine learning is huge for your career. Right now, machine learning jobs are among the fastest growing roles worldwide. Companies in every industry, tech, healthcare, finance, and more, are looking for people with machine learning skills to improve their products, automate tasks, and make smarter decisions. Machine learning engineers in the US earn around $112,000 on average, with plenty of room for growth as you gain experience. So, if you want to jump into this exciting field, learning machine learning can open doors to high-paying, in-demand jobs. So in this video I'll guide you through the ultimate road map to master machine learning in 2025, one step at a time. So let's get started. So in the first month, start with the foundations of programming. Programming is the language you'll use to communicate with your computer and bring machine learning algorithms to life. So this month is all about Python, the language of choice for most machine learning practitioners. So here's what to focus on. First, learn Python basics. Begin with Python's fundamentals like variables, data types, loops and functions. Spend time writing small programs daily to get comfortable. After that, explore the key libraries like NumPy, pandas and scikit-learn. NumPy is for numerical operations. It makes handling large data sets faster and easier. And pandas is for manipulating and analyzing data. Pandas allows you to filter, sort and reshape data in a breeze.
And then scikit-learn is for implementing algorithms in just a few lines of code. Now, you might have heard about R, another language used in machine learning. But don't stress about it now. Python will serve you well, especially as a beginner, because it's simpler and more flexible. So aim to spend an hour or two each day coding. By the end of this month, you'll have a solid base to build on. Now, in the second month, get organized with version control and data structures. This month is about learning how to organize and manage your code effectively and sharpening your problem-solving skills with data structures and algorithms. First is version control with Git. Think of Git as your project history tracker. Imagine working on a big project and making changes, then realizing something went wrong. You want to go back to an earlier version, right? That's where Git comes in. And here's what you should practice. Number one is committing changes, which means saving different versions of your work as you progress. Then branching, which means working on separate features without affecting your main code. And then comes merging, which means combining changes from different branches once they are ready. You should also set up an account on GitHub or GitLab to store your projects online. Not only will this be super useful, but it'll also start building your portfolio. Next is data structures and algorithms. Think of data structures like tools in a toolkit. Each one, like arrays, stacks, queues, etc., serves a specific purpose. So here's how to approach them. Number one, arrays and lists. Arrays and lists are for storing data in sequence. After that, you can get familiar with stacks and queues. Stacks and queues are for tasks that need ordered data access. And then you have sorting and searching algorithms. These make your programs more efficient, and that's super important in machine learning, where data can get massive.
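To make the stack and queue idea concrete, here is a minimal sketch using only Python's standard library. The task names are made up for illustration. A stack hands back the most recently added item first, while a queue hands back the oldest item first.

```python
from collections import deque

# Stack: last in, first out (LIFO). A plain list is enough.
stack = []
stack.append("load data")
stack.append("clean data")
stack.append("train model")
last_task = stack.pop()       # removes the most recently added item

# Queue: first in, first out (FIFO). deque removes from the left efficiently.
queue = deque()
queue.append("job 1")
queue.append("job 2")
queue.append("job 3")
first_job = queue.popleft()   # removes the oldest item

print(last_task, "|", first_job)
```

A list also works as a queue through `pop(0)`, but that shifts every remaining element, which is why `deque` is the usual choice once the data gets large.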
So the goal here is to build up your problem-solving skills, which are key to machine learning success. Take it slow, practice daily, and you'll see progress. Now in the third month, learn to access data with SQL. In machine learning, a lot of work involves accessing and organizing data from databases. So SQL, or Structured Query Language, is your ticket to getting the data you need for training ML models. Here's what you should focus on. SELECT and WHERE: these commands help you pull specific pieces of data. Then you can move on to joins. Joins combine data from different tables. This is so powerful that you'll use it all the time. And then come GROUP BY and aggregate functions. They are great for summarizing data to find patterns. So spend time working with sample databases you can find online and practice writing queries. Being comfortable with SQL will save you time when preparing data for your models. Now, after completing the third month, you can move on to mathematics, which is about building your analytical mind. This month we are tackling the math behind machine learning. Don't worry, you don't need to be a math genius, but understanding certain concepts will make everything feel less mysterious. In this month you have to focus on linear algebra. This is the math behind how models see data. So you can study vectors, matrices and operations like multiplication. Next comes calculus. You'll use calculus to help your models learn. So you have to focus on derivatives and gradients, which help minimize errors in your model. And then you can move on to probability and statistics. Understanding probability helps you make sense of data. So learn about distributions like the normal distribution and the binomial distribution, and then variance and standard deviation. Once you have learned the math, next you'll be moving on to data handling and visualization, which is the heart of machine learning, as you all know.
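Going back to the SQL step in month three, the commands named there (SELECT, WHERE, joins, GROUP BY with an aggregate) can be tried without installing a database server. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their rows are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway database that lives in RAM
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Asha", "Pune"), (2, "Ben", "Austin"), (3, "Chen", "Pune")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 40.0), (2, 1, 60.0), (3, 2, 25.0), (4, 3, 10.0)])

# SELECT + WHERE: pull only the rows that match a condition.
pune_names = [row[0] for row in cur.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Pune",))]

# JOIN + GROUP BY + aggregate: total order amount per customer.
totals = cur.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """).fetchall()
conn.close()

print(pune_names)
print(totals)
```

The same queries carry over to most SQL databases with little or no change, so practicing on SQLite is a reasonable starting point.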
So with math under your belt, it's time to dig into data handling and visualization. Data preparation is vital because your model is only as good as the data you feed it. Number one comes data manipulation. Using pandas and NumPy, you'll clean and organize your data. You might be removing missing values, that is, cleaning up messy data so it doesn't confuse your model. Then you will learn transforming variables, like converting data into formats that work for models. And then you will move on to encoding categorical data, like changing text data such as female or male into numbers. Once you're done with data manipulation, next comes data visualization. Visualization is how you get to see your data before training a model. Here you have to learn Matplotlib and Seaborn. You can create line charts, histograms, scatter plots and heat maps. This lets you explore patterns and spot outliers. Understanding these patterns in your data is crucial for building effective models. Now in the sixth month you'll be moving on to the machine learning fundamentals. It's time to start building your own models. You'll focus on two main types of machine learning this month. Number one comes supervised learning. This is when you train a model on labeled data, where the outcome is already known. You'll work with algorithms like linear regression, which predicts a continuous outcome. Then you'll work with decision trees, which break down decisions into a tree structure. And then you have support vector machines under supervised learning, which separate data into classes. After supervised learning comes unsupervised learning. Here your model identifies patterns in data without labeled outcomes. Two popular techniques in unsupervised learning are, number one, clustering, like K-means clustering, which groups similar data points, and then dimensionality reduction, which reduces data complexity by focusing on key features.
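As a small illustration of the data manipulation steps mentioned above (handling missing values and encoding categorical data), here is a hedged pandas sketch. The column names and values are invented. Filling with the column mean is only one of several reasonable choices; dropping the row with `dropna()` is another.

```python
import pandas as pd

# Toy data with one missing age and a text column to encode.
df = pd.DataFrame({
    "age": [25.0, None, 40.0, 35.0],
    "gender": ["female", "male", "male", "female"],
})

# Fill the missing age with the mean of the ages that are present.
df["age"] = df["age"].fillna(df["age"].mean())

# Encode the text categories as numbers so a model can use them.
df["gender_code"] = df["gender"].map({"female": 0, "male": 1})

print(df)
```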
So you can use scikit-learn to try out these algorithms on sample data sets. This will give you hands-on experience with model training, and you will learn to fine-tune models to get better results. Now before moving on, if you are interested in advancing your career in the field of AI and machine learning, Simplilearn's postgraduate program, delivered in collaboration with Purdue University and IBM, is a perfect opportunity. This highly ranked program offers a comprehensive curriculum covering essential topics like machine learning, deep learning, NLP, computer vision, reinforcement learning, generative AI, prompt engineering, and many more. With hands-on experience through 25-plus projects and access to 20-plus cutting-edge tools, you will gain the skills needed to excel in today's competitive job market. So join now and elevate your expertise with the backing of Purdue's academic excellence and IBM's industry-leading insight. You can find the course link in the description box and pinned comment. Now moving on to the seventh month, you'll be building and training models with advanced libraries. By now you have experimented with some basic models. So let's step it up with advanced tools like TensorFlow and PyTorch. These libraries offer more flexibility and power. With TensorFlow and PyTorch you can start with simple models and work your way up. These libraries allow for building neural networks, which you'll be studying more next month. Once you have become familiar with TensorFlow and PyTorch, you can move on to model training and evaluation. You have to learn to split data into training and testing sets and evaluate models using metrics like accuracy and precision. So your goal this month should be to get comfortable with these libraries and understand how they handle data and model training behind the scenes. Once you are done with this, you'll be moving on to the eighth month, where you'll be dealing with advanced machine learning.
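The train/test split and the two metrics named above (accuracy and precision) are simple enough to write by hand before reaching for a library. The sketch below is plain Python with hypothetical helper names and made-up labels; in practice libraries such as scikit-learn provide equivalents.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Of everything predicted positive, the share that really is positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0

train, test = train_test_split(range(8))

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # made-up ground truth
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]   # made-up model output
print(accuracy(y_true, y_pred), precision(y_true, y_pred))
```

Accuracy counts every correct answer, while precision only looks at the cases the model flagged as positive, which matters when false alarms are costly.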
So this month's concepts will be, number one, ensemble learning, which means combining multiple models to get better predictions. Here you'll be learning about bagging, for example random forests, where multiple decision trees make predictions. And then you have boosting, like AdaBoost and XGBoost, where models learn from each other's mistakes. After ensemble learning comes deep learning. Here you explore neural networks, which mimic the human brain. You learn about neural network basics, so you can start with simple fully connected networks, and then you can move on to backpropagation and gradient descent. These help your model learn and improve. You can use TensorFlow or PyTorch to practice building neural networks, and you can work on projects to reinforce these concepts. Now moving on, you have to specialize in topics like NLP and computer vision. Machine learning applications are so powerful, and here you'll get a taste of two major fields. The first is NLP, or natural language processing. Here you work with text data, with tasks like sentiment analysis and text classification. You can start with basic pre-processing like tokenization and stop word removal, and move to building simple NLP models. After that you can try computer vision. For image data you have to learn CNNs, convolutional neural networks. These networks analyze visual patterns, making them ideal for image classification. So you practice with open data sets like text, documents or images and apply the concepts you learn to see results in real-world applications. Now in the 10th month you'll be dealing with model deployment, which is bringing your models to life. Here you'll be using Flask or Django. You can use these frameworks to create a web API so users can interact with your model. For example, build a web app that lets people upload images for classification. And then you can also try out Docker.
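For the NLP pre-processing steps mentioned above, tokenization and stop word removal, a minimal standard-library sketch looks like this. The stop word list here is a tiny hand-picked set for illustration only; NLP libraries ship much longer lists and smarter tokenizers.

```python
import re

# Tiny illustrative stop word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "but", "this", "it", "of", "to"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop very common words that carry little meaning on their own."""
    return [t for t in tokens if t not in stop_words]

review = "The movie was great, but the ending is a letdown."
tokens = tokenize(review)
filtered = remove_stop_words(tokens)
print(filtered)
```

What remains after filtering is the kind of input a simple sentiment or text classification model would be trained on.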
So package your model and its dependencies so it can run on any machine. This is super helpful for deploying models without compatibility issues. By the end of this month, you'll be able to share your models with the world. Moving on to the 11th month, you'll be starting with cloud and production. This month, you'll learn how to deploy models on the cloud and ensure they perform well in real-world environments. You'll be dealing with cloud platforms like AWS, Google Cloud or Azure. You have to learn to deploy models on the cloud to provide accessibility and scalability. And then comes monitoring and maintenance. Understand how to track your model's performance over time and update it as needed. These skills are essential for maintaining models in production and ensuring they stay reliable. And finally you'll be creating real-world projects and building a portfolio. Here you have to choose topics that interest you and showcase your skills. First you can start with full projects, meaning complete projects that go from data cleaning and model building to deployment. Ideas could be a sentiment analysis tool or an image recognition app. And then you have to build your portfolio. Organize and document your projects, host them on GitHub, and create an online portfolio to share with potential employers or collaborators. By following this road map, you'll be well prepared to handle real-world machine learning challenges and have an impressive portfolio to show for it. >> Welcome to machine learning tutorial part two. My name is Richard Kirschner with the Simplilearn team. That is www.simplilearn.com. Get certified, get ahead. Today in our second tutorial, we're going to cover K means and linear regression along with going over the quiz questions we had during our first tutorial. What's in it for you? We're going to cover clustering. What is clustering?
K means clustering, which is one of the most commonly used clustering tools out there, including a flowchart to understand K means clustering and how it functions, and then we'll do an actual Python live demo on clustering of cars based on brands. Then we're going to cover logistic regression. What is logistic regression? The logistic regression curve and the sigmoid function. And then we'll do another Python code demo to classify a tumor as malignant or benign based on features. Let's start with clustering. Suppose we have a pile of books of different genres. Now we divide them into different groups like fiction, horror, education, and as we can see from this young lady, she definitely is into heavy horror. You can just tell by those eyes and the Canadian maple leaf on her shirt. But we have fiction, horror, and education, and we want to go ahead and divide our books up. Well, organizing objects into groups based on similarity is clustering. In this case, as we're looking at the books, we're talking about clustering things with known categories. But you can also use it to explore data. So you might not know the categories. You just know that you need to divide the data up in some way to conquer it and to organize it better. But in this case, we're going to be looking at clustering in specific categories. Let's take a deeper look at that. We're going to use K means clustering. K means clustering is probably the most commonly used clustering tool in the machine learning library. K means clustering is an example of unsupervised learning. If you remember from our previous tutorial, it is used when you have unlabeled data. So we don't know the answer yet. We have a bunch of data that we want to cluster into different groups: define clusters in the data based on feature similarity. So we've introduced a couple of terms here. We've already talked about unsupervised learning and unlabeled data. So we don't know the answer yet.
We're just going to group stuff together and see if we can find an unknown connection. We've also introduced feature similarity, features being different features of the data. Now, with books, we can easily see fiction and horror and history books, but a lot of times with data, some of that information isn't so easy to see right when we first look at it. And so K means is one of those tools where we can start finding things that connect, that match with each other. Suppose we have these data points and want to assign them into clusters. Now when I look at these data points, I would probably group them into two clusters just by looking at them. I'd say two of these groups of data kind of come together. In K means, we pick K clusters and assign random centroids to the clusters, where K represents the number of clusters, in this case two different clusters. Then we compute the distance from the objects to the centroids. Now we form new clusters based on minimum distances and calculate their centroids. So we figure out what the best position is for each centroid. Then we move the centroid and recalculate those distances. Repeat the previous two steps iteratively till the cluster centroids stop changing their positions and become static. Once the clusters become static, the K means clustering algorithm is said to be converged. And there's another term we see throughout machine learning: converged. That means whatever math we're using to figure out the answer has come to a solution, or it's converged on an answer. Shall we see the flowchart, to make a little bit more sense of this by putting it into a nice easy step by step? So we start, we choose K. We'll look at the elbow method in just a moment. We assign random centroids to clusters.
And sometimes you pick the centroids because you might look at the data in a graph and say, "Ah, these are probably the central points." Then we compute the distance from the objects to the centroids. We take that and we form new clusters based on minimum distance and calculate their centroids. Then we compute the distance from the objects to the new centroids. And then we go back and repeat those last two steps. We calculate the distances. So as we're doing it, it brings in the new centroid, and then we move the centroid around and we figure out which objects are closest to each centroid. So the objects can switch from one centroid to the other as the centroids are moved around, and we continue that until it is converged. Let's see an example of this. Suppose we have this data set of seven individuals and their scores on two topics, A and B. So here's our subject, in this case referring to the person taking the test, and then we have subject A, where we see what they've scored on their first subject, and we have subject B, and we can see what they scored on the second subject. Now let's take the two farthest apart points as initial cluster centroids. Remember, we talked about selecting them randomly, or we can also just put them at different points and pick the ones furthest apart so they move together. Either one works okay, depending on what kind of data you're working on and what you know about it. So we took the two furthest points, one and one, and five and seven. Each point is then assigned to the closest cluster with respect to the distance from the centroids. So we take each one of these points in there. We measure that distance.
And you can see that if we measured each of those distances, you use the Pythagorean theorem for a triangle in this case, because you know the x and the y and you can figure out the diagonal line from that. Or you just take a ruler and put it on your monitor. That'd be kind of silly, but it would work if you're just eyeballing it. You can see how they naturally come together in certain areas. Now we again calculate the centroids of each cluster. So cluster one and then cluster two, and we look at each individual dot. There's one, two, three in one cluster. The centroid then moves over. It becomes 1.8, 2.3. So remember it was at 1 and 1. Well, the very center of the data we're looking at would put it at roughly 2 and 2, or more exactly 1.8 and 2.3. And for the second one, if we wanted to make the overall mean vector, the average vector of all the different points in that cluster, we come up with 4.1 and 5.4. So we've now moved the centroids. We compare each individual's distance to its own cluster mean and to that of the opposite cluster, and we build a nice chart on here. As we move that centroid around, we now have a new, different kind of clustering of groups, and using the Euclidean distance between the points and the mean we get the same formula. You see new numbers coming up. So we have each individual dot's distance to the mean centroid of its own cluster and its distance to the mean centroid of the other cluster. Only individual three is nearer to the mean of the opposite cluster, cluster two, than to that of its own cluster, cluster one. And you can see here in the diagram where we've kind of circled that one in the middle. So when we've moved the centroids of the clusters over, one of the points shifted to the other cluster because it's closer to that group of individuals. Thus individual 3 is relocated to cluster two, resulting in a new partition. And we regenerate all those numbers of how close they are to the different clusters.
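The assign-and-recompute loop walked through above can be sketched in a few lines of plain Python. One caveat: the narration never reads out the seven individuals' scores, so the `scores` list below is an assumption, chosen only because it reproduces the numbers that are quoted (starting centroids at 1, 1 and 5, 7, then roughly 1.8, 2.3 and 4.1, 5.4 after the first pass). The function name `kmeans` is likewise just an illustrative helper, not the library call used later in the demo.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain K-means: assign each point to its nearest centroid, move each
    centroid to the mean of its points, repeat until nothing changes."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: Euclidean distance to every centroid, keep the nearest.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:      # converged: no point switched cluster
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# ASSUMED scores (A, B) for individuals 1..7; not given in the narration.
scores = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]

# Start from the two farthest-apart points, as in the walkthrough.
labels, centroids = kmeans(scores, centroids=[(1.0, 1.0), (5.0, 7.0)])
print(labels)
print(centroids)
```

With these assumed values, individual 3 starts in the first cluster and switches to the second once the centroids have moved, which is the relocation described in the walkthrough.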
For the new clusters, we will find the actual cluster centroids. So now we move the centroids over, and you can see that we've now formed two very distinct clusters on here. On comparing the distance of each individual to its own cluster mean and to that of the opposite cluster, we find that the data points are stable. Hence we have our final clusters. Now if you remember, I brought up a concept earlier on the K means algorithm. Choosing the right value of K will help reduce the number of iterations. And to find the appropriate number of clusters in a data set, we use the elbow method. The within sum of squares, WSS, is defined as the sum of the squared distance between each member of the cluster and its centroid. And so what you see we've done here is we have the number of clusters, and as you run the same K means algorithm over the different numbers of clusters and calculate what the centroids look like, you can actually find the optimal number of clusters using the elbow. Reading it off the graph is called the elbow method. And on this we guessed at two just by looking at the data. But as you can see in the slope, you actually just look for right there where the elbow is in the slope, and you have a clear answer that we want to start with K equals two. A lot of times people end up computing K means with K equals two, three, four, five until they find the value which fits on the elbow joint. Sometimes you can just look at the data, and if you're really good with that specific domain (remember domain, I mentioned that last time) you'll know where to pick those numbers and where to start guessing at what that K value is. So let's take this, and we're going to use a use case using K means clustering to cluster cars into brands using parameters such as horsepower, cubic inches, make, year, etc. So we're going to use the cars data set, which has information about three brands of cars: Toyota, Honda, and Nissan.
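Written out, the within sum of squares described above looks like the following. The symbols are introduced here for illustration and are not from the slides: $C_k$ is the set of points in cluster $k$, $\mu_k$ is that cluster's centroid, and $K$ is the number of clusters.

```latex
\mathrm{WSS}(K) \;=\; \sum_{k=1}^{K} \; \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^{2},
\qquad
\mu_k \;=\; \frac{1}{\lvert C_k \rvert} \sum_{x_i \in C_k} x_i
```

WSS can only go down as $K$ grows (with one cluster per point it reaches zero), which is why the elbow method looks for the value of $K$ where the drop flattens out rather than for the minimum.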
We'll go back to my favorite tool, the Anaconda Navigator with the Jupyter Notebook. Let's go ahead and flip over to our Jupyter Notebook. And in our Jupyter Notebook, I'm going to go ahead and just paste the basic code that we usually start a lot of these off with. We're not going to go too much into this code because we've already discussed NumPy, Matplotlib and pandas: NumPy being the number array, pandas being the pandas data frame, and Matplotlib for the graphing. And don't forget, if you're using the Jupyter Notebook, you do need the %matplotlib inline so that it plots everything on the screen. If you're using a different Python editor, then you probably don't need that because it'll have a popup window on your computer. And we'll go ahead and run this just to load our libraries and our setup into here. The next step is, of course, to look at our data, which I've already opened up in a spreadsheet. And you can see here we have the miles per gallon, cylinders, cubic inches, horsepower, weight in pounds, you know, how heavy it is, and the time it takes to get to 60. My car is probably on this one at about 80 or 90. Then what year it is. So you can actually see these are kind of older cars. And then the brand: Toyota, Honda, Nissan. So the different cars are coming from all the way back in 1971. If we scroll down to the 80s, we have, between the 70s and 80s, a number of cars that they've put out. And when we come back here, we're going to do importing the data. So we'll go ahead and do dataset equals, and we'll use pandas to read this in, and it's from a CSV file. Remember, you can always request the data files for these, either in the comments here on the YouTube video or by going to simplilearn.com and requesting that. The cars CSV, I put it in the same folder as the code that I've stored. So my Python code is stored in the same folder, and I don't have to put the full path.
If you store them in different folders, you do have to change this. And double check your variable names. We'll go ahead and run this. We've chosen dataset arbitrarily because, you know, it's a data set we're importing. And we've now imported our cars CSV into the dataset. As you know, you have to prep the data. So we're going to create the X data. This is the one that we're going to try to figure out what's going on with. There's a number of ways to do this, but we'll do it in a simple loop so you can actually see what's going on. So we'll do for i in X.columns. So we're going to go through each of the columns. A lot of times it's important that I make lists of the columns and do this, because I might remove certain columns or there might be columns that I want to be processed differently. But for this we can go ahead and take X of i, and we want to do fillna, and that's a pandas command. But the question is, what are we going to fill the missing data with? We definitely don't want to just put in a number that doesn't actually mean something. And so one of the tricks you can do with this is we can take X of i, and in addition to that, we want to go ahead and turn this into an integer, because a lot of these are integers. So we'll go ahead and keep it integers. And we add the bracket here. A lot of editors will do this: they'll think that you're closing one bracket. Make sure you get that second bracket in there if it's a double bracket. That's always something that happens regularly. So once we have our integer of X of i, this is going to fill in any missing data with the average. And I was so busy closing one set of brackets, I forgot that the mean also has brackets in there for pandas. So we can see here we're going to fill in all the missing data with the average value for that column. So if there's missing data, it's filled with the average of the data the column does have.
Then once we've done that, we'll go ahead and loop through it again and just check to make sure everything is filled in correctly. And we'll print. And then we take X isnull, and this flags the null values, and we'll just sum that up to see how many lines are null. And so when I run this... and so with the X, what we want to do is remove the last column, because that had the models. That's what we're trying to see, if we can cluster these things and figure out the models. There are so many different ways to sort the X out. For one, we could take the X and we could go dataset, our variable we're using, and use iloc, one of the features that's in pandas, and we could take all the rows and all but the last column of the data set. And at this time, we could do values. We just convert it to values. So that's one way to do this. And if I just put this down here and print X (it's a capital X we chose) and I run this, you can see it's just the values. We could also take out the values, and it's not going to return anything because there's no values connected to it. What I like to do with this, instead of doing the iloc, which works on integer positions, is something more common: come in here and we have our dataset, and we're going to do dataset.columns. And remember, that lists all the columns. So if I come in here, let me just mark that as red and I print dataset.columns. You can see that I have my index here. I have my MPG, cylinders, everything, including the brand, which we don't want. So the way to get rid of the brand would be to do dataset.columns with everything but the last one, colon minus one. So now if I print this, you'll see the brand disappears. And so I can actually just take dataset.columns colon minus one and put it right in here for the columns we're going to look at. And let's unmark this. And unmark this. And now if I do an X.head(), I now have a new data frame.
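The two ways of dropping the last column described above can be compared side by side. This is a sketch on a tiny stand-in frame; the column names and rows are invented, since the real cars file is not reproduced in the transcript.

```python
import pandas as pd

# Stand-in for the cars data: a few feature columns plus the brand at the end.
dataset = pd.DataFrame({
    "mpg": [14.0, 31.9, 17.0],
    "cylinders": [8, 4, 8],
    "hp": [165, 71, 140],
    "brand": ["Toyota", "Honda", "Nissan"],
})

# Option 1: position-based. All rows, every column except the last,
# converted to a bare NumPy array (column names are lost).
X_values = dataset.iloc[:, :-1].values

# Option 2: label-based. Slice the column index, then select those columns.
# The result is still a DataFrame, so the names stay attached.
X = dataset[dataset.columns[:-1]]

print(X.head())
```

Keeping the DataFrame form is convenient for the later per-column cleanup loop, because the loop iterates over `X.columns`.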
And you can see right here we have all the different columns except for the brand at the end, after the year. And it turns out, when you start playing with the data set, you're going to get an error later on, and it'll say cannot convert string to float value. And that's because, for some reason, the way they recorded these things, they must have been recorded as strings. So we have a neat feature in here on pandas to convert, and it is simply convert_objects. And for this, we're going to do convert... oops, convert_numeric equals True. And yes, I did have to go look that up. I don't have the convert_numeric in there memorized. If I'm working with a lot of these things, I remember them, but depending on where I'm at and what I'm doing, I usually have to look it up. And we run that. Oops, I must have missed something in here. Let me double check my spelling. And when I double check my spelling, you'll see I missed the first underscore in convert_objects. And when I run this, it now has everything converted into a numeric value, because that's what we're going to be working with down here, numeric values. And the next part is that we need to go through the data and eliminate null values. Most people, when they're working with small data pools, discover afterwards that they have a null value and they have to go back and do this. So, you know, be aware that whenever we're formatting this data, things are going to pop up and sometimes you go backwards to fix it. And that's fine. That's just part of exploring the data and understanding what you have. And I should have done this earlier, but let me go ahead and increase the size of my window one notch. There we go. Easier to see. So we'll do for i in X.columns. We'll page through all the columns. And we want to take X of i and we're going to change that. We're going to alter it. And so with this, we want to go ahead and fill in X of i. Pandas has the fillna.
And that just fills in any non-existent, missing data. And we'll put my brackets up. There's a lot of different ways to fill this data. If you have a really large data set, some people just void out that data and then look at it later in a separate exploration of data. One of the tricks we can do is we can take our column and we can find the mean, and the mean has its parentheses in there. So we take the columns, and we're going to fill in the non-existing values with the mean. The problem is that returns a decimal float, and some of these certainly aren't decimals. You'd want to be a little careful of doing this, but for this example we're just going to fill it in with the integer version of this. That keeps it on par with the other data that doesn't have a decimal point. And then what we also want to do is double check. A lot of times you do this check first, then you do the fill, and then you do the check again just to make sure you did it right. So we're going to go through and test for missing data. And one of the ways you can do that is simply go in here and take our X of i column. So it's going to go through the X of i column. It says isnull. So it's going to return any place there's a null value. It actually goes through all the rows of each column with isnull. And then we want to go ahead and sum that. So we take that and we add the sum. And these are all pandas. So isnull is a pandas command, and so is sum. And if we go ahead and run that, you'll see that all the columns have zero null values. So we've now tested and double checked, and our data is nice and clean. We have no null values. Everything is now a number value. We turned it into numeric and we've removed the last column in our data. And at this point, we're actually going to start using the elbow method to find the optimal number of clusters. So we're now actually getting into the sklearn part.
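One practical note on the cleanup steps just described: `convert_objects(convert_numeric=True)` has been removed from current pandas releases, so on a recent install that call fails. A commonly used replacement is `pd.to_numeric` with `errors="coerce"`, which turns anything unparseable into a missing value. The sketch below swaps that in and then repeats the fill-with-integer-mean and null-count steps; the column names and the placeholder strings are made up for illustration.

```python
import pandas as pd

# Stand-in data where numbers were recorded as strings, with a few bad cells.
X = pd.DataFrame({
    "cubicinches": ["350", "?", "140", "400"],
    "weightlbs": ["4209", "1905", "n/a", "3900"],
})

# Convert every column to numbers; cells that cannot be parsed become NaN.
X = X.apply(pd.to_numeric, errors="coerce")

# Fill each column's gaps with the integer part of that column's mean,
# as in the walkthrough.
for col in X.columns:
    X[col] = X[col].fillna(int(X[col].mean()))

# Double check: count the remaining nulls per column.
missing = X.isnull().sum()
print(missing)
```

Note that `mean()` skips missing values by default, so the fill value is the average of the entries that are actually present.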
The K means clustering on here. I guess we'll go ahead and zoom it up one more notch so you can see what I'm typing in here. And then from sklearn.cluster, we're going to import KMeans. I always forget to capitalize the K and the M when I do this. So it's capital K, capital M, KMeans. And we'll go and create an array, wcss equals, and we'll make it an empty array. If you remember from the elbow method on our slide, the within sum of squares, WSS, is defined as the sum of the squared distance between each member of the cluster and its centroid. So we're looking at that change in differences as far as a squared distance. And we're going to run this over a number of K values. In fact, let's go for i in range. We'll do 11 of them, range 0, 11. And the first thing we're going to do is create the actual kmeans; we'll do it all lowercase. So we're going to create this object from the KMeans that we just imported. And the variable that we want to put into this is n_clusters. We're going to set that equal to i. That's the most important one, because we're looking at how increasing the number of clusters changes our answer. There are a lot of settings to the KMeans. Our guys in the back did a great job just kind of playing with some of them. The most common one that you see in a lot of stuff is how you init your K means. So we have 'k-means++'. This is just a tool to let the model itself be smart about how it picks its initial centroids to start with. We only want to iterate no more than 300 times, so we have a max_iter we put in there. We have the n_init, and the random_state equals zero. You really don't need to worry too much about these when you're first learning this. As you start digging in deeper, you start finding that these are shortcuts that will speed up the process as far as a setup. But the big one that we're working with is the n_clusters equals i. So we're going to literally train our K means 11 times.
We're going to do this process 11 times. And if you're working with big data, you know, the first thing you do is run a small sample of the data so you can test all your stuff on it. And you can already see the problem: if I'm going to iterate through a terabyte of data 11 times, and then KMeans itself is iterating through the data multiple times, that's a heck of a process. So you've got to be a little careful with this. A lot of times, though, you can find your elbow using the elbow method and find your optimal number on a sample of data, especially if you're working with larger data sources. So we want to go ahead and take our kmeans and we're just going to fit it. If you're looking at anything in sklearn, it's very common that you fit your model. And if you remember correctly, the variable we're using is the capital X. And once we fit this, we go back to the array we made, and we want to go and just append a value on the end. And it's not the actual fit we're appending in there. When it fits, it generates the value you're looking for, which is the inertia. So kmeans.inertia_ will pull out that specific value that we need. And let's get a visual on this. We'll do our plt.plot. And what we're plotting here is first the x-axis, which is range 0 to 11. So that will generate a nice little plot there. And the wcss for our y-axis. It's always nice to give our plot a title. And let's see, we'll just give it The Elbow Method for the title. And let's get some labels. So let's go ahead and do plt.xlabel. And we'll do number of clusters for that. And plt.ylabel. And for that, we can do, oops, there we go, WCSS, since that's what we're doing on the plot on there. And finally, we want to go ahead and display our graph, which is simply plt, oops, show. There we go. And because we have it set to inline, it'll appear inline. Hopefully I didn't make a typing error on there. And you can see we get a very nice graph.
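The elbow loop described above can be sketched roughly as follows. This is a minimal illustration, not the video's exact notebook: the cars data is not available here, so a synthetic data set from make_blobs stands in for X (an assumption), and n_init=10 is an assumed value. The loop starts at 1 rather than 0 because KMeans needs at least one cluster.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the cars data (assumption): 300 points in 3 groups.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

wcss = []          # within-cluster sum of squares, one entry per k
ks = range(1, 11)  # KMeans requires n_clusters >= 1
for k in ks:
    kmeans = KMeans(n_clusters=k, init="k-means++", max_iter=300,
                    n_init=10, random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)  # inertia_ is the WCSS of the fitted model

fig, ax = plt.subplots()
ax.plot(list(ks), wcss, marker="o")
ax.set_title("The Elbow Method")
ax.set_xlabel("Number of clusters")
ax.set_ylabel("WCSS")
# In a plain script, call plt.show() to display the figure.
```

The bend in the resulting curve is the candidate for the number of clusters. On a large data set, the same loop could be run on a sample first, as suggested above.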
You can see a very nice elbow joint there at two, and again right around three and four. And then after that there's not very much. Now as a data scientist, if I was looking at this, I would do either three or four. And I'd actually try both of them to see what the output looked like. And they've already tried this in the back. So, we're just going to use three as a setup on here. And let's go ahead and see what that looks like when we actually use this to show the different kinds of cars. And so, let's go ahead and apply KMeans to the cars data set. And basically, we're going to copy the code that we looped through up above, where kmeans equals KMeans, number of clusters. And we're just going to set the number of clusters to three since that's what we're going to look for. And you could do three and four on this and graph them just to see how they come up differently. It'd be kind of curious to look at that. But for this, we're just going to set it to three. Go ahead and create our own variable, y_kmeans, for our answers. And we're going to set that equal to, whoops, I did a double equal there, kmeans. But we're not going to do a fit. fit_predict is the setup you want to use. And when you're using untrained models, you'll see something slightly different, because usually you see fit and then you see just the predict. But we want to both fit and predict the kmeans on this. And that's fit underscore predict. And then our capital X is the data we're working with. And before we plot this data, we're going to do a little pandas trick. We're going to take our X value and we're going to set X as_matrix. So we're converting this into a nice rows and columns kind of setup. And we're going to have columns equals None. So it's just going to be a matrix of data in here. And let's go ahead and run that. We get a little warning. You'll see these warnings pop up because things are always being updated.
So there are minor changes between versions and future versions. It's now more common to use values instead of doing as_matrix. But as_matrix works just fine for right now, and you'll want to update that later on. But let's go ahead and dive in and plot this and see what that looks like. And before we dive into plotting this data, I always like to take a look and see what I am plotting. So let's take a look at y_kmeans. I'm just going to print that out down here. And we see we have an array of answers. We have 2, 1, 0, 2, 1, 2. So it's clustering these different rows of data based on the three different spaces it thinks they belong in. And then let's go ahead and print X and see what we have for X. And we'll see that X is an array. It's a matrix. So we have our different values in the array. And it's very hard to plot all the different values in the array. So we're only going to be looking at the first two, positions zero and one. And if you were doing a full presentation in front of a board meeting, you might actually do it a little differently and dig a little deeper into the different aspects, because this is all the different columns we looked at. But we'll only look at columns one and two for this to make it easy. So let's go ahead and clear this data out of here and let's bring up our plot. And we're going to do a scatter plot here. So plt.scatter. And this looks a little complicated, so let's explain what's going on with this. We're going to take the X values, and we're only interested in the rows where y_kmeans equals 0, the first cluster. Okay? And then we're going to take column zero for the x-axis. And then we're going to do the same thing here. We're only interested in y_kmeans equals 0, but we're going to take the second column for the y-axis. So we're only looking at the first two columns in the data. And then the guys in the back played with this a little bit to make it pretty.
And they discovered that it looks good with a size equal to 100. That's the size of the dots. We're going to use red for this one. And when they were looking at the data and what came out, it was definitely the Toyota on this. We're just going to go ahead and label it Toyota. Again, that's something you really have to explore in here as far as playing with those numbers and seeing what looks good. We'll go ahead and hit enter in there. And I'm just going to paste in the next two lines, which are the next two cars. And this is our Nissan and Honda. And you'll see with our scatter plot, we're now looking at where y_kmeans equals 1, and we want the zero column, and where y_kmeans equals 2. Again, we're looking at just the first two columns, zero and one. And each of these rows then corresponds to Nissan and Honda. And I'll go ahead and hit enter on there. And finally, let's take a look and put the centroids on there. Again, we're going to do a scatter plot. And the centroids you can just pull from our kmeans, the model we created, dot cluster_centers_. And we're going to just do all of them in the first column and all of them in the second column, which is 0 and 1, because you always start with zero. And then they were playing with the size and everything to make it look good. We'll do a size of 300. We're going to make the color yellow. And we'll label them. It's always good to have some good labels. Centroids. And then we do want to do a title. plt.title. So you always want to make your graphs look pretty. We'll call it Clusters of car make. And one of the features of the plot library is you can add a legend. It'll automatically bring it in, since we've already labeled the different aspects of the legend with Toyota, Nissan, and Honda. And finally, we want to go ahead and show so we can actually see it. And remember, it's inline.
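Putting the fit_predict call and the scatter plots together, a minimal sketch might look like the following. It assumes synthetic data from make_blobs in place of the cars data, so the clusters get generic labels rather than Toyota, Nissan, and Honda. X is already a NumPy array here; with a pandas data frame, current pandas versions would use .to_numpy() or .values, since as_matrix has been removed.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the cars data (assumption).
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, init="k-means++", max_iter=300,
                n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(X)  # fit the model and return one label per row

fig, ax = plt.subplots()
colors = ["red", "purple", "green"]
for cluster, color in enumerate(colors):
    mask = y_kmeans == cluster  # boolean mask selecting rows in this cluster
    # Column 0 goes on the x-axis, column 1 on the y-axis.
    ax.scatter(X[mask, 0], X[mask, 1], s=100, c=color,
               label=f"Cluster {cluster}")

centers = kmeans.cluster_centers_
ax.scatter(centers[:, 0], centers[:, 1], s=300, c="yellow", label="Centroids")
ax.set_title("Clusters (synthetic data)")
ax.legend()
# In a plain script, call plt.show() to display the figure.
```

The expression X[mask, 0] is the same pattern as X[y_kmeans == 0, 0] in the walkthrough: keep only the rows assigned to one cluster, then pick a column.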
So if you're using a different editor that's not the Jupyter notebook, you'll get a popup of this. And you should have a nice set of clusters here. So we can look at this and we have clusters of Honda in green, Toyota in red, Nissan in purple. And you can see where it put the centroids to separate them. Now when we're looking at this, we can also plot a lot of other data on here, because we only looked at the first two columns. This is just column one and two, or 0 and 1 as you label them in computer scripting. But you can see here we have nice clusters of car makes. We were able to pull out the data, and you can see how just these two columns form very distinct clusters of data. So if you were exploring new data, you might take a look and say, well, what makes these different? Almost going in reverse, you start looking at the data and pulling apart the columns to find out why the first group is set up the way it is. Maybe you're doing loans and you want to go, well, why is this group not defaulting on their loans, and why is the last group defaulting on their loans, and why is the middle group 50% defaulting on their bank loans? And you start finding ways to manipulate the data and pull out the answers you want. So now that you've seen how to use KMeans for clustering, let's move on to the next topic.
Now let's look into logistic regression. The logistic regression algorithm is the simplest classification algorithm used for binary or multiclass classification problems. And we can see our little girl from Canada who's into horror books is back. That's actually really scary when you think about that with those big eyes. In the previous tutorial, we learned about linear regression, dependent and independent variables. So to brush up, y = mx + c. A very basic algebraic function of y and x. The dependent variable is the target class variable we are going to predict. The independent variables,
X1 all the way up to XN are the features or attributes we're going to use to predict the target class. We know what a linear regression looks like. But using the graph, we cannot divide the outcome into categories. It's really hard to categorize 1.5, 3.6, 9.8. For example, a linear regression graph can tell us that with an increase in the number of hours studied, the marks of a student will increase, but it will not tell us whether the student will pass or not. In such cases, where we need the output as a categorical value, we will use logistic regression, and for that we're going to use the sigmoid function. So you can see here we have our marks, 0 to 100, and the number of hours studied. That's going to be what they're comparing it to in this example. And we usually form a line that says y = mx + c. And when we use the sigmoid function, we have p = 1 / (1 + e^(-y)). It generates a sigmoid curve. And so you can see right here, when you take the ln, which is the natural logarithm (I always thought it should be nl, not ln), that's just the inverse of e to the minus y. And so we do this, and we get ln(p / (1 - p)) = m * x + c. That's the sigmoid curve function we're looking for. And we can zoom in on the function and you'll see that the function goes to one or to zero depending on what your x value is. If the probability is greater than 0.5, the value is automatically rounded off to one, indicating that the student will pass. So if they're doing a certain amount of studying, they will probably pass. Then you have a threshold value at 0.5. It automatically puts that right in the middle usually. And if the probability is less than 0.5, the value is rounded off to zero, indicating the student will fail. So if they're not studying very hard, they're probably going to fail. This, of course, is ignoring the outliers, like that one student who's just a natural genius and doesn't need any studying to memorize everything. That's not me, unfortunately.
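The sigmoid and its inverse can be checked with a few lines of plain Python. This is only a sketch: the slope m and intercept c below are made-up values for the hours-studied example, not numbers from the video, and the helper names are hypothetical.

```python
import math

def sigmoid(y):
    """p = 1 / (1 + e^(-y)): maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

def logit(p):
    """ln(p / (1 - p)): the inverse of the sigmoid, recovering y = m*x + c."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients for "hours studied" (assumption, for illustration).
m, c = 1.5, -6.0

def predict_pass(hours, threshold=0.5):
    """Return the pass probability and the 0/1 label after thresholding."""
    p = sigmoid(m * hours + c)
    # A probability of exactly 0.5 falls to 0 here; the video does not say
    # how ties are broken, so this is an arbitrary choice.
    return p, int(p > threshold)

for hours in (2, 4, 6):
    p, label = predict_pass(hours)
    print(f"{hours} hours -> p={p:.3f}, pass={label}")
```

Applying logit to the output of sigmoid gives back the original linear value, which is the relationship ln(p / (1 - p)) = m * x + c stated above.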
I have to study hard to learn new stuff.
Problem statement: to classify whether a tumor is malignant or benign. And this is actually one of my favorite data sets to play with because it has so many features, and when you look at them, they're really hard to understand. You can't just look at them and know the answer. So it gives you a chance to kind of dive into what data looks like when you aren't able to understand the specific domain of the data. But I also want to remind you that in the domain of medicine, if I told you that my model was really good at classifying things, say 90% or 95%, and I'm classifying whether you're going to have a malignant or a benign tumor, I'm guessing that you're going to go get it tested anyways. So you've got to remember the domain we're working with. So why would you want to do that if you know you're just going to go get a biopsy, because it's that serious? This is like an all or nothing. Referencing the domain is important. It might help the doctor know where to look just by understanding what kind of tumor it is. So it might help them or aid them on something they missed from before. So let's go ahead and dive into the code and I'll come back to the domain part of it in just a minute. So, use case. We're going to do our normal imports here, where we're importing numpy, pandas, seaborn, and the matplotlib library, and we're going to do matplotlib inline, since I'm going to switch over to Anaconda. So, let's go ahead and flip over there and get this started. So, I've opened up a new window in my Anaconda Jupyter Notebook. And by the way, you don't have to use Anaconda for the Jupyter Notebook. I just love the interface and all the tools that Anaconda brings. So, we've got our import numpy as np for our numpy number array. We have our pandas as pd. We're going to bring in seaborn as sns to help us with our graphs. There are so many really nice tools in both seaborn and the matplotlib library.
And we'll do our matplotlib.pyplot as plt. And then of course we want to let it know to do it inline. And let's go and just run that. So it's all set up. And we're just going to call our data data. Not creative today. Equals pd. And this happens to be in a CSV file. So we'll use pd.read_csv. And I renamed the file data forp2.csv. You can, of course, write in the comments below the YouTube video and request the data set itself, or go to the Simplilearn website, and we'll be happy to supply that for you. And let's just open up the data before we go any further and see what it looks like in a spreadsheet. So when I pop it open in a local spreadsheet, and this is just a CSV file, comma-separated values, we have an ID. So I guess that categorizes for reference which test was done. Then the diagnosis, M for malignant, B for benign. So there are two different options on there. And that's what we're going to try to predict, the M and B, and test it. And then we have the radius mean or average, the texture average, perimeter mean, area mean, smoothness. I don't know about you, but unless you're a doctor in the field, most of this stuff, I mean, you can guess what concave means just by the term concave, but I really wouldn't know what that means in the measurements they're taking. So, they have all kinds of stuff like how smooth it is, the symmetry, and these are all float values. You just page through them real quick, and you'll see there's, I believe, 36, if I remember correctly, in this one. So there are a lot of different values and measurements they take when they go in there and take a look at the tumorous growth. So, back in our data, I put this in the same folder as the code. So I saved this code in that folder. Obviously, if you have it in a different location, you want to put the full path in there. And we'll just do the pandas first five lines of data with data.head.
When we run that, we can see that we have pretty much what we just looked at. We have an ID, we have a diagnosis. If we go all the way across, you'll see all the different columns coming across, displayed nicely for our data. And while we're exploring the data, our seaborn, which we referenced as sns, makes it very easy to go in here and do a joint plot. You'll notice it's very similar to matplotlib, because it is sitting on top of the matplotlib library. So the joint plot does a lot of work for us, and we're just going to look at the first two columns that we're interested in, the radius mean and the texture mean. We'll just look at those two columns, and data equals data. So that tells it which two columns we're plotting and that we're going to use the data that we pulled in. Let's just run that. And it generates a really nice graph on here. And there are all kinds of cool things on this graph to look at. I mean, we have the texture mean and the radius mean, obviously, as the axes. And one of the cool things on here is you can also see the histograms. They show, for the radius mean, where the most common radius mean comes up, and where the most common texture is. So we're looking at, for each growth, its average texture and its average radius. It gets a little confusing because we're talking about the individual object's average. And then we can also look over here and see the histogram showing us the median, or how common each measurement is. And that's only two columns. So let's dig a little deeper into seaborn. They also have a heat map. And if you're not familiar with heat maps, a heat map just means it's in color. That's all that means. I guess the original ones were plotting heat density on something, and so ever since then, it's just been called a heat map. And we're going to take our data and get our correlation numbers to put into the heat map. And that's simply data.corr() for that. That's a pandas expression.
Remember, we're working in a pandas data frame. So that's one of the cool tools in pandas for our data. And let's just pull that information into a heat map and see what that looks like. And you'll see that we're now looking at all the different features. We have our ID, we have our texture, we have our area, our compactness, concave points. And if you look down the diagonal of this chart, going from the upper left to the bottom right, it's all white. That's because when you compare texture to texture, they're identical. So they're 100%, or in this case a perfect one, in their correlation. And you'll see that when you look at, say, area, right below it, it has almost a black on there when you compare it to texture. So these have almost no correlation. They don't really form a linear graph or something that you can look at and say how connected they are. They're very scattered data. This is just a really nice graph to get a quick look at your data. It doesn't so much change what you do, but it helps you verify. So, when you get an answer, or you start looking at some of these individual pieces, you might go, "Hey, that doesn't match according to our heat map." These should not correlate with each other. And if they do, you're going to have to start asking, well, why? What's going on? What else is coming in there? But it does show some really cool information on here. I mean, we can see from the ID, if you go across the top line, there's no one feature that lights up. There's no one feature that says, hey, if the area is a certain size, then it's going to be benign or malignant. It says there are some that sort of add up. And that's a big hint in the data. We're trying to ID whether it's malignant or benign, and that's a big hint to us as data scientists to go, okay, we can't solve this with any one feature.
It's going to be something that includes all the features, or many of the different features, to come up with a solution for it. And while we're exploring the data, let's explore one more area and look at data.isnull. We want to check for null values in our data. If you remember from earlier in this tutorial, we did it a little differently, where we added stuff up and summed them up. With pandas you can actually do it really quickly: data.isnull() and sum it. And it's going to go across all the columns. So when I run this, you're going to see all the columns come up with no null data. So just to rehash these last few steps, we've done a lot of exploration. We have looked at the first two columns and seen how they plot with seaborn in a joint plot, which shows both the histograms and the data plotted on the XY coordinates. And obviously you can do that in more detail with different columns and see how they plot together. And then we did the seaborn heat map, the sns.heatmap of the data. And you can see right here where it did a nice job showing us some bright spots where stuff correlates with each other and forms a very nice combination of points. And you can also see areas that don't. And then finally we went ahead and checked the data for null values. Do we have any missing data in there? That's a very important step because it'll crash later on. If you forget to do this step, it will remind you when you get that nice error code that says null values. Okay, so it's not a big deal if you miss it, but it's no fun having to go back when you're in a huge process and you've missed this step, and now you're 10 steps later and you've got to go remember where you were pulling the data in. So, we need to go ahead and pull out our X and our y. So, we'll just put that down here. And we'll set the X equal to, and there are a lot of different options here.
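The two pandas checks from this exploration, the null count and the correlation table behind the heat map, can be sketched as below. The video's CSV is not available here, so this assumes scikit-learn's bundled breast cancer data as a stand-in: its columns are named "mean radius" rather than radius mean, it has no ID column, and its target is coded 0 and 1 instead of M and B.

```python
from sklearn.datasets import load_breast_cancer

# Stand-in data set (assumption): 30 feature columns plus a "target" column.
data = load_breast_cancer(as_frame=True).frame

# Count missing values per column, then total them up.
null_counts = data.isnull().sum()
print("total missing values:", int(null_counts.sum()))

# Pairwise correlations; this table is what a heat map would color in.
corr = data.corr()
print(corr.loc["mean radius", ["mean radius", "mean texture", "mean area"]])

# To draw it (requires seaborn): import seaborn as sns; sns.heatmap(corr)
```

The diagonal of the table is all ones because every column correlates perfectly with itself, which is the white diagonal described in the heat map.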
Certainly, we could do X equals all the columns except for the first two, because if you remember, the first two are the ID and the diagnosis. So, that certainly would be an option. But what we're going to do is actually focus on the worst: the worst radius, the worst texture, perimeter, area, smoothness, compactness, and so on. One of the reasons to start dividing your data up when you're looking at this information is that sometimes the same data will be coming in more than once. So, if I have two measurements of the same thing coming into my model, it might overweigh them. It might overpower the other measurements because it's basically taking that information in twice. That's a little bit past the scope of this tutorial. What I want you to take away from this, though, is that we are dividing the data up into pieces, and our team in the back went ahead and said, hey, let's just look at the worst. So I'm going to create an array, and you'll see this array: radius worst, texture worst, perimeter worst. We've just taken the worst of the worst, and I'm just going to put that in my X. So this X is still a pandas data frame, but it's just those columns. And our y, if you remember correctly, is going to be, oops, hold on one second. It's not X, it's data. There we go. So, X equals data and then a list of the different columns, the worst of the worst. And if we're going to take that, then we have to have our answer for our y, for the stuff we know. And if you remember correctly, we're just going to be looking at the diagnosis. That's all we care about: what is it diagnosed as? Is it benign or malignant? And since it's a single column, we can just do diagnosis. Oh, I forgot to put the brackets. There we go. Okay. So, it's just diagnosis on there. And we can also real quickly do an X.head() if you want to see what that looks like, and y.head(), and run this, and you'll see it only does the last one. I forgot about that.
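Selecting the feature columns and the answer column might look like the sketch below. It again assumes scikit-learn's bundled breast cancer data in place of the video's CSV, so the columns are named "worst radius" rather than radius worst, and the answer column is "target" rather than diagnosis. A notebook cell only displays its last expression, which is why both heads are printed explicitly.

```python
from sklearn.datasets import load_breast_cancer

# Stand-in data set (assumption): column names differ from the video's CSV.
data = load_breast_cancer(as_frame=True).frame

# Collect the "worst" measurement columns by name.
worst_cols = [col for col in data.columns if col.startswith("worst ")]

X = data[worst_cols]  # a list of names selects several columns: a DataFrame
y = data["target"]    # a single name selects one column: a Series

print(X.head())
print(y.head())
```

Passing a list inside the brackets is what keeps X a data frame with only those columns.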
If you don't do print, you can see that the y.head() is just M, M, because the first ones are all malignant. And if I run this, the X.head() is just the first five values of radius worst, texture worst, perimeter worst, area worst, and so on. I'll go ahead and take that out. So, moving down to the next step, we've built our two data sets, our answer and then the features we want to look at. In data science, it's very important to test your model. So we do that by splitting the data, and from sklearn.model_selection we're going to import train_test_split. So we're going to split it into two groups. There are so many ways to do this. I noticed in one of the more modern ways they actually split it into three groups, and then you model each group and test it against the other groups. There are reasons for that, which are past the scope of this, and for this particular example it isn't necessary. We're just going to split it into two groups, one to train our model and one to test it. And in sklearn.model_selection we have train_test_split. You could write your own quick code to do this, where you just randomly divide the data up into two groups, but they do it for us nicely, and we can actually do it in one statement with this, where we're going to generate four variables. Capital X_train, capital X_test. So we have our training data we're going to use to fit the model, and then we need something to test it. And then we have our y_train, so we're going to train on the answers, and then we have our y_test. So this is the stuff we use to see how well our model did. And we'll go ahead and take our train_test_split that we just imported. And we're going to pass in our X and our y, our two different data sets that are going in for our split. And then the guys in the back came up and wanted us to go ahead and use a test_size equals 0.3. That's test size. Random state.
It's always nice to kind of switch the random state around, but it's not that important. What this means is that with the test size, we're going to take 30% of the data and put that into our test variables, our y_test and our X_test. And we're going to put 70% into the X_train and the y_train. So, we're going to use 70% of the data to train our model and 30% to test it. Let's go ahead and run that and load those up. So now we have all our stuff split up and all our data ready to go. And now we get to the actual logistic part. We're actually going to create our model. So let's go ahead and bring that in from sklearn. We're going to bring in linear_model, and we're going to import LogisticRegression. That's the actual model we're using. And we'll call it logmodel. Oops, there we go. And let's just set this equal to our LogisticRegression that we just imported. So now we have a variable logmodel set to that class for us to use. And as with most of the models in sklearn, we just need to go ahead and do a fit on there. And we use our X_train that we separated out, with our y_train. And let's go ahead and run this. So once we've run this, we'll have a model that fits this data, that 70% that is our training data. And of course it prints this out, which tells us all the different variables that you can set on there. There are a lot of different choices you can make, but for what we're doing, we're just going to leave all the defaults. We don't really need to mess with those on this particular example. And there's nothing in here that really stands out as super important until you start fine-tuning it. But for what we're doing, the basics will work just fine. And then we need to go ahead and test out our model. Is it working? So let's create a variable, y_predict. And this is going to be equal to our logmodel. And we want to do a predict. Again, a very standard format for the sklearn library is taking your model and doing a predict on it.
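The 70/30 split and the model fit can be sketched as follows. As before, scikit-learn's bundled breast cancer data stands in for the video's CSV (an assumption). The random_state value is arbitrary, since the transcript does not give one, and max_iter is raised above the default only so the solver converges cleanly on these unscaled columns; the video leaves every setting at its default.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data (assumption); keep only the "worst" measurement columns.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X = X[[col for col in X.columns if col.startswith("worst ")]]

# One statement produces all four pieces: 70% to train on, 30% held out.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)  # random_state chosen arbitrarily

print(len(X_train), "training rows,", len(X_test), "test rows")

# max_iter is an assumption added here, not a setting from the video.
logmodel = LogisticRegression(max_iter=10000)
logmodel.fit(X_train, y_train)
```

Fixing random_state makes the split repeatable from run to run, which helps when comparing results later.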
And we're going to test y_predict against the y_test. So we want to know what the model thinks it's going to be. That's what our y_predict is. And for that, we want the capital X_test. So we have our train set and our test set, and now we're going to do our y_predict. And let's go ahead and run that. And if we print y_predict, let me go ahead and run that, you'll see it comes up and prints a nice array of B and M, for benign and malignant, for all the different test data we put in there. So, it does pretty good. We're not sure exactly how good it does, but we can see that it actually works and is functional. It was very easy to create. You'll always discover with data science that as you explore this, you spend a significant amount of time prepping your data and making sure your data coming in is good. There's a saying: good data in, good answers out; bad data in, bad answers out. That's only half of it. Selecting your models becomes the next part, as far as how good your models are. And then, of course, fine-tuning it, depending on what model you're using. So we come in here, and we want to know how good this came out. So we have our y_predict here, logmodel.predict(X_test). So for deciding how good our model is, from sklearn.metrics we're going to import classification_report. And that just reports how good our model is doing. And then we're going to feed it the model data. And let's just print this out. We'll take our classification_report, and we're going to put into it our test data, our actual data. So this is what we actually know is true, and then our prediction, what our model predicted for that data on the test side. And let's run that and see what that does. So we pull that up. You'll see that we have a precision for benign and malignant, B and M. And we have a precision of 93 and 91, a total of 92. So it's kind of the average between these two, 92.
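The prediction and evaluation steps can be sketched end to end as below. It repeats the setup so it runs on its own, and it makes the same assumptions as before: scikit-learn's bundled breast cancer data instead of the video's CSV, an arbitrary random_state, and a raised max_iter. The numbers it prints will therefore not match the 91 to 93 quoted in the video.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Stand-in data (assumption); target 0 means malignant, 1 means benign.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X = X[[col for col in X.columns if col.startswith("worst ")]]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

logmodel = LogisticRegression(max_iter=10000)
logmodel.fit(X_train, y_train)

# Predict on rows the model has never seen.
y_predict = logmodel.predict(X_test)

# Known answers go first, predictions second.
print(classification_report(y_test, y_predict,
                            target_names=["malignant", "benign"]))

accuracy = accuracy_score(y_test, y_predict)
print(f"accuracy: {accuracy:.3f}")
```

The report lists precision, recall, F1 score, and support per class, so it shows more than a single accuracy figure does.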
There's all kinds of different information on here: your F1 score, your recall, your support coming through on this. And for this, I'll go ahead and just flip back to our slides that they put together for describing it. And so here we're going to look at the precision using the classification report. And you see this is the same printout I had up above. Some of the numbers might be different because it does randomly pick out which data we're using. So this model is able to predict the type of tumor with 91% accuracy. So if we look back here, you will see where we have benign and malignant. It actually has 92 coming up here. We're looking at about a 91 to 92% precision. And remember, I reminded you about domain. So, we're talking about a medical domain with a very catastrophic outcome. You know, at 91 or 92% precision, you're still going to go in there and have somebody do a biopsy on it. That's very different than if you're investing money and there's a 92% chance you're going to earn 10% and an 8% chance you're going to lose 8%. You're probably going to bet the money, because at those odds, it's pretty good that you'll make some money. And in the long run, if you do that enough, you definitely will make money. And also with this domain, I've actually seen them use this to identify different forms of cancer. That's one of the things that they're starting to use these models for, because then it helps a doctor know what to investigate. So that wraps up this section.
Finally, we're going to go in there and discuss the answers to the quiz asked in machine learning tutorial part one. Can you tell what's happening in the following cases? A. Grouping documents into different categories based on the topic and content of each document. This is an example of clustering, where K-means clustering can be used to group the documents by topic using a bag-of-words approach.
So if you got in there that you're looking for clustering, and hopefully you had at least one or two examples like K-means that are used for clustering different things, then give yourself two thumbs up. B. Identifying handwritten digits in images correctly. This is an example of classification. The traditional approach to solving this would be to extract digit-dependent features, like the curvature of different digits, etc., and then use a classifier like SVM to distinguish between images. Again, if you got the fact that it's a classification example, give yourself a thumbs up. And if you were able to go, hey, let's use SVM or another model for this, give yourself those two thumbs up on it. C. Behavior of a website indicating that the site is not working as designed. This is an example of anomaly detection. In this case, the algorithm learns what is normal and what is not normal, usually by observing the logs of the website. Give yourself a thumbs up if you got that one. And just for a bonus, can you think of another example of anomaly detection? One of the ones I use it for in my own business is detecting anomalies in stock markets. Stock markets are very fickle and they behave very erratically. So it's about finding those erratic areas and then finding ways to track down why they're erratic. Was something released on social media? Was something released? You can see where knowing where that anomaly is can help you figure out what the answer is in another area. D. Predicting the salary of an individual based on his or her years of experience. This is an example of regression. This problem can be mathematically defined as a function between the independent variable, years of experience, and the dependent variable, the salary of an individual. And if you guessed that this was a regression model, give yourself a thumbs up. And if you were able to remember that it was between independent and dependent variables, and those terms, give yourself two thumbs up. Summary.
So to wrap it up, we went over what K-means is, and we went through the flow chart: choosing K with the elbow method, assigning random centroids to the clusters, computing the distances, assigning each point to its nearest centroid, recomputing the centroids, and going through that loop until the centroids settle. We looked into the elbow method to choose K by running our clustering across a range of K values and finding the best location for it. We did a nice example of clustering cars with K-means. Even though we only looked at the first two columns to make it simple and easy to graph, you can easily extrapolate that, look at all the different columns, and see how they all fit together. We also looked at what logistic regression is, we discussed the sigmoid function, and then we went into an example of classifying tumors with logistic regression. I hope you enjoyed part two of machine learning.
So in today's session we will discuss what an RNN model is. Moving ahead, we will see why we should use an RNN. After that, we will see how an RNN, a recurrent neural network, works. After covering these topics, we will move forward and see the types of RNN and the applications of RNN. At the end, we will do a hands-on lab demo of sentiment analysis using an RNN.
Before starting, let us have a simple question to brush up our knowledge. The question is: what are the applications of RNN? The options are NLP, time series, image captioning, and all of the above. Please answer in the comment section below and we will update the correct answer in the pinned comment, or you can pause this video, give it a thought, and then answer in the comment section.
Before we move on to the programming part, let's discuss what an RNN is. So what is an RNN, a recurrent neural network? An RNN works on the principle of saving the output of a particular layer and feeding it back to the input in order to predict the output of the layer.
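The idea of feeding a layer's output back in with the next input can be sketched with a single recurrent unit. This is a minimal illustration using the standard update h_t = tanh(w_x * x_t + w_h * h_(t-1) + b); the function name and the weight values are assumptions chosen for the example, not values from the video.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit recurrent layer over a sequence; return all hidden states."""
    h = h0
    states = []
    for x in inputs:
        # The previous output h is fed back in together with the current input x.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states stay non-zero:
# the unit "remembers" the earlier input through the recurrent weight w_h.
states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print(states)

# With w_h = 0 there is no feedback, so the memory disappears.
no_memory = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.0, b=0.0)
print(no_memory)
```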
The slide shows how you can convert a feed-forward neural network into a recurrent neural network. The nodes in the different layers of the neural network are compressed to form a single layer of the recurrent neural network. A, B, and C are the parameters of the network. Now that you understand what an RNN is, let's look at why we need it.
So why RNN? RNNs were created because there are a few issues with a feed-forward neural network: it cannot handle sequential data, it considers only the current input, and it cannot memorize previous inputs. The solution to these issues is the RNN. An RNN can handle sequential data, accepting both the current input and previously received inputs, and it can memorize previous inputs due to its internal memory.
Moving forward, let's see how an RNN works. The input layer x takes the input to the neural network, processes it, and passes it on to the middle layer. The middle layer h can consist of multiple hidden layers, each with its own activation function, weights, and biases. If you have a neural network where the parameters of the different hidden layers are not affected by the previous layer, that is, the network does not have memory, then you can use an RNN. The RNN will standardize the activation functions, weights, and biases so that each hidden layer has the same parameters. Then, instead of creating multiple hidden layers, it will create one and loop over it as many times as required.
Moving forward, let's see the types of RNN. There are four types: one to one, one to many, many to one, and many to many. Let's start with the one-to-one RNN. This type of neural network is known as the vanilla neural network. It is used for general machine learning problems that have a single input and a single output. Next is the one-to-many RNN. This type of neural network has a single input and multiple outputs. An example of this is image captioning. Now let's see the many-to-one RNN.
This RNN takes a sequence of inputs and generates a single output. Sentiment analysis is a good example of this kind of network, where a given sentence can be classified as expressing positive or negative sentiment. And the last one is the many-to-many RNN. This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is one example.
Moving forward, let's see the applications of recurrent neural networks. The first one is image captioning: RNNs are used to caption an image by analyzing the activities present in it. The second one is time series prediction: any time series problem, like predicting the prices of stocks in a particular month, can be solved using an RNN. The third one is natural language processing, or NLP: text mining and sentiment analysis can be carried out using an RNN. The fourth one is machine translation: given an input in one language, RNNs can be used to translate the input into a different language as output.
So now let's move to the programming part. First we will import some major libraries. The first one is for data frames, so I will write import pandas as pd. The second one is import numpy as np. Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. NumPy is a library for the Python programming language that adds support for large multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
For plotting, we will import some libraries like seaborn as sns. This is nothing but a short form, so we don't have to write seaborn again and again; we can write sns. Then another one is from wordcloud import WordCloud, and import matplotlib.pyplot as plt. Seaborn is a library that uses matplotlib underneath to plot graphs.
It will be used to visualize random distributions. A word cloud is a visual representation of words. Word cloud creators are used to highlight popular words and phrases based on frequency and relevance. They provide quick and simple visual insights that can lead to more in-depth analysis. Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension, NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK.
Next, let's import NLTK, the natural language toolkit. I will write import nltk, then from nltk.stem import WordNetLemmatizer, then from nltk.corpus import stopwords, and from nltk.tokenize import word_tokenize. The Natural Language Toolkit, more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing for English, written in the Python programming language. Stop words are words that are so common that they are basically ignored by typical tokenizers. And word_tokenize is a function that splits a given sentence into words using the NLTK library.
Now let's import some scikit-learn pieces. For that I will write from sklearn.model_selection import train_test_split. Then from sklearn.feature_extraction.text import TfidfVectorizer. And then from sklearn.metrics import confusion_matrix and classification_report. Scikit-learn is a free, open-source machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms, including support vector machines, logistic regression, and many others like the random forest classifier. The train_test_split method is used to split our data into a train set and a test set. First, we need to divide our data into features, X, and labels, y.
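To show what a train/test split does, here is a small standard-library sketch. It is not scikit-learn's implementation; the function name, the toy data, and the seed value are assumptions for illustration only.

```python
import random

def train_test_split_sketch(features, labels, test_size=0.2, seed=101):
    """Shuffle row indices, then cut off a test portion of the given size."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    n_test = round(len(indices) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    # Features and labels are selected with the same indices, so rows stay aligned.
    return (pick(features, train_idx), pick(features, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))

# Hypothetical data: ten rows with one feature each, label = feature parity.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = train_test_split_sketch(X, y)
print(len(X_train), len(X_test))
```

Fixing the seed plays the same role as a random state: the same rows land in the test set on every run.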
The TfidfVectorizer converts a collection of raw documents into a matrix of TF-IDF features. Alternatives for turning words into vectors are word embeddings such as fastText or word2vec, which also have Python implementations. Then there is the confusion matrix. A confusion matrix is a table that is used to describe the performance of a classification algorithm.
Then we'll import the models. From sklearn.linear_model import LogisticRegression. Then from sklearn.svm import the linear SVC. Then from sklearn.ensemble import RandomForestClassifier. Then from sklearn.naive_bayes import BernoulliNB. Okay, so everything should be correct; you will see while running. Logistic regression estimates the probability of an event occurring, such as voted or didn't vote, based on a given data set of independent variables. The linear support vector classifier, SVC, is an algorithm that attempts to find a hyperplane that maximizes the distance between the classified samples. The random forest classifier creates a set of decision trees from randomly selected subsets of the training set. And Bernoulli naive Bayes is a part of the naive Bayes family. It is based on the Bernoulli distribution and accepts only binary values, that is, zero or one.
Now let's import TensorFlow. So import tensorflow.compat.v2 as tf, and then import tensorflow_datasets as tfds. TensorFlow is a free and open-source library for machine learning and artificial intelligence. It can be used across a range of tasks, but it has a particular focus on the training and inference of deep neural networks.
Then let's import warnings, import string, and import pickle. Everything here is basic, so let's just look at pickle. Pickle in Python is primarily used for serializing and deserializing a Python object structure.
Okay, let's run it and see how many errors come up. After that we will load the data set and go through data visualization. Okay: cannot import name wordcloud. The W and the C will be capital here. Then the random forest import. Okay, it's still loading.
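Since pickle comes up again later when the models are saved, here is a minimal round trip with pickle.dump and pickle.load. The dictionary standing in for a fitted vectorizer is a hypothetical placeholder, and an in-memory buffer is used instead of a file on disk so the example leaves nothing behind.

```python
import io
import pickle

# Hypothetical stand-in for a fitted object such as a vectorizer or a model.
model_like = {"name": "demo_vectorizer", "vocabulary": {"good": 0, "bad": 1}}

# Serialize: pickle.dump writes the object as bytes to a binary file-like object.
buffer = io.BytesIO()
pickle.dump(model_like, buffer)

# Deserialize: rewind, then pickle.load rebuilds an equal but separate object.
buffer.seek(0)
restored = pickle.load(buffer)
print(restored)
```

With a real file the pattern is the same, using open with mode "wb" for saving and "rb" for loading. Only unpickle data from sources you trust, because loading a pickle can execute arbitrary code.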
Let's see. Okay, so loading is done. So now let's load the data set. We'll write data equals pd.read_csv with the file name, and you can find this data set in the description box below. The column names are polarity, id, date, query, user, and text. Okay, seems fine. Let me first rename this notebook to sentiment analysis using RNN. Then here I will write data equals data.sample with frac equal to one, which shuffles the rows.
Let me brief you on what we will do in this sentiment analysis using RNN. In this demo you will see text processing on a Twitter data set. After that we will run different machine learning algorithms on the data, such as logistic regression, a random forest classifier, SVC, and naive Bayes, to classify positive and negative tweets. After that I will also build an RNN, a recurrent neural network, which is a good fit for this kind of textual sentiment analysis, since text is sequential data, which is what an RNN requires. So let's dive in.
Now we will look at the data and do the data visualization. The data set details are: the target, which is the polarity of the tweet, where zero is negative; the id; the date of the tweet; the query; the user who tweeted; and the text. So I will write print data set shape and data.shape. There are two lakh (200,000) rows and six columns, so it is a huge data set. Again, you can find it in the description box below.
Now let's look at the data with head. Here head is used to show the top 10 rows of the data set. If you use tail instead of head, it will show the last 10 rows. In the polarity column, zero means negative and four means positive, so you can think of it as 0 and 1. Then we have the id, the date, the query, the user, and the text. Next I will look at the unique values of data polarity. These are 0 and 4: zero means negative and four means positive.
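The 0 and 4 polarity coding can be treated as a plain 0 and 1 label. A minimal sketch of that relabeling and of counting each class, using a short hypothetical list of polarity values in place of the real column:

```python
from collections import Counter

# Hypothetical polarity values as they appear in the raw data: 0 = negative, 4 = positive.
polarity = [0, 4, 4, 0, 4, 0]

# Relabel 4 as 1 so the target is a conventional binary label.
to_binary = {0: 0, 4: 1}
binary = [to_binary[p] for p in polarity]

# Count how many tweets fall into each class.
counts = Counter(binary)
print(binary)
print("positive:", counts[1], "negative:", counts[0])
```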
Now we replace the value four with one for ease of understanding, as I said, so you can think of the labels as 0 and 1. So data polarity, where data polarity equals four, is set to one, and then data.head. Now you can see 0 1 0 1 0 1 1 0. If you write only head with no argument, it will show the top five rows only.
Now let's use one more function, describe, so data.describe. As you can see, the count is two lakh, then there is the mean of each column, such as the id, then the standard deviation, the minimum value, the 25%, the 50%, and the 75% values, and the maximum.
Next, let's see the number of positive versus negative tagged sentences. Here I will write positives equals data polarity where data.polarity equals 1, and negatives equals data polarity where data.polarity equals zero. Then print the total length of the data with format and data.shape of zero. Now I will print the total length, the negatives, and the positives. So number of positive tagged sentences, format, length of positives. I will copy this and paste it here and make the changes for the negatives. Now let's see. Here it says polarity is not defined, so I fix that. As you can see, the total length of the data is two lakh, the number of positive tagged sentences is 100,046, and, after fixing the spelling, the number of negative tagged sentences is 99,954. So now we have a brief picture of the data.
Now let's get a word count per text. For this I will write a function word count that returns the length of the string split into words. Then let's plot the word count distribution for both positive and negative tweets. For that I will write data word count equals data text dot apply word count. Then positive equals data word count where data.polarity equals 1. Let me copy this, and for the negative one I will write zero. Then plt.figure with a figure size 12 wide, then plt.xlim up to 45, then plt.xlabel word count and plt.ylabel frequency. Then we plot the histogram of the two series with alpha equal to 0.5 and the labels positive and negative. Then let's make a legend as well, placed on the upper right.
Data word count was misspelled. Okay, my bad. So now you can see the positives and the negatives. This is the word count distribution for both positive and negative tweets.
Next, let's get the most common words in the training data set. For that I will write from collections import Counter. Then all words is an empty list, and for each line in the data text, words equals line.split, and for each word in words, if the length of the word is greater than two, all words dot append word.lower. Then I can type Counter of all words dot most common, and I need 20. As you can see, these are the most common words, used in almost every sentence: the, and, you, for, have, that, I'm, but, just, out, and so on. For example, the is used about 64,000 times and URL about 8,000 times.
Now let's do the data processing. First, sns.countplot of data polarity. Okay, these are the negatives and these are the positives. The two classes are almost the same size, each only 46 tweets away from an even 100,000, which is why the bars look almost identical.
Now we remove the unnecessary columns. So data.drop with date, query, user, and word count, axis equals 1, and inplace equals True. Okay, my bad, let me fix that. Then I will write data.drop id with axis one, and then data.head. See, now we have only two columns, the polarity and the text.
Now let's check for null values with data.isnull and sum, and print it. So there are no null values. Next we convert the pandas object column to a string type. For that we write data text equals data text dot astype string.
Now download the stop words with nltk.download stopwords. It's finished. These are some of the stop words. Moving forward, let's run nltk.download for the remaining resources, such as WordNet.
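The most-common-words step above can be sketched with collections.Counter. The three short tweets are hypothetical stand-ins for the text column; as in the walkthrough, words of one or two characters are skipped and everything is lowercased.

```python
from collections import Counter

# Hypothetical stand-ins for the text column of the data set.
tweets = [
    "The movie was great",
    "the food was bad",
    "I love the movie",
]

all_words = []
for line in tweets:
    for word in line.split():
        if len(word) > 2:                 # skip very short words such as "I"
            all_words.append(word.lower())

word_counts = Counter(all_words)
# most_common(n) returns (word, count) pairs sorted by descending count.
print(word_counts.most_common(3))
```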
The preprocessing steps taken are these. Lowercasing: each text is converted to lower case. Removal of URLs: links starting with http, https, or www are replaced with an empty string. Then removing usernames, removing short words, and removing stop words. And lemmatization, which is the process of converting a word to its base form. For that, I will just copy the whole code in and then explain what I've done, one piece at a time. This is the URL pattern for finding everything of the www, https, and http type and removing it. I lowercase the text and remove all the URLs. Then I remove all the usernames, the ones starting with the @ sign, and I remove punctuation and stop words, like this.
Now what we have to do is write data processed tweets equals data text dot apply with a lambda x that calls process tweets on x. Then print text preprocessing complete. It is taking time, and when it is completed it will print that the text preprocessing is done. Okay, as you can see, the text preprocessing is done. Now let's check data.head. As you can see, the @ signs and the slashes are gone. So now the text is preprocessed.
Next we will analyze the preprocessed data to get an understanding of it. We will plot word clouds for positive and negative tweets from our data set and see which words occur the most. First we will create one for the negative tweets. I will write plt.figure with a figure size of 15 wide. Then the word cloud: WordCloud with max words 2,000, width equal to 1,600, and height equal to 800, then generate on the joined text of the processed tweets where data.polarity equals zero. Then I write plt.imshow of the word cloud with bilinear interpolation. It says perhaps you forgot a comma. So 2,000, then a comma before width, and dot generate goes here. Let me run it now. Let's see.
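The cleaning steps described earlier (lowercasing, removing URLs, usernames, punctuation, short words, and stop words) can be sketched with the standard library alone. This is not the code pasted in the video: the patterns, the tiny stop-word set, and the function name are assumptions, and lemmatization is left out because it needs NLTK's WordNet data.

```python
import re
import string

# Assumption: a tiny stop-word set for illustration; the video uses NLTK's list.
STOP_WORDS = {"the", "is", "a", "to", "and", "at"}

URL_PATTERN = re.compile(r"(https?://\S+|www\.\S+)")   # http, https, or www links
USER_PATTERN = re.compile(r"@\w+")                      # usernames such as @john

def process_tweet(text):
    """Lowercase, strip URLs/usernames/punctuation, drop short and stop words."""
    text = text.lower()
    text = URL_PATTERN.sub("", text)
    text = USER_PATTERN.sub("", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    words = [w for w in text.split() if len(w) > 1 and w not in STOP_WORDS]
    return " ".join(words)

cleaned = process_tweet("@John Check https://example.com the NEW phone is GREAT!!!")
print(cleaned)
```

The URL is removed before punctuation is stripped; doing it the other way round would break the link into ordinary-looking words that the pattern no longer matches.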
Hope this time it will work. I guess it is still loading. As you can see, the cloud shows words like today, I'm, work, don't, wish, need, and much. These are the most frequent words from the negative tweets. Now let's see the positive tweets. The code will be the same, so let me copy and paste it here and change the polarity to one. It will take a little bit of time to load. As you can see, can't and so on, sorry, these are still the negative words. It's still loading, so let's wait a few seconds. Now you can see the positive words, like love, okay, good, lol, and awesome. So these are some positive words.
Now let's do the vectorization and the splitting of the data. We store the processed tweets in the input variable X and the polarity in the output variable y. So X equals data processed tweets dot values and y equals data polarity dot values. Then I will print X.shape and y.shape. Okay, cool.
Now we will convert the text to word frequency vectors with TF-IDF. This is an acronym that stands for term frequency, inverse document frequency, which are the components of the resulting score assigned to each word. Term frequency summarizes how often a given word appears within a document. Inverse document frequency downscales words that appear a lot across documents. So here we will convert a collection of raw documents to a matrix of TF-IDF features. I will create the TfidfVectorizer with sublinear term frequency scaling, then X equals vector.fit_transform of X, and print that the vector is fitted. Then print number of feature words and the length of vector.get_feature_names. So the number of feature words is 170,321. Now let's print the shape.
Next we will split the data into train and test. The preprocessed data is divided into two sets, the training data and the testing data. The training data, the set on which the model will be trained, contains 80% of the data.
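To make the TF-IDF description concrete, here is a sketch of the textbook formula: term frequency (the word's share of the document) multiplied by the log of the total number of documents over the number of documents containing the word. This is an assumption-labeled simplification; scikit-learn's TfidfVectorizer applies smoothing and normalization on top, so its numbers differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: score} dict per document using tf * log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            w: (c / len(tokens)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        })
    return scores

# Hypothetical documents: "movie" appears in every one of them.
scores = tf_idf(["good movie", "bad movie", "long movie"])
print(scores[0])
```

Because "movie" occurs in all three documents, its inverse document frequency is log(1) = 0, so it is scaled down to nothing, while the distinguishing words keep a positive score.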
The test data, the set on which the model will be tested, contains the remaining 20% of the data. For that I will write X train, X test, y train, y test equals train_test_split of X and y, with test size equal to 0.20 and random state equal to 101. Then I will print the shapes of X train, y train, X test, and y test to see how many rows there are. So we'll paste that in. See, the total was two lakh, so there are 1 lakh 60,000 rows in training. As we discussed earlier, 80% goes to training and 20% to testing.
Now let's do the model building, starting with a model evaluation function. I will write it first and then explain the whole thing. So what I did here is this: it will tell you the accuracy of the model on the training data and on the testing data. Then we predict the values for the test data set and print the evaluation for it. Then we compute and plot the confusion matrix, with both categories, negative and positive, and group names such as true negative and false positive. There's nothing more to it, so let's run it.
First we will do logistic regression. Here I will write lg equals LogisticRegression. Then history equals lg.fit with X train and y train, and then model evaluate of lg. Let's see. This is for logistic regression. As you can see, the accuracy on the training data is 83% and on the testing data it is 77%. And this is the confusion matrix with the predicted values for the two categories.
Now let's see the linear SVM. For that I will write svm equals the linear SVC, then svm.fit with X train and y train, then model evaluate of svm. After that we will do the random forest and naive Bayes, and then we will start with the RNN. As you can see, the accuracy on the training data is pretty good, 93%, where logistic regression had 83%, but the testing accuracy is lower than for the logistic regression model. Let's see the random forest.
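The evaluation helper described above reports accuracy and a confusion matrix. A minimal sketch of how those numbers come out of true and predicted labels follows; the function names and the eight example labels are hypothetical, and the matrix layout (rows are true classes, columns are predicted classes) follows the usual scikit-learn convention.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Count true negatives, false positives, false negatives, true positives."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tp += 1
    return [[tn, fp], [fn, tp]]

def accuracy(matrix):
    """Share of predictions on the diagonal (correct ones)."""
    (tn, fp), (fn, tp) = matrix
    return (tn + tp) / (tn + fp + fn + tp)

# Hypothetical labels: 0 = negative tweet, 1 = positive tweet.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

matrix = confusion_matrix_2x2(y_true, y_pred)
print(matrix, accuracy(matrix))
```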
So I will write here rf equals RandomForestClassifier with n estimators equal to 20, criterion equal to entropy, and max depth equal to 50. Then rf.fit with X train and y train, and model evaluate of rf. Okay, loading. Let's see how the accuracy comes out. After this we will do naive Bayes, and after that we will move on to our main model, the RNN, the recurrent neural network. It's still loading. Yes, it will take a little bit of time. So as you can see, here is the confusion matrix. The training accuracy is 75%, which is quite low.
Now let's see the last model, naive Bayes. So nb equals BernoulliNB, then nb.fit with X train and y train, then model evaluate of nb. The naive Bayes training accuracy is around 86 to 87%. So the linear SVC has the best training accuracy, and the best testing accuracy, at about 77%, belongs to logistic regression.
Now let's move to our main model, the RNN. So what is an RNN? Recurrent neural networks have long been a standard algorithm for sequential data and are used by Apple's Siri and Google's voice search. An RNN remembers its input due to an internal memory, which makes it well suited for machine learning problems that involve sequential data.
There is one more thing, the embedding layer. The embedding layer is one of the layers available in Keras. It is mainly used in natural language processing applications such as language modeling, but it can also be used with other tasks that involve neural networks. While dealing with NLP problems, we can use pre-trained word embeddings such as GloVe. Alternatively, we can train our own embeddings using the Keras embedding layer.
Then there is the LSTM layer. Long short-term memory networks, usually called LSTMs, were introduced by Hochreiter and Schmidhuber. I have already made many videos on them that you can check out. They have been widely used for speech recognition, language processing, sentiment analysis, and text prediction. Before going deep into LSTMs, we should first understand the need for them, which can be explained by the drawbacks of RNNs in practical use. So let's start with the RNN.
So here I am importing some libraries. After that I will print the Keras version, which is 2.11.0. Okay, fine. Now let's create the split again: X train, X test, y train, y test equals train_test_split of the processed tweets and data.polarity.values, with test size equal to 0.2. A test size of 0.2 means the 80 and 20 split, 80% for training and 20% for testing.
Then comes the model and its evaluation. These are the layers, with relu and sigmoid activations. Now the training runs. Each epoch counts up to 5,000 steps, see the 5,000, and the epochs go from 1 to 10, so it will take time. I will get back to you after this completes. Okay, now as you can see, the epochs ran successfully. I will leave some space here.
Now we will look at positive and negative outcomes. This is the testing part. We will give the model a sentence and predict, and see whether the prediction comes out right or wrong. So here I will write sequence equals tokenizer.texts_to_sequences, and the text is: this data science article is the worst. Then I will write test equals pad_sequences with sequence and maxlen equal to max length. Then I will write prediction equals model.predict of test. If the prediction is greater than 0.5, meaning more than 50%, it should print positive, else negative. Let me run this. Okay, an error: Sequential object has no such attribute. Let me fix the name. Now it predicts negative, because the sentence contains the word worst, so it is showing the correct result.
Now let's check with the saved RNN model. So model equals keras.models.load_model, and here we load the RNN_model file. It is a pre-trained RNN model. Then sequence equals tokenizer.texts_to_sequences, and this time the text is: this ML course is best.
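The prediction steps above (text to integer sequence, pad to a fixed length, threshold the model output at 0.5) can be sketched without Keras. The helper names below are hypothetical stand-ins: they imitate Keras's default front-padding with zeros, but they index words by first appearance rather than by frequency as the Keras Tokenizer does, and the probability passed to the labeling function is a made-up value rather than a real model output.

```python
def build_word_index(texts):
    """Assign each new word the next integer, starting at 1 (0 is kept for padding)."""
    index = {}
    for text in texts:
        for word in text.lower().split():
            index.setdefault(word, len(index) + 1)
    return index

def texts_to_sequences_sketch(texts, word_index):
    """Replace known words by their integers; unknown words are dropped."""
    return [[word_index[w] for w in t.lower().split() if w in word_index]
            for t in texts]

def pad_sequences_sketch(sequences, max_len):
    """Keep the last max_len items and pad with zeros at the front."""
    padded = []
    for seq in sequences:
        seq = seq[-max_len:]
        padded.append([0] * (max_len - len(seq)) + seq)
    return padded

def label_from_probability(probability):
    """Apply the 0.5 threshold used in the walkthrough."""
    return "positive" if probability > 0.5 else "negative"

word_index = build_word_index(["this course is best", "this article is worst"])
sequence = texts_to_sequences_sketch(["this ML course is best"], word_index)
test = pad_sequences_sketch(sequence, max_len=6)
print(test)
print(label_from_probability(0.91))   # 0.91 is an illustrative value only
```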
Test equals pad_sequences with sequence and maxlen equal to max length. Then prediction equals model.predict of test. If the prediction is greater than 0.5, meaning more than 50%, print positive, else print negative. There is an attribute error on load models; it should be load_model. Now it prints positive, because the sentence is this ML course is best, so there is no negative word.
What we will do now is model saving, loading, and prediction. For that I will write import pickle, then file equals open with the vectorizer pickle file name in write-binary mode. Then pickle.dump of the vectorizer into the file, and file.close. I have to write the same thing for the naive Bayes, logistic regression, SVM, and random forest models, so I will write those here as well. Okay, let's run this.
Now we have to predict using the saved models. We will load the model first and then predict. So first I will write a function named load models, and in it we load the vectorizer: file equals open with the vectorizer pickle file name in read-binary mode, vectorizer equals pickle.load of the file, then file.close. Next I load the logistic regression model the same way: open the file, lg equals pickle.load of the file, then file.close, and return the vectorizer and lg.
Now we will predict the sentiment. For that I will write a predict function that takes the vectorizer, the model, and the text. In it, the text data equals vectorizer.transform of the preprocessed text, and then sentiment equals model.predict of the text data. Then I make a list of the texts with their sentiments. For that I will write data equals an empty list, then for text and prediction in zip of text and sentiment, data.append of the text and the prediction. Then we convert the list into a pandas data frame. For that I will write df equals pd.DataFrame of data with the columns text and sentiment, then df equals df.replace of 0 and 1 with negative and positive, and return df. At last, in the main part, I load the models, so vectorizer and lg equal load models. Then come the texts to classify, the ones that should be in the list, for example: I love machine learning.
A couple more sentences go into the list. Then df equals predict with the vectorizer, the model, and the text, and then print df. I love machine learning: positive. The second sentence: positive. I feel so good: negative. Okay, there is an issue, and, yeah, see, now it's coming. This is how you can do sentiment analysis using an RNN model. Here we have loaded the RNN model, and it is showing the right result. Let's try one more sentence here: it predicts negative. Okay, the RNN model is working.
So, today we are going to explore K-nearest neighbors, or KNN, which is one of the most popular algorithms in data science. Python is a powerful tool for data science, and KNN is great for classifying data by predicting the category of a sample based on its closest neighbors. This algorithm is used in many fields like healthcare, finance, and agriculture, helping us make decisions based on data. The best part is that it is really easy to use. You just need to pick a number for K and choose a distance function to compare data points. However, KNN has its downsides. It doesn't work well with large data sets, and it requires proper scaling of the data to get accurate results. In this video, we will show you how KNN works with a real data set, the Iris data set. We'll walk you through simple Python code and demonstrate how to find the best K value to maximize your model's accuracy. So stay tuned to see KNN in action.
Welcome to the demo part. I'm using Google Colab here, but you can use any of your favorite IDEs, like Jupyter Notebook, IntelliJ, or Visual Studio Code. Let me rename this file as KNN classification. Okay, cool. Let me tell you that KNN can be used for classification and regression predictive problems. KNN falls in the supervised learning family of algorithms. We will measure the distance to the K neighbors. The first step is to choose the number K of neighbors. Step two is to take the K nearest neighbors of the new data point according to your distance metric.
Step three is to count, among these K neighbors, the number of data points in each category. Step four is to assign the new data point to the category where you counted the most neighbors. Okay, so let's start.
First, let's import some libraries: import numpy as np and import pandas as pd. Everyone knows what NumPy and pandas are. NumPy is a library for the Python programming language that adds support for large multi-dimensional arrays, along with a large collection of high-level mathematical functions. Pandas is a software library written for the Python programming language for data manipulation and analysis. It offers data frames and other data structures for manipulating numerical tables and time series.
Moving forward, we'll import our data set. You can download the data set from the description box below. So dataset equals pd.read_csv, and the file name is iris.csv. This is how you read your data in Python. Okay, the data set is loaded. Then dataset.shape, which tells you how many rows and columns are present in your data set: 150 rows and six columns.
Let me tell you briefly about this Iris data set. The Iris data set includes three iris species with 50 samples each, as well as some properties of each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other. And with shape, as I told you, we get a quick idea of how many rows and columns are present in our data set.
So let's look at our data set with dataset.head. With head you can see the top five rows of your data set, and if you use tail instead of head you can see the last five rows. Cool.
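The four KNN steps listed earlier (choose K, find the K nearest neighbors, count the categories among them, assign the majority category) fit in a few lines of standard-library Python. The function name and the six (petal length, petal width) points are hypothetical, chosen to resemble two iris species; the demo in the video uses a library classifier instead.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    # Step 2: sort the training points by Euclidean distance to the new point.
    distances = sorted(
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    )
    # Step 3: count the categories among the k closest neighbors.
    votes = Counter(label for _, label in distances[:k])
    # Step 4: assign the category with the most votes.
    return votes.most_common(1)[0][0]

# Hypothetical (petal length, petal width) measurements in centimeters.
points = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.2), (5.5, 2.1), (5.8, 2.2), (6.0, 2.5)]
labels = ["setosa", "setosa", "setosa", "virginica", "virginica", "virginica"]

small_flower = knn_predict(points, labels, (1.6, 0.3), k=3)   # Step 1: K = 3
large_flower = knn_predict(points, labels, (5.9, 2.3), k=3)
print(small_flower, large_flower)
```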
So the columns are Id, sepal length, sepal width, petal length, petal width, and the species. Cool. Moving forward, let's describe our data set. These are basic pandas functions. So dataset.describe gives you, for each column, the count of rows, the mean, the standard deviation, the minimum, the 25%, 50%, and 75% values, and the maximum. For the Id column, for example, the maximum is 150 and the 25% value is 38.25. It does this for all the numeric columns. If a column holds ordinary text, it won't give you any statistics for it. Okay, cool.
Moving forward, let's take a look at the number of rows that belong to each class. We will write dataset.groupby of Species and then size. The S in Species is capital, which is why it was showing the error. So you can see Iris setosa has 50, Iris versicolor has 50, and Iris virginica has 50, and the data type is integer. Cool.
As you can see, the data set contains six columns: Id, sepal length, sepal width, petal length, petal width, and species. The actual features are described by columns one to four, and the last column holds the labels of the samples. So first we need to split the data into two arrays, X for the features and y for the labels. How do we do this? By writing feature columns equals sepal length, sepal width, petal length, petal width. Just make sure you are writing the correct names. Then X equals dataset of feature columns, dot values. Then y equals dataset of Species, dot values. Cool, let me run it. This is how we split the data set into two arrays, X and y. In X there are the four feature columns, and in y there is the species column. And what is in the species column? Setosa, virginica, and so on. Cool.
Now we will do label encoding. As you can see, the labels are categorical, and the KNeighborsClassifier does not accept string labels.
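Because the species labels are strings, they have to be mapped to integers before training. A minimal sketch of that mapping is shown below; like scikit-learn's LabelEncoder, it numbers the classes in sorted order, but the function name and the four sample labels are hypothetical.

```python
def fit_label_mapping(labels):
    """Map each distinct label to an integer, numbering the classes in sorted order."""
    classes = sorted(set(labels))
    return {name: number for number, name in enumerate(classes)}

# Hypothetical slice of the species column.
species = ["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"]

mapping = fit_label_mapping(species)
encoded = [mapping[name] for name in species]
print(mapping)
print(encoded)
```

Sorting the class names gives setosa 0, versicolor 1, and virginica 2, regardless of the order in which they appear in the data.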
So we need to use LabelEncoder to transform them into numbers: Iris-setosa corresponds to 0, Iris-versicolor corresponds to 1, and Iris-virginica corresponds to 2. How do we write it? from sklearn.preprocessing import LabelEncoder. Then le = LabelEncoder(), and y = le.fit_transform(y). Now I will run it. Yes, correct. Next, we'll split the data set into a training set and a test set, so that later on we can check whether the classifier works correctly. Here I will write from sklearn.model_selection import train_test_split (model_selection, not cross-validation). What is train_test_split? It is a model validation procedure that reveals how your model performs on new data. Let me write it first, and then I will explain X_train, X_test, y_train, and y_test. I will write X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) with a fixed random_state. Some error came up: model_selection needs the underscore. So what are X_train, X_test, y_train, and y_test? The X_train and y_train sets are used for training and fitting the model. The X_test and y_test sets are used for testing whether the model predicts the right output labels. Here the test size is 0.2, which means 80% of the data is for training and 20% is held back for testing as new data. Now let's do some data visualization. Here I will write import matplotlib.pyplot as plt, then import seaborn as sns, then %matplotlib inline. So what is Matplotlib?
Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension, NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK. Seaborn is a library for making statistical graphics in Python. It builds on top of Matplotlib and integrates closely with pandas data structures. Seaborn helps us explore and understand the data. So I will run it. Here I will write from pandas.plotting import parallel_coordinates, then plt.figure with figsize=(15, 10). Then parallel_coordinates(dataset.drop("Id", axis=1), "Species"); I don't need Id. Then plt.title("Parallel Coordinates Plot") with fontsize=20 and fontweight="bold". Then plt.xlabel("Features", fontsize=15), then plt.ylabel("Features values", fontsize=15). Then plt.legend with loc=1, frameon=True, shadow=True (True should be capitalized), facecolor="white", and edgecolor="black". Then plt.show(). Some error is there: a spelling mistake in figsize. One more error, in plt.legend: facecolor had a spelling mistake. Let me make the output full screen. Parallel coordinates is a plotting technique for multivariate data. It allows one to see clusters in the data and to estimate other statistics visually. Using parallel coordinates, points are represented as connected line segments, as you can see. Each vertical line represents one attribute, and one set of connected line segments represents one data point. Points that tend to cluster will appear closer together.
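The parallel coordinates calls described above, collected into one runnable sketch. The tiny frame and its column names are stand-ins for the course's iris.csv (an assumption), and the Agg backend is selected only so the sketch runs without a display; in a notebook you would drop that line and finish with plt.show().

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, only so this sketch runs anywhere

import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Stand-in data; column names are an ASSUMPTION about the course file.
dataset = pd.DataFrame({
    "Id": [1, 2, 51, 52, 101, 102],
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "PetalLengthCm": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "PetalWidthCm": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

plt.figure(figsize=(15, 10))
# One connected line per row, colored by the class column; Id is dropped first.
ax = parallel_coordinates(dataset.drop("Id", axis=1), "Species")
plt.title("Parallel Coordinates Plot", fontsize=20, fontweight="bold")
plt.xlabel("Features", fontsize=15)
plt.ylabel("Features values", fontsize=15)
plt.legend(loc=1, frameon=True, shadow=True,
           facecolor="white", edgecolor="black")
```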
So this color is Iris-setosa, this one is Iris-versicolor, and this one is Iris-virginica, as you can see in the legend. Moving forward, let's create another graph: Andrews curves. Here I will write from pandas.plotting import andrews_curves. Then plt.figure. The editor is AI-based, so it's giving me suggestions. Sometimes suggestions are good, but not always. Let's carry on. figsize is (15, 10). Then andrews_curves(dataset.drop("Id", axis=1), "Species"), and plt.title("Andrews Curves Plot"). We don't need the x label and so on. Let me add the legend: plt.legend with the same loc=1, then prop={"size": 15}, frameon=True, shadow=True, facecolor="white", and edgecolor="black". It's entirely up to you whether you add a legend; if you want to skip it, you can. Then plt.show(). No error this time, so let me view the output in full screen. These are the Andrews curves. Andrews curves allow one to plot multivariate data as a large number of curves that are created using the attributes of the samples as coefficients for a Fourier series. By coloring these curves differently for each class, it is possible to visualize data clustering. Curves belonging to samples of the same class will usually be closer together and form larger structures, as you can see here. And this is the legend: we set the face color to white and the edge color to black, with the frame and the shadow on. You can see the shadow here. You can skip the legend part if you want, but it's good practice to have it.
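To make the Fourier series idea concrete, here is a small plain-Python sketch of the function an Andrews curve traces for a single sample. It follows the standard Andrews formula f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) and so on. The helper name andrews_curve and the three sample rows are chosen here for illustration; the plotting function in pandas handles all of this internally.

```python
import math


def andrews_curve(sample, t):
    """Evaluate x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + ... at t."""
    value = sample[0] / math.sqrt(2)
    for i, x in enumerate(sample[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            value += x * math.sin(harmonic * t)
        else:
            value += x * math.cos(harmonic * t)
    return value


# One illustrative row per species:
# (sepal length, sepal width, petal length, petal width) in cm.
samples = {
    "Iris-setosa": [5.1, 3.5, 1.4, 0.2],
    "Iris-versicolor": [7.0, 3.2, 4.7, 1.4],
    "Iris-virginica": [6.3, 3.3, 6.0, 2.5],
}

# Sample each curve at five points between -pi and pi.
ts = [-math.pi + k * (math.pi / 2) for k in range(5)]
for name, row in samples.items():
    curve = [round(andrews_curve(row, t), 2) for t in ts]
    print(name, curve)
```

Each flower becomes one curve whose shape is set by its four measurements, which is why flowers of the same species tend to trace similar curves.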
So let's create one small pair plot. I will write plt.figure(), then sns.pairplot(dataset.drop("Id", axis=1), hue="Species", size=3, markers=["o", "s", "D"]); note that size is named height in newer seaborn versions. We drop Id again because we don't need it. Then plt.show(). It's running. First let me make it full screen. A pair plot is useful when you want to visualize the distribution of a variable, or the relationship between multiple variables, separately within subsets of your data set. Here the blue one is setosa, and then there are versicolor and virginica. These are the graphs; here you can see sepal length. Virginica, the green one, and versicolor have almost the same sepal length. And here are the different types of graphs. If you don't know how to read one graph, you can use another one instead. That's the power of pair plots. Then the sepal width in cm: virginica and versicolor again have almost the same sepal width, and here as well. Setosa has a different shape. We'll create one more small graph and then move forward. Now we are creating a box plot. I will write plt.figure(), then dataset.drop("Id", axis=1) again, followed by .boxplot with the figure size I chose. Let's run it. This is the box plot: petal length, petal width, sepal length, sepal width. For sepal length, for example, from here to here we have versicolor, Iris-virginica lies here, and from here to here Iris-setosa lies. You can also use 3D plots or other types of charts.
I did four plots, and four are enough to read the data set. Now we will do the KNN classification. We will make predictions and check the accuracy of our model. I will write from sklearn.neighbors import KNeighborsClassifier. Then from sklearn.metrics import confusion_matrix, accuracy_score. Then from sklearn.model_selection import cross_val_score. So far we are just loading the libraries; next we fit the classifier to the training set. We'll instantiate the learning model with K equal to three: classifier = KNeighborsClassifier(n_neighbors=3). Then, fitting the model: classifier.fit(X_train, y_train). Then we predict the test set results: y_pred = classifier.predict(X_test). There was a spelling mistake in the import; it's fine now. Now let's evaluate the predictions. Here I will build the confusion matrix: cm = confusion_matrix(y_test, y_pred), and then display cm. So what is a confusion matrix? It is a table that shows how well a model performs by comparing its predictions to the actual values. It displays the number of correct and incorrect predictions for each class; for a two-class problem these are the true positives, true negatives, false positives, and false negatives. Moving forward, let's calculate the model accuracy. This is the main part. If the model accuracy is low, it means your classification is not good.
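The steps from label encoding through the confusion matrix can be sketched end to end as below. This is an illustration under stated assumptions, not the exact notebook from the video: the data comes from scikit-learn's bundled copy of the iris data instead of iris.csv, the string labels are rebuilt so the encoding step has something to do, and random_state=0 is an arbitrary choice because the value used in the video is not stated.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# Stand-in for the CSV: bundled iris data, with text labels rebuilt (assumption).
iris = load_iris()
X = iris.data
species = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])
y_text = species[iris.target]

# Encode the text labels as integers; classes are numbered in sorted order.
le = LabelEncoder()
y = le.fit_transform(y_text)

# Hold back 20% of the rows for testing (random_state=0 is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit a 3-nearest-neighbors classifier and predict the held-back rows.
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
accuracy = accuracy_score(y_test, y_pred) * 100
print(cm)
print("KNN model accuracy: " + str(round(accuracy, 2)) + " %")
```

The diagonal of the matrix counts correct predictions, so the accuracy is the diagonal sum divided by the number of test rows. The exact figure depends on which rows land in the test set.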
So the accuracy should be at least 80%. accuracy = accuracy_score(y_test, y_pred) * 100; otherwise it would give me a fraction. Then I will write print("KNN model accuracy is " + str(round(accuracy, 2)) + " %"). Why did I round the accuracy? I don't want a long run of digits after the decimal point; I need only two, like 80.20. Let's run this. See, the KNN model accuracy is 96.67%, which is very good. So this is how you find the accuracy. Now let's find the optimal number of neighbors in KNN, basically finding the best K. We will tune this parameter using cross-validation. First I will create the list of K values for KNN: k_list = list(range(1, 50, 2)). Then I will create the list of CV scores: cv_scores = [], with square brackets. Here we'll perform 10-fold cross-validation. I will explain what cross-validation is, don't worry. First let me write it: for k in k_list, knn = KNeighborsClassifier(n_neighbors=k), then scores = cross_val_score(knn, X_train, y_train, cv=10, scoring="accuracy"), and then cv_scores.append(scores.mean()). Let me run it. Some error came up in the scoring parameter, because I made a spelling mistake in "accuracy". So what is cross-validation? Cross-validation estimates the accuracy of your machine learning model by partitioning the data into groups, one used as the training set and the other as the testing set.
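A sketch of the search for K described above: 10-fold cross-validation on the training data for every odd K below 50, keeping the K with the lowest misclassification error (one minus the mean accuracy). As before, scikit-learn's bundled iris data and random_state=0 are stand-ins chosen for illustration, so the K this prints need not match the one in the video.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data and split (assumptions, see the lead-in).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

k_list = list(range(1, 50, 2))  # odd values 1, 3, ..., 49
cv_scores = []

for k in k_list:
    knn = KNeighborsClassifier(n_neighbors=k)
    # Ten accuracy values, one per fold; only the training data is used.
    scores = cross_val_score(knn, X_train, y_train, cv=10, scoring="accuracy")
    cv_scores.append(scores.mean())

# Misclassification error is one minus the mean cross-validated accuracy.
mse = [1 - x for x in cv_scores]
best_k = k_list[mse.index(min(mse))]
print("The optimal number of neighbors is %d" % best_k)
```

Using only odd values of K is a common way to reduce tied votes between classes.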
So the data is randomly separated into a certain number of groups or subsets called folds; you can see we wrote 10 folds. Each fold contains about the same amount of data. And there is one more thing, validation: validation is a technique for assessing the accuracy of the model on a data set. Here we ran the cross-validation on the training data. So now let's find the best K. Before that, I will convert the scores into misclassification error: mse = [1 - x for x in cv_scores]. Then best_k = k_list[mse.index(min(mse))], using square brackets. And I will write print("The optimal number of neighbors is %d" % best_k). Let's run it. See, the optimal number of neighbors, n_neighbors, is 9. In the K-nearest neighbors (KNN) algorithm, K represents the number of neighbors that are considered when classifying a query point. See, if we classify this particular point, you count six neighbors: 1, 2, 3, 4, 5, 6. And if you classify this portion only, you again get six points like this. So the optimal number of neighbors is nine.

>> What is Python? Python is a high-level, object-oriented programming language developed by Guido van Rossum; work on it began in 1989 and it was first released in 1991. Python is often called a batteries-included language due to its comprehensive standard library. A fun fact about Python is that the name was taken from the popular BBC comedy show Monty Python's Flying Circus. Now let's look at the top features of Python. First, Python has a simple structure and a clearly defined syntax. This allows learners to pick up the language quickly, so it is easy to learn and use.
Second, Python can run on different operating systems such as Windows, Linux, and Mac, making it a portable language. It enables programmers to develop software for several competing platforms by writing a program only once. Third, Python is freely available at the official website. Since it is open source, the source code is also available to the public. Python uses an object-oriented approach that encapsulates code within objects. Python provides a collection of libraries for various tasks such as machine learning, web development, and data analysis. And finally, in Python you don't need to declare the data type of a variable. When you assign a value to a variable, Python works out the type and allocates the memory at runtime. With that, let's move on to the uses of Python programming. Python is used to develop desktop applications and to build web applications too. It is popular in the fields of data science, machine learning, and artificial intelligence for analyzing data, building predictive models, and making business decisions. Python is also widely used in game development. Now let's see some of the popular Python frameworks and libraries. Python can be used for web development using frameworks like Django, Flask, Pyramid, and CherryPy. You can build graphical user interfaces using libraries and frameworks such as Tkinter. You can also use PyGTK, PyQt, or PyJS (Python JavaScript). Python is also used to perform machine learning tasks using libraries such as TensorFlow, PyTorch, scikit-learn, Matplotlib, and SciPy. You can also perform mathematical computations using NumPy and pandas. Now let's look at the best IDEs that you can use to write programs in Python and perform specific tasks. We have Jupyter Notebook, which is part of the Anaconda distribution and is widely used these days. Even for our demo in this video, we'll be using Jupyter Notebook. I'll show you in a while.
Then we have the Visual Studio Code editor from Microsoft. This is also one of the IDEs preferred by learners and companies. Then we have the popular text editor Sublime Text. Then we also have PyCharm, followed by Python and Spyder, as our top IDEs. Now let's look at the top companies that use Python in their day-to-day work. We have Google, Quora, Facebook, and even Netflix, Spotify, and Instagram. There are other top product-based and service-based companies and startups that also use Python. So what really is the Python programming language? Python is an object-oriented, high-level programming language that supports built-in data structures and dynamic semantics. It supports multiple programming paradigms such as structured, object-oriented, and functional programming. Python is often described as a batteries-included language because it has a comprehensive standard library. Python supports modules and packages, which allows program modularity and code reuse. Python was developed by Guido van Rossum, and its implementation started in December 1989. Python 1.0 was released in 1994, Python 2.0 came out in October 2000, and Python 3.0 was released in December 2008. Now that you have an understanding of the Python programming language, let's look at the top 10 reasons why you should learn Python. At number 10, we have ease of use. One of the most common reasons to like Python is that it is quite easy to learn and code. It provides a simple syntax that improves readability and makes code easier to understand. Developers can create any desktop or machine-based application using this language. Python is very versatile and is instrumental in artificial intelligence and machine learning; we will talk about this later in the session. Compared to Java, C, or C++, it needs fewer lines of code. In the example here, we are printing a hello world program in Java.
As you can see, if you have to write a program in Java, you first have to declare the class name along with its scope. Next, using curly braces, you need to write the main method along with its arguments. And then, using the System.out.println method, you can print hello world. That's quite a tedious task, isn't it? The same task of printing hello world can be done using just one line of code in Python, as shown here. You write the print function and pass whatever you want to display inside the brackets, and that prints the output. It is that simple. That is why Python is considered a high-level language, and it's open source: you can just download it from the website and start using it. At nine, we have an active community. You need a community to learn a new technology, and friends are your best asset when it comes to learning a programming language. Python has large community support. It has an extensive and active community to assist engineers, developers, analysts, and data scientists with expert support in case of programming errors or issues with the software. You can just go ahead and post your queries in the community forum, and the community members will address them quickly. Communities like Stack Overflow also bring many Python experts together to help learners. Python Enhancement Proposals, or PEPs, are where proposals and improvements are announced. There is also a set of recommendations or core values called the Zen of Python, written by Tim Peters, that represents the guiding principles for Python development. Up next, at eight, we have portable and extensible. Multiple cross-language operations can be performed effectively because Python is portable and extensible in nature. For example, if users have Python code written on Windows and they want to execute it on macOS, Linux, or Solaris, they can easily do it without any amendment.
They can also run this code on any platform flawlessly and without interruption. Due to its extensibility, you can integrate code from other programming languages such as Java, .NET, C, and C++ with Python. Components of other programming languages can be used with Python, so it can be used to build cross-platform applications too. So it is a really good feature that Python provides. The next reason to learn Python is testing frameworks. Python supports several testing tools and frameworks that help in debugging and speeding up workflows. Some of the tools and frameworks supported by Python are pytest, Selenium, and Splinter. This is why many testers use Python-based tools and frameworks to test or validate an application or code more easily. pytest is the most recommended testing framework for functional, integration, and unit testing. You can run Selenium test scripts using Python to automate various tasks. And Splinter is an open-source tool for testing web applications using Python. It lets you automate browser actions such as visiting URLs and interacting with their items. At number six, we have libraries and packages. Another reason why Python has become so popular in the industry these days is that it has a massive collection of libraries and packages that make your tasks simple and easy. It has a range of libraries, packages, frameworks, and modules for data manipulation, statistical calculation, web development, machine learning, and data science. Python programmers have developed tons of free and open-source libraries that you can use. You can find many of them via the Python Package Index, the repository of Python software. Python comes with a default package manager called pip. Anaconda is a third-party Python ecosystem. Other examples include NumPy, SciPy, and Django. Then we have scripting and automation. Python is not just a programming language.
It can also be used for writing scripts that automate tasks and workflows without human intervention. The code can be written in the form of scripts and executed later. It is interpreted by the machine and checked for errors at runtime. Once the developer has checked the code, it can be run or reused several times without interruption. This allows you to automate a set of tasks within a program, or the same code can be used with other applications as well. At number four, we have web development. Another reason to learn Python is that it makes the web development process so much easier. It provides a wide collection of frameworks that make it easier for developers to build web applications. Some examples are Django, Flask, Pyramid, TurboGears, CherryPy, etc. These frameworks are written in Python, which makes the code a lot faster and more stable. A task that used to take hours in PHP can be finished in minutes using Python. Python is also used for web scraping. Django offers many elements of intricate programs such as template design, an admin panel, signing in, signing up, signing out, URL routing, etc. Once the user sets up the framework, all these features are ready to use. Flask is a micro web framework written in Python. All the components that are part of this module are ready to execute in the server context. Pinterest and LinkedIn use Flask. Pyramid offers more features than Flask. It assists users with URL routing and authentication support. TurboGears is a highly recommended and scalable framework that supports features such as authentication, caching, identification, session management, and pluggable applications. Up next, at number three, we have machine learning. The growth of machine learning has been phenomenal in the last five years, and it's rapidly changing the world around us.
Python is one of the most preferred programming languages for machine learning because of its simple syntax and its support for several machine learning libraries. Using different libraries and functions in Python, the system can learn and train itself from past data. Once the system is trained, it can learn to adjust itself to new inputs. Finally, it can make predictions and perform humanlike tasks automatically. At number two, we have data science. Machine learning and data science go hand in hand. Python is robust and scalable, and it provides extensive visualization and graphics options. Hence, it is widely used in data science. Python has libraries such as NumPy for numerical computation and pandas for manipulating numerical tables and time series. It also provides SymPy for symbolic computation and SciPy for technical and scientific computation. It has another library called PyBrain, which is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. Scikit-learn is the machine learning library for building classification, regression, and clustering algorithms. And finally, it provides PyTorch and TensorFlow for deep learning. Finally, we come to the most important and top reason to learn Python, which is career opportunities and salary. Python provides a variety of job opportunities and promises high growth with strong salary prospects. It is used by most of the tech giants. Industry leaders using Python are Amazon, Google, Facebook, IBM, NASA, Netflix, and YouTube. Next, you can see Google Trends, where I have considered three programming languages: Python, Java, and C++. I have compared them for the past 12 months. You can see clearly on your screens that Python has become the front-runner in terms of popularity and web search volume. It means people are interested in Python. They want to learn it and use it in their work. You can also check YouTube search.
There, too, you will find that Python is the most searched programming language on YouTube. Now on your screens you can see the report of PYPL, the PopularitY of Programming Language index. It is created by analyzing how often language tutorials are searched on Google. It is a leading indicator, and the raw data comes from Google Trends. The bar graph shows that Python is the most popular and widely used programming language across the globe, followed by Java, then JavaScript and C. The PYPL index can help you decide which language to study or which one to use in a new software project. The next graph shows the popularity of Python and Java over the years, starting from 2004 until the current period, which is 2020. Worldwide, Python is the most popular language. Python grew the most in the last five years, by 19.4%, and Java lost the most, by 7.2%. Now let's talk about the different career opportunities and job roles that you can get into if you learn Python. First, you can become a Python developer, where you will be asked to write and test code, debug programs, and integrate applications with third-party web services. Second, you can become a web developer. Here you will be responsible for writing server-side web application logic. Python web developers usually develop back-end components, connect the application with third-party services, and support the front-end developers by integrating their work with the Python application. You can also become a data analyst if you know Python. As a data analyst, you have to gather data from multiple sources using scripts, analyze that data, and develop and implement databases and data collection systems. You can become a data scientist. As a data scientist, you need to understand the challenges in a business and come up with the best solutions, using modern tools and techniques to analyze, visualize, and build prediction models to make business decisions.
Lastly, you can be a machine learning engineer, where you develop intelligent machines that can learn from vast volumes of data and apply knowledge without human intervention. So there's a lot of scope if you learn Python. But before we move on, let's first understand what Jupyter Notebook is. As you can see here, Jupyter Notebook is a popular open-source tool that allows you to create and share documents that contain code, equations, and visualizations. It is used for data analysis, machine learning, and scientific research, which makes it an essential tool for developers, data scientists, and researchers alike. Before installing Jupyter Notebook, make sure you have Python installed on your system. The requirement is Python 3.6 or greater. So now let us navigate to Python's official website. On python.org, if I click on Downloads, we see a button here to download Python 3.12.5. As I already told you, the Python version should be greater than 3.6, so you can just click here, and you can see the download has started. Now that the installer has downloaded, let us open the file. You can see the software is going to be installed in this directory, so just click here. As you can see, the installation of Python 3.12.5 is in progress. Let's wait for some time until it finishes. As you can see, we have successfully installed Python. Now let us open our terminal and check whether Python is correctly installed. We are going to type python --version. As you can see, Python is installed successfully. So that was our prerequisite. Now there are two ways to install Jupyter Notebook. The first one is pip.
Pip is a package manager; the other way is the Anaconda distribution. Let us start with pip. Pip is a package manager used to install and manage software packages and libraries written in Python. Python versions greater than 3.6 come with pip installed by default, so we can use the pip command to install Jupyter. As you can see, we have come to the official documentation at jupyter.org, and it describes installing JupyterLab with the pip command. What you can do is copy the command from here, paste it into your terminal, and run it. As you can see, it has started downloading the packages required for Jupyter. Let us wait for some time. Okay, we have successfully completed this step. Now let us move on to the next step. Now that it is installed, we can launch JupyterLab with this command in the terminal. Let us wait. As you can see, we have successfully installed Jupyter. You can go here and create a new notebook, choose your kernel, and start working in your notebook. I'll show you one snippet: 3 + 5. Let us run this cell. As you can see, it gives us 8 as the answer, so it is following Python syntax. In this way we have successfully installed Jupyter using the pip command.
As you can also see here, you can install Jupyter Notebook with the command pip install notebook and then just open it. This is another alternative. Similarly, you can install Voilà with the same kind of command and then open Voilà. If you are using another operating system such as macOS or Linux, you can install with brew install jupyterlab; Homebrew is the package manager for macOS and Linux. So I hope you are pretty clear on how to install Jupyter Notebook with the pip command. Now, I have downloaded Anaconda from its official website. As you can see, this is the official website of Anaconda. Just type your email and you can download it. After downloading, I'm going to launch the installer. Let us click Next, then click I Agree, and install it in the given directory. Let us wait for some time until the installation completes. As you can see, we have completed our installation of Anaconda. Just click on Finish, and we have successfully installed Anaconda. Now let us open Anaconda Navigator. Just click on this, and Anaconda Navigator will open. As you can see, this is Anaconda Navigator, and it is loading the packages for us to open Jupyter Notebook. Just click on Launch. If you click on Launch, it is going to open Jupyter Notebook. As you can see, it says it is launching Jupyter Notebook, and it is hosted on localhost:8889. So this is our hosted Jupyter Notebook. Similarly, you can create a new notebook here, and in this way you can start working.

>> LLMs. If you ever wondered how machines can now understand and generate humanlike text, you are in the right place.
From chatbots like ChatGPT to the AI assistants that power search engines, LLMs are transforming how we interact with technology. Among the most exciting advancements in this space are Google's Gemini and OpenAI's ChatGPT, large language models designed to push the boundaries of what AI can achieve. In this video, we will explore what LLMs are, how they work, and why models like Gemini are critical for the future of AI. Google Gemini is part of a new wave of AI models that are smarter, faster, and more efficient. It is designed to understand context better, offer more accurate responses, and integrate deeply into services like Google Search and Google Assistant, providing more human-like interactions. So, we will break down the science behind LLMs, including their massive training datasets, the transformer architecture, and how models like Gemini use deep learning innovations to change industries. Plus, we will compare Google Gemini to other popular LLMs such as OpenAI's ChatGPT models, showing how each of these technologies is used to power chatbots, virtual assistants, and other AI-driven applications. By the end of this video, you will have a clear understanding o

Original Description

🔥IITK - Professional Certificate Course in Generative AI and Machine Learning (India Only) - https://www.simplilearn.com/iitk-professional-certificate-course-ai-machine-learning?utm_campaign=SIyTpBNbKAo&utm_medium=Lives&utm_source=Youtube
🔥Professional Certificate in AI and Machine Learning - https://www.simplilearn.com/professional-aiml-program?utm_campaign=SIyTpBNbKAo&utm_medium=Lives&utm_source=Youtube
🔥IITG - Professional Certificate Program in Generative AI and Machine Learning (India Only) - https://www.simplilearn.com/applied-generative-ai-course?utm_campaign=SIyTpBNbKAo&utm_medium=Lives&utm_source=Youtube

In this video on Machine Learning with Python full course, you will understand the basics of machine learning and Python. In this Machine Learning tutorial for beginners, we will cover essential machine learning topics like applications of machine learning and machine learning concepts, and understand why mathematics, statistics, and linear algebra are crucial. We'll also learn about regularization, dimensionality reduction, and PCA. We will perform a prediction analysis on the recently held US Elections. Finally, you will study the Machine Learning roadmap.

Below are the topics covered in this video:
00:00:00 Machine Learning With Python Full Course 2025
00:08:36 Introduction to Machine Learning
00:16:14 Top 10 Applications of Machine Learning
00:32:38 Types of Machine Learning
00:37:46 Machine Learning Algorithms
00:38:14 Linear Regression
00:46:52 Decision Tree
01:23:25 Clustering
01:26:11 K-Means Clustering
02:18:03 Data and its types
03:29:22 Probability
04:07:53 Multiple Linear Regression
04:45:55 Confusion Matrices
05:59:54 KNN
06:23:40 Support Vector Machine
07:14:40 Principal Component Analysis (PCA)
07:53:01 Corona Virus Analysis

✅ Subscribe to our Channel to learn more about the top Technologies: https://bit.ly/2VT4WtH
⏩ Check out the Machine Learning tutorial videos: https://bit.ly/3fFR4f4
#MachineLearningCourse #MachineLearningFullC
