Python Machine Learning Tutorial #3 - K-Nearest Neighbors Classification
In today's episode we look at our first classification algorithm: K-Nearest Neighbors classification.
K-Nearest Neighbors Blog Post: https://www.neuralnine.com/k-nearest-neighbors-classification-from-scratch-in-python/
Website: https://www.neuralnine.com/
Instagram: https://www.instagram.com/neuralnine
Twitter: https://twitter.com/neuralnine
GitHub: https://github.com/NeuralNine
Programming Books: https://www.neuralnine.com/books/
Outro Music From: https://www.bensound.com/
Subscribe and Like for more free content!
What You'll Learn
The video demonstrates K-Nearest Neighbors classification using scikit-learn and numpy in Python, covering its application in binary and multi-class classification, and evaluating its performance using accuracy score.
Full Transcript
What is going on guys, welcome to this Python tutorial series for machine learning. In today's video we're going to talk about the K-neighbors classifier, K-nearest neighbors classification, which allows us to classify unknown data by looking at data that is already classified. So let us get into the explanation.

As always, let us start out with a quick explanation in Paint. For this I'm going to once again draw a coordinate system. On the x-axis we're going to have the heights of persons, and on the y-axis we're going to have the weights of these persons. Now we could say: okay, the red people are the overweight people. They have a lot of weight even though they're not very tall, so they have a low height but a very high weight, and we would classify them as overweight. You could have a lot of weight but also be very tall, which would be normal, and you would be somewhere around here, which is not necessarily overweight, because if you're two meters tall, having like 100 kilograms is not that bad, but if you're 150 centimeters and you have a hundred kilograms, you're probably overweight. We're going to do the same thing with a blue class: these are the very tall people that have a very low weight, so we could say these are the underweight people, the very skinny people. Now of course in this example we don't have any quote-unquote normal people, so we don't have average people, we only have the extremes, but I use this example to show you how K-neighbors classification works.

So we have these classified data points, because this is a supervised learning algorithm: we already told the model that we have the red points and the blue points. What we now do is we get a new example, a gray one, an unclassified one, which is somewhere like here, and we want our model to say: is this point a blue point or a red point? Is this person overweight or underweight, skinny basically? This is what K-neighbors does for us: it looks at the neighbors to determine if this point is a red point or a blue point.

Now we as humans would immediately classify this point as red rather than blue. We would probably say it's not even red, but it's definitely not blue; it's more red than blue. The algorithm does the same thing. K is a certain number, for example K could be 1, K could be 2, K could be 6, whatever, and the algorithm takes K neighbors to compare the unknown point to. So it says, okay, the nearest neighbor is this one, and if K is 1 it only looks at this one point and says, okay, this point is red, so you're red. If K is 2 it also looks at the second nearest point, then the third nearest point, and so on. In this case these are of course all red. But let's say we had some green class over here. What would happen if we have K equals 2, or actually 2 is not a good number here, let's say K equals 4? This is the nearest, this is the second nearest (sorry, my drawings are quite shitty today again), then this might be the third nearest and, even though it's not true, let's assume that's the fourth nearest point. We would now say, okay, we have one green point but three red points, so we classify this point as a red point. So we're looking at the nearest neighbors.

Of course, what is important is that K should be a number that is not divisible by the amount of classes, or actually the other way around, I think: the amount of classes should not be divisible by K. If you have three classes, you should not pick the value 3 for K, because if you have a point right here (which is actually not that good, but let's say you have it somewhere like here) and you have K equals 3, you would have this one, this one, and this one at roughly the same distance, and this is of course not what we want. We want a classification that makes sense. So if you have three classes you might want to pick four, or you might want to pick one. One is always good because one is binary, but of course one is not accurate, because what happens if I have, for example, a bunch of green points here, a lot of them, and they're pretty near to the gray one, but then I have a red point for some reason, an outlier, that is here? If I only say K equals 1, look for one neighbor, I take this one, even though obviously you would classify this as a green point because it's nearer to this class.

So this is basically how K-neighbors classification works. In the same way that I didn't explain linear regression in much detail, I'm not going to do this with K-neighbors, because I have a blog post and I'll link it down in the description. So if you're interested in the K-neighbors algorithm from scratch in Python and the mathematics behind it, check out the blog post, and maybe I'll make some videos about that in the future.

Now let us look at a practical example in Python of K-neighbors classification. We're not going to classify overweight people; we're going to look at a data set that comes from scikit-learn, the breast cancer data set. This data set has a lot of parameters about different tumors, and it classifies tumors as either malignant or benign, so either bad tumors or good tumors. We're going to do that with K-neighbors classification. So we're going to say from sklearn.datasets import load_breast_cancer; this is the function to load the data set. Also we're going to say from sklearn.neighbors import KNeighborsClassifier, and maybe we're also going to need numpy, who knows, maybe not. One more thing we need: from sklearn.model_selection import train_test_split, so that we can split our data into training and test data.

So what we're going to do first is take a look at the data and see what it looks like. We're just going to say data = load_breast_cancer(), and now we're going to say print(data). Let's go ahead, it's a little bit laggy right now, so that we can see what kind of columns and what kind of data we have in here. Okay, this is the data itself. I think if you want to know the features we've got to say data.features, and data.targets to know the targets. I think this is how it works. Maybe feature and not features? Not true. What did I do in the example? Oh sorry, feature_names and target_names. Makes sense though. There you go. As you can see, these are the features that we have of a tumor: the texture, the radius, the area, the compactness, and all kinds of different parameters that we can use to classify a tumor as either malignant or benign. And then we have the two classes, the two targets, malignant or benign, so basically harmful and not harmful.

So this is the data set that we have, and what we're going to do now is split it up into training data and testing data. We're going to say X_train, X_test, I think it was like that, then y_train and y_test equals train_test_split of data, and we want to have a test size of, actually, we do need data.features, I think, or no, actually we need data.data. So we need to pass a numpy array of data.data, because data is actually the feature data and target is the target. So we've got to pass np.array(data.data) and np.array(data.target), there you go, and the test size is 0.2. So basically what we're doing is we're saying: take the data, which is the feature data, so take all these parameters, all these rows of values, and put them into a numpy array; then also take all the results, all the classes, all the classifications, all the targets, and put them in another numpy array; and then put 20% of that into test data and 80% of that into training data. So we split the data. Also we shuffle a little bit in there, because the split is not always the same: it does not take the first 80% and the last 20%, it shuffles around before it splits, so it's also a little bit randomized, which is good.

Then we have the training data and the test data, and the next step of course is to define a classifier. So we're going to say clf, for classifier, or call it whatever you want, equals KNeighborsClassifier, and now we can specify, I think it was n_neighbors, how many neighbors we want to look at. I'm going to pick 3, because 3 is a good number: we have two classes, so 3 is fine because we cannot have a tie there; there has got to be one class that has one more neighbor. So we have the classifier, and what we do now is we say clf.train on the X_train, y_train data, so we use the training data for the features and the training data for the target, and then we train the model on that. Or actually, I'm quite stupid, why did I say train? fit, sorry: clf.fit, and then we use the training data.

After we have done that, we're going to evaluate the model. So we're going to say print clf.score, which is the method for testing how well a model performs, and we're going to use the test data to check if the training was good enough, if the model performs well, because the test data is of course data that the model has never seen before. As you can see down here, we got a 93.85, or 93.86 actually if you round it up, percent accuracy, which is really good, because it means that if we have a new tumor that we don't know is malignant or benign, we get all these parameters and feed them into the model, and it will tell us with 93.8 percent accuracy whether this tumor is malignant or benign. That is pretty good actually, because otherwise you would have to guess, and guessing is always 50/50 if you have two classes, so 93 percent is actually very good.

Now of course what you could do is just go ahead and say clf.predict and pass a list, or a numpy array actually, with all these parameters, and then classify them as either malignant or benign, which you would do if you had an application in the medical area. But for this tutorial it doesn't make sense to just make up some values. So if you wanted to predict tumor data, you would have to get new tumor data, put it into a numpy array, pass it to the predict method, and then you would get either malignant or benign as a result. This is how it works, so this is how you use K-neighbors classification in Python.

So that's it for today's video. Thank you very much for watching. I hope you learned something, I hope you enjoyed this video. If so, hit the like button and support this channel with a like. Also feel free to ask questions and give feedback in the comment section down below, and of course subscribe to this channel if you want to see more free videos in the future. So thank you very much for watching, see you in the next video, and bye.
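The neighbor vote described in the first half of the video can be sketched in a few lines of NumPy. This is only an illustration, not the code from the video or the linked blog post: the function name knn_predict, the sample heights and weights, and the gray query point are all made up for this example, and plain Euclidean distance is assumed.

```python
from collections import Counter

import numpy as np


def knn_predict(points, labels, query, k):
    """Classify `query` by majority vote among its k nearest labelled points.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    # Distance from the query to every labelled point
    distances = np.linalg.norm(points - query, axis=1)
    # Indices of the k closest points, nearest first
    nearest = np.argsort(distances)[:k]
    # Count the labels among those neighbors and return the most common one
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Made-up (height in cm, weight in kg) samples, not real data
points = np.array([
    [150.0, 95.0], [155.0, 100.0], [160.0, 105.0],  # "red": low height, high weight
    [190.0, 60.0], [195.0, 65.0], [200.0, 70.0],    # "blue": tall, low weight
])
labels = np.array(["red", "red", "red", "blue", "blue", "blue"])

query = np.array([165.0, 90.0])  # the unclassified "gray" point
prediction = knn_predict(points, labels, query, k=3)
print(prediction)
```

With more than two classes a tie is still possible for some values of K (for example two votes each for two classes when K is 4). In this sketch a tie goes to whichever tied label shows up first among the neighbors, which is a choice made here and not something stated in the video.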
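Put together, the scikit-learn steps dictated in the video look roughly like the script below. It is a reconstruction from the spoken description, so treat it as a sketch. Two things are additions that the video does not mention: random_state=0, used only so the split is repeatable, and the final predict call on a held-out test row, which stands in for new tumor data. The accuracy you get depends on the split and will not necessarily match the figure quoted in the video.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the data set and look at the feature and class names
data = load_breast_cancer()
print(data.feature_names)
print(data.target_names)

# Feature matrix and class labels as numpy arrays
X = np.array(data.data)
y = np.array(data.target)

# 80% training data, 20% test data, shuffled before splitting.
# random_state is an assumption added here for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Three neighbors: with two classes a 3-neighbor vote cannot tie
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

# Mean accuracy on data the model has never seen
accuracy = clf.score(X_test, y_test)
print(accuracy)

# predict expects a 2D array: one row per sample.
# A test row is reused here in place of made-up tumor values.
predicted = clf.predict(X_test[:1])
print(data.target_names[predicted[0]])
```

Note that predict returns the numeric class index; looking it up in data.target_names gives the readable label.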