K-Nearest Neighbors Classification From Scratch in Python (Mathematical)
Key Takeaways
The video demonstrates implementing a K-Nearest Neighbors classifier from scratch in Python, using Euclidean distance as the distance measure, and visualizing the results using matplotlib. The implementation includes fit and predict functionality, and the code is designed to work with multiple dimensions.
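For reference, the Euclidean distance that the video relies on can be written as follows for two points p and q with n coordinates each. The symbols d, p, q, n and i are notation chosen for this note rather than taken from the video.

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```

With n = 2 this is the Pythagorean theorem applied to the differences in x and y.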
Full Transcript
What is going on, guys, welcome back. In this video today we're going to implement a machine learning classification algorithm from scratch in Python, namely the k-nearest neighbors classifier, so let us get right into it. [Music]

All right, so we're going to implement k-nearest neighbors classification from scratch in this video today, meaning we're not going to use scikit-learn, TensorFlow or any other machine learning library. We're just going to use NumPy so that we can work with arrays efficiently, and we're going to use matplotlib for the visualization. Everything else will be written in core Python and from scratch, so we're going to implement the distance measure from scratch and also the fit and predict functionality.

Now one thing that might be interesting to you guys is that there is also a text-based, written version of this tutorial, which you can find on my website, neuralnine.com: "K-Nearest Neighbors Classification From Scratch in Python". This post is quite old, so the code that we write in this video will not be 100% the code of the blog post; it will be more professional and better code. However, since this blog post exists, I can explain the mathematics briefly here, because I already have basic visualizations and formulas.

The basic idea of k-nearest neighbors classification is very simple. As the name already says, we're using the k nearest neighbors, k being a number, a variable, to determine the class of a new point. So let me open the most advanced visualization tool there is, Paint. Let's say we have a data set where (I'm going to use a red brush now) we have a couple of data points that have the color red as their class, so to say; they belong to the class red. Then we have some that belong to the class blue, and they're clearly separated in this case. Oftentimes in real life, data sets are more mixed, so you will have some red points here and some blue points there, but let's say this is the data set.

What the k-nearest neighbors classification algorithm does is it takes a new point, let's say this one here, and looks at the k nearest neighbors. That means it calculates the distance to every single point, also the other ones over here, and then it takes the k points with the smallest distance. The class the new point ends up having is the class that most of those points have. In this case the k nearest points are probably this one, this one and maybe this one, and all of them are red. Now let's say we also have a blue point here. This would not change the result if we're looking at the three nearest neighbors: they are this one, this one and, let's say, this one. We have two red instances and one blue instance, so the class of this new, unlabeled point would be red according to the three nearest neighbors. That's the basic idea, and this is what we're going to implement today.

Now the distance, and this is important, is going to be the Euclidean distance. Let me open Paint again. When you have a point here and a point here, the Euclidean distance is just the distance in the ordinary sense: with the difference in x and the difference in y, it's the square root of x squared plus y squared, so the Pythagorean theorem. That's the basic distance here, and it works for n dimensions, so it doesn't only work in two dimensions. The general formula of the Euclidean distance is this one, depending on how many dimensions you have: you take the square root of the sum of (point one minus point two) squared over each axis, so x-axis, y-axis, z-axis and so on. Very simple, just the Pythagorean theorem in n dimensions, nothing too complicated, and this is what we're going to implement in this video today.

To train a k-nearest neighbors classifier we don't have to do anything, because the training is just the points existing. The actual work happens when you make a prediction. When you have a bunch of data points, a couple of blue points and a couple of red points, this is already your model, so to say. It's not like other machine learning algorithms, where you have to train a model, fit that model, and then use it to make predictions. The k-nearest neighbors classification model is the data itself, because all we need to do is calculate distances to the points, and we can only do that once we have a new point to calculate the distances to.

So that's basically what we're going to do. We start by importing numpy as np and matplotlib.pyplot as plt. By the way, if you don't have these libraries installed, you just go into your command line and type pip install numpy matplotlib. Then, from collections, which is core Python, we import Counter so that we can count the most common class among the k nearest neighbors.

What I'm going to do now is artificially define some points. We say points equals a dictionary. We have the blue class, which is going to have the following points: 2 4, 1 3 (I'm looking at my second screen because I want to take the exact same points that I used in the preparation, since those are well separated). Those are the coordinates for the blue points. Then we also want to have red points, and the red points are going to be 5 6, 4 5, 4 6, 6 6 and 5 4.
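The setup described so far could look roughly like the sketch below. The red points and the first two blue points are the ones read out in the transcript; the remaining three blue points are placeholders assumed here so that both classes have five entries. The small `votes` list is also hypothetical and only illustrates what `Counter` is imported for.

```python
from collections import Counter

# Class label -> list of 2D coordinates.
# Red points and the first two blue points follow the transcript; the last
# three blue points are assumed placeholders for this sketch.
points = {
    "blue": [[2, 4], [1, 3], [2, 3], [3, 2], [2, 1]],
    "red": [[5, 6], [4, 5], [4, 6], [6, 6], [5, 4]],
}

# Counter performs the majority vote: most_common(1) returns a list holding
# the single (label, count) pair with the highest count.
votes = ["blue", "red", "blue"]  # hypothetical labels of three neighbors
winner = Counter(votes).most_common(1)[0][0]
print(winner)  # blue
```

Keeping the data as a plain dictionary of lists means the "model" is nothing more than this structure, which matches the point that training only consists of storing the points.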
Now you can also come up with your own points; those are just some coordinates that are separated. What we want to classify now is a new point, so we say new_point equals 3 3, for example, which is somewhere in the middle but probably closer to blue. This is in two dimensions. We're also going to look at an example in three dimensions, but we start with a 2D example.

What we want first is a very simple one-line function that calculates the Euclidean distance between two points, and this can be done very easily with NumPy. This is not cheating, because we're not using NumPy to do the actual work for us; we still apply the formula ourselves and only use NumPy so that we can easily do calculations with arrays. Let me define the structure: euclidean_distance of p and q, which are two points. We want to take every dimension of p and every dimension of q and get the difference, so basically p minus q. We want to square that difference, then sum it up, and then take the square root. If you're working with basic Python lists, you would have to iterate over all the dimensions with loops and so on. With NumPy we can just call np.array to turn p, which is a list right now, into an array, and the same for q. When you apply subtraction to NumPy arrays, that's not the same thing that happens with lists. I think subtracting lists is not even supported, but when you subtract one NumPy array from another, the values are subtracted position by position, so p minus q means every value in p minus the corresponding value in q.

What we want to do with this is say np.sum, to sum up all the differences. We don't want the square root yet; we want to square them first and then take the square root of the sum. Actually, let me just see, does that make sense? Is this a mistake in my code here? Let me see how I did it in the blog post. I don't think that's correct, because there we're squaring and then immediately taking the square root. What we should be doing, as it's written in the formula, is squaring the differences, so this is actually a mistake in my blog post, so don't read it. We subtract, we square that, we sum that up, and then we take the square root, so np.sqrt of np.sum. What I have in the post is wrong, and this is how you want to do it: you subtract, you square the difference, you take the sum of all that, and then you take the square root. If you take the square root of the square, you're just getting the sum back, and that is not what you want. This is how you calculate the Euclidean distance.

Then we want to create a simple classifier. I'm just going to call the class KNearestNeighbors. For the constructor we don't want too many things: we just define k, the number of nearest neighbors to look at, and set it to three by default. I say self.k equals k, and self.points is going to be None. Remember, I said training the k-nearest neighbors classifier means having points; points existing is a trained model in this case. So we define fit, where we provide the training data, the points, and basically set self.points equals points. That's the training; there's not much more that we can do.

Then for the predict method, this is where the actual work happens. Predicting means calculating the distances between the one new point that we want to make a prediction for and all the other points. So we define predict with self and new_point as the input, and we keep track of all the distances in a list. I don't want to use the name class because class is a keyword, so we say for category in self.points, meaning for blue and for red, and then for point in self.points[category], using the variable category and not a string. So for all blue points and for all red points, we get the distance, which we calculate with the euclidean_distance function between the current point and the new point. Then we append to the distances list a list with two elements, the first being the actual distance and the second being the class, the category. Afterwards we take the k points with the smallest distances and just look at their categories and count them.

This is done by saying categories equals category[1], to get the actual category and not the distance, for category in sorted(distances). The sorted is important: we sort by distance, so the smallest distances are the first elements. From the sorted list we take the first k instances, so up until self.k; in this case we get the first three instances, the three with the smallest distance. We don't keep the full list here, just the categories, so we would have blue blue blue, or blue red blue, whatever: the categories of the three nearest instances. Then we say result equals Counter(categories).most_common(1)[0][0] to get the actual result, and we return the result. That's basically how you do a k-nearest neighbors classification.

Now we create a classifier, clf equals KNearestNeighbors with k equals three, which is the default parameter, so we don't even need to specify it, and clf is fit onto the points. It's important, of course, for this particular classifier that the structure looks like this. If you pass a NumPy array or a pandas DataFrame, it's not going to be able to handle that; you would have to do some preprocessing. Making it compatible with all input types is not the focus of this video. What we do now is print clf.predict of the new point, and this gives us blue.

Now this is not very satisfying, because we don't see anything; we don't see how it works. So we're going to visualize the distances and the classification using matplotlib. I start a new section here, visualize, and let me zoom in a little bit. We say ax is going to be a new subplot, plt.subplot. We set the grid to True with a color of 323232, so basically gray. Then we say ax.figure.set_facecolor to apply a dark theme; this is basically what I did for the blog post, so this is the theme we use here. We're just setting some parameters: with tick_params, the x-axis gets white ticks and the y-axis also gets white ticks.

Then we plot the actual points. For point in points of category blue, we scatter all these points, so point[0], the x coordinate, and point[1], the y coordinate. The color is going to be a blue, 104DC8, with s equal to 60; I think that's the size. Then we copy that, paste it, change the category to red, and for the color I'll just give you FF0000.

Then we also have the new point. We say the new class of the point is going to be whatever the prediction says, so in this case it will be blue: we predict which class the new point has. Then we say the color of this new point is FF0000 if the new class equals red, else 104DC8, and we plot this new point as well. We scatter it onto the axes with new_point[0] and new_point[1], the color being whatever it happens to be based on the class. The marker is going to be a star, so that we can tell the new point apart, then s equals 100 and zorder is 100. The zorder is just for the placing in the plot, so whether it's on top or not if some other element is in the same area.

Then we draw a dashed line towards each point to visualize the distances. We use the same loop structure, so I copy it, but we're not going to scatter, so I remove that. For each point in blue we say ax.plot, and we plot new_point[0] and point[0], so the x coordinates, and new_point[1] and point[1], so the y coordinates. The color of the line is going to be the blue color, 104DC8, the line style, as I said, will be dashed, and the line width is going to be thin, so one. We copy this down below, and the only thing we change is the color; everything else stays the same. That's basically it. We need to say plt.show(), then we can run this, and if I didn't make any mistakes... I think I forgot some color option, right?
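The 2D plotting steps narrated above can be sketched roughly as follows, including the black axes face color that the speaker adds right afterwards. This is an approximation under stated assumptions, not the video's exact code: the function name `plot_knn_2d` is made up for this sketch, the figure color `#121212` is an assumed dark tone (the video does not state it), the hex codes are the ones best recoverable from the transcript, and three of the blue points are placeholders. The predicted class is passed in as an argument so that the block stays independent of the classifier.

```python
import matplotlib.pyplot as plt
import numpy as np

BLUE = "#104DC8"  # hex codes as best recoverable from the transcript
RED = "#FF0000"

# Red points and the first two blue points follow the transcript; the last
# three blue points are assumed placeholders.
points = {
    "blue": [[2, 4], [1, 3], [2, 3], [3, 2], [2, 1]],
    "red": [[5, 6], [4, 5], [4, 6], [6, 6], [5, 4]],
}
new_point = [3, 3]


def plot_knn_2d(points, new_point, new_class):
    """Scatter the labelled points and the new point, with a dashed line to each."""
    colors = {"blue": BLUE, "red": RED}
    fig, ax = plt.subplots()
    ax.grid(True, color="#323232")
    ax.set_facecolor("black")        # background of the plotting area
    fig.set_facecolor("#121212")     # assumed dark tone for the figure itself
    ax.tick_params(axis="x", colors="white")
    ax.tick_params(axis="y", colors="white")

    for category, pts in points.items():
        pts = np.array(pts)
        ax.scatter(pts[:, 0], pts[:, 1], color=colors[category], s=60)
        # One thin dashed line from the new point to every training point.
        for x, y in pts:
            ax.plot([new_point[0], x], [new_point[1], y],
                    color=colors[category], linestyle="--", linewidth=1)

    # The new point is drawn as a star in the color of its predicted class.
    ax.scatter(new_point[0], new_point[1], color=colors[new_class],
               marker="*", s=100, zorder=100)
    return fig, ax
```

Calling `plot_knn_2d(points, new_point, "blue")` followed by `plt.show()` would display the figure. Scattering each class in one call, instead of once per point as narrated, is a small simplification that should produce the same picture.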
We need to say ax.set_facecolor, because what we set before is the color of the figure, but the face color of the axes is something else, and this is going to be black. There you go. So now we have this basic visualization of the k-nearest neighbors classification. We have this new point here, and those are the distances to the individual points. You can see the nearest point is probably this one, the second nearest is probably this one, and then probably this one, so all of them are blue.

What you will notice, of course, is that we can force red if we just add enough red points and set k to the number of all points. I have one, two, three, four, five blue points and five red points. If I add one more red point, so that I have six red points and five blue points, and I include all the points, we're of course going to get red as the class. We can actually do that if you don't believe me. We add a new point, wherever it is; it can be very far away. Now we have 11 points instead of 10, and I can include all of them in the calculation: where I create the classifier, I set k equal to 11. This will almost automatically end up being red, just because it looks at all the points. Even if all the blue ones were super close and all the red ones were super far away, it would still become red, because it includes all the points and we have more red points than blue points. So of course it doesn't make sense to just increase k for accuracy here.

All right, so that's it for two dimensions; now let's do the same thing for three dimensions. All we do is include an extra dimension, a z coordinate, in all these points: so 5, 1, 3, 6 for example, and down below we say 5 6 5, 4 5 2, 4 6 1, 6 6 1, 5 4 6, and then maybe 10 10 4; let's just keep this point here. The new point is going to be 3 3 1 for example, or let's go with 3 3 4 so that I have it more in the center. We don't have to change anything about the distance function. It's still the same because it includes all dimensions; this is a very flexible function, and we can use it with a hundred dimensions as well. We also don't have to change anything about the classifier, because the classifier is also able to work with 100 dimensions.

All we need to change is the visualization. Instead of saying ax equals plt.subplot, we say fig equals plt.figure with figsize equal to 15, 12 for example, and then ax is going to be fig.add_subplot, and we set the projection. This is the important part. I think it would have also worked with the previous structure, but the important thing is that we pass projection equals 3d, because now we're working on a three-dimensional plot and no longer on a two-dimensional one. Then we need to add the new index everywhere: point[0], point[1], point[2], and of course the same for the new point and for the lines. When we plot a line now, we're plotting it in three dimensions, so we need to add a new list with new_point[2] and point[2], and the same thing needs to be done down here as well. That should actually be it, if I didn't forget anything.

Now you can see we have this three-dimensional plot, where k-nearest neighbors classification also works. We can draw the lines, and the new point is still blue because those points are just closer. We can also try to move the point. Let me see what coordinates we have here. Okay, we cannot see the labels because I messed up the color scheme, but I think if we move the point to higher values... if I say five, that's a bit too much; 5 5 and maybe 2 or something. Okay, it's still blue. Why is it still blue? Should it be blue? How many points is it looking at?

Okay guys, so everything we just did was basically stupid, because we didn't return the actual distance. We had a bunch of None objects, and the classifier was just counting. We of course need to return the square root in order to make a calculation. So everything we did up until now, also in two dimensions, was not correct. The results happened to be correct because blue was the correct answer, but that was just a coincidence. The rest of the code was actually correct; the only thing that was missing was this return statement. Without returning the square root, the return value of the function is None, and the classifier just counts the categories, so whatever you have more of will be the class. But now it should be red. There you go, so now it works. And if I move the point back to where it was, so 3 3 4, I think it should probably be blue again. There you go, now it works, now it makes sense. This also works in two dimensions now, and in four and ten and 100 dimensions. This is how you implement k-nearest neighbors classification from scratch.

All right, so that's it for today's video. I hope you enjoyed it and I hope you learned something. If so, let me know by hitting the like button and leaving a comment in the comment section down below, and of course don't forget to subscribe to this channel and hit the notification bell to not miss a single future video for free. Other than that, thank you very much for watching, see you in the next video, and bye. [Music]
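As a recap, the classifier described in the transcript could be sketched as below, with the `return` statement in place. The names `euclidean_distance`, `KNearestNeighbors`, `fit` and `predict` follow the narration; the red points and the first two blue points are the ones read out in the video, while the remaining blue points, the far-away extra red point and the 3D coordinates are placeholders assumed for illustration. Sorting with an explicit key is a small deviation from the narrated `sorted(distances)`.

```python
from collections import Counter

import numpy as np


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    # Subtract per axis, square, sum, then take the root. The explicit
    # `return` is the statement that was missing in the video's first runs.
    return np.sqrt(np.sum((np.array(p) - np.array(q)) ** 2))


class KNearestNeighbors:
    def __init__(self, k=3):
        self.k = k
        self.points = None

    def fit(self, points):
        # "Training" only stores the labelled points.
        self.points = points

    def predict(self, new_point):
        distances = []
        for category in self.points:
            for point in self.points[category]:
                distance = euclidean_distance(point, new_point)
                distances.append((distance, category))
        # Sort by distance and keep the categories of the k closest points.
        nearest = sorted(distances, key=lambda pair: pair[0])[: self.k]
        categories = [category for _, category in nearest]
        # Majority vote among those k categories.
        return Counter(categories).most_common(1)[0][0]


# Red points and the first two blue points follow the transcript; the last
# three blue points are assumed placeholders.
points = {
    "blue": [[2, 4], [1, 3], [2, 3], [3, 2], [2, 1]],
    "red": [[5, 6], [4, 5], [4, 6], [6, 6], [5, 4]],
}

clf = KNearestNeighbors(k=3)
clf.fit(points)
print(clf.predict([3, 3]))  # blue

# Forcing "red": one extra, far-away red point (hypothetical coordinates)
# and k equal to the total number of points.
more_points = {"blue": points["blue"], "red": points["red"] + [[20, 20]]}
clf_all = KNearestNeighbors(k=11)
clf_all.fit(more_points)
print(clf_all.predict([3, 3]))  # red

# The same code works unchanged in 3D (hypothetical coordinates).
points_3d = {
    "blue": [[1, 1, 1], [2, 1, 2], [1, 2, 2]],
    "red": [[8, 8, 8], [9, 8, 9], [8, 9, 9]],
}
clf_3d = KNearestNeighbors(k=3)
clf_3d.fit(points_3d)
print(clf_3d.predict([2, 2, 2]))  # blue
```

The second example shows why raising k to the size of the data set is not useful: the vote then only reflects which class has more points, regardless of distance.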
Original Description
Today we implement a K-Nearest Neighbors classifier from scratch in Python.
🌐 Social Media & Contact 🌐
📱 Website: https://www.neuralnine.com/
📷 Instagram: https://www.instagram.com/neuralnine
🐦 Twitter: https://twitter.com/neuralnine
🤵 LinkedIn: https://www.linkedin.com/company/neuralnine/
📁 GitHub: https://github.com/NeuralNine
🎙 Discord: https://discord.gg/JU4xr8U3dm
🎵 Outro Music From: https://www.bensound.com/