Linear Regression From Scratch in Python (Mathematical)

NeuralNine · Advanced ·💻 AI-Assisted Coding ·4y ago

Key Takeaways

The video implements linear regression from scratch in Python, covering mathematical concepts such as error minimization, mean squared error, and gradient descent. It provides a step-by-step guide on how to implement the algorithm using Python, including data loading, visualization, and optimization.
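For reference, these are the formulas the video derives verbally, written in standard notation. The n data points are indexed from 1 to n, and L is the learning rate used in the gradient descent update.

```latex
E(m, b) = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2

\frac{\partial E}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i \bigl(y_i - (m x_i + b)\bigr)

\frac{\partial E}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)

m \leftarrow m - L \, \frac{\partial E}{\partial m}, \qquad b \leftarrow b - L \, \frac{\partial E}{\partial b}
```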

Full Transcript

[Music] What is going on, guys? Welcome back. In today's video we're going to implement linear regression from scratch in Python, and a warning right ahead: it's going to be mathematical. So let's get right into it.

All right, let's start with the very basics of linear regression: what is linear regression, and what can it be useful for? We're going to start with an example right away. Let's say we have a bunch of students who take an exam, and each of them has invested a certain amount of study time for this exam. The study time can go from zero hours up to, well, not really infinity, but there's no fixed upper limit: you can study two weeks in advance, one year in advance, or one minute in advance. Then we have an exam score, which is the actual result of the test, and it can go from zero to a hundred points, or from zero to a hundred percent.

Let's say we have a data set with these data points and we want to plot them on a two-dimensional coordinate system. It's not the most beautiful coordinate system, I know, but the x-axis is the study time and the y-axis is the exam score. If we were to visualize all these points, we would probably see something like this: people who don't study at all and get pretty low scores, people who study a little bit and get better scores, but still some who get pretty bad scores. There are people who study a lot and get a very high score, some people who study a lot and don't get very good grades at all, and some people who don't study at all and get very good grades. Those are the outliers, but most of the points are going to be somewhere here in the middle. Now, what linear regression tries to do is it tries
to find a line that fits these points best. Now, "best" is a little bit subjective, because different regression types or algorithms use different approaches: some don't care about outliers, some care a lot about outliers, and so on. Depending on what you want, you have to choose a different procedure. For linear regression, what we're interested in is minimizing the error. Let's say I have a linear function here, a blue line like this. This is obviously not the best function we can find for these data points, but to get the error of that function, we go to each point and look at what the function predicts: for this x value, it would predict this y value. Then we go down and measure the difference between the prediction and the actual value. This here is an error, this here is an error, and this here is an error; the error is simply the difference between the prediction and reality. What we want is to find a function that minimizes that error, and that is what linear regression is about.

The structure of a linear function is y = m * x + b, where m is the slope (the steepness) and b is the intercept, the point where the line crosses the y-axis. We expect the final line to fit the points pretty well. It may be influenced a little by the outliers, but all in all we want it to fit most points and minimize the total error. For this we need to minimize the error function. The actual minimization process is a little more complicated, but the error function itself needs to be defined first. We're going to minimize it with the gradient descent algorithm, which we'll talk about in a second, and in order to minimize something it has to produce a value, because we want to minimize the
output by tweaking the things that we can tweak. In our case, looking at m * x + b, we want to tweak m and b so that for each x we get the best possible y with the least error. So the error function is capital E, which is 1 divided by n (I'll tell you in a second why it's 1 divided by n) times the sum from i = 1 up to n of y_i minus the predicted value, squared. y_i is the actual value, the y value of this point here. The predicted value is "y hat" (ŷ_i), the value on the line, and we can replace it with m * x_i + b. So the error is E = (1/n) * Σ (y_i - (m * x_i + b))^2, with the sum running from i = 1 to n.

This is called the mean squared error function. Don't be confused; it's not really complicated. It may look complicated because it's math, and for a lot of people math just looks complicated, but all it says is this: for each point, from the first point to the last, where n is the number of points, we take the actual y value of the point and subtract the y value that the function predicts at that x. So if we have a function like this, we take the actual y value here and the function's y value here, and that is the difference. Then we square that difference, add up all these squared differences, and divide by n, the number of points. That gives us the mean squared error. So this is what this function
does: it's just a fancy way of saying take the difference between each actual point and what the function predicts the y value should be, square that difference, and divide the sum by the number of points to get the mean squared error. This is the error function that we're trying to minimize in linear regression.

All right, this next part is going to be very technical and mathematical, because we're going to talk about partial derivatives, which is calculus. If you're not interested in that, or you don't know calculus at all, you can skip directly to the coding part. I would not recommend it, though: even if you don't understand everything, it's good to listen to what is happening behind the scenes. It may be a little bit boring because it's very technical, although personally I think understanding exactly how this optimization works is one of the most exciting parts. I can understand if some of you want to get straight to the coding; just keep in mind that then you won't fully understand what is happening behind the scenes.

So let's get into it. We want to minimize the error function, that is, find the line that gives us the lowest possible E. The only things we can influence are m and b; x is just the input and y is just the output. So our goal is to find the m and b that minimize E. We can do that by taking the partial derivatives of E with respect to m and with respect to b, because together they give us the direction of steepest ascent: how to change m and b to increase E the most. Now you might say: didn't we want to decrease E? Yes, we did, but if you
just take the opposite of the direction of steepest ascent, you get the direction of steepest descent. So this is what we're going to do: take the partial derivatives with respect to m and b, and then move in the opposite direction of this gradient.

The partial derivative of E with respect to m is 1 divided by n times the sum from i = 1 to n of 2 * (y_i - (m * x_i + b)), because of the square. Those are just the basic calculus rules, so if you don't know calculus, don't be confused; you don't need to understand everything. By the chain rule we also multiply by the inner derivative with respect to m: y_i doesn't depend on m, and the derivative of -(m * x_i + b) with respect to m is -x_i. We can simplify this by pulling the 2 and the negative sign out of the sum, so ∂E/∂m = -(2/n) * Σ x_i * (y_i - (m * x_i + b)). That is the partial derivative with respect to m.

Now let's do the same thing for b. It's basically the same, but without the x_i, because b has no factor in front of it, so the inner derivative is just -1: ∂E/∂b = -(2/n) * Σ (y_i - (m * x_i + b)). That's it. These two partial derivatives give us the direction of steepest ascent with respect to m and b, and all we need to do is go in the opposite direction. So to improve m and b, in each iteration we assign to m the current m minus a learning rate L (we'll talk about that in a second) times ∂E/∂m, and the same
for b: b = b - L * ∂E/∂b. This is more programming notation than mathematical notation, but that is what we do. We know that in this direction the error increases the most, so we go the opposite way, and we do that in every iteration, because the gradient changes as m and b change: sometimes the steepest descent points in one direction, then in another, and then in another. This matters even more when we don't just deal with two variables. We don't always have only study time and score; sometimes we have 10, 20, or a thousand different features to take into account in a linear regression. Again, the gradient is the direction of steepest ascent, which is why we subtract it; otherwise we would have a plus here and not a minus.

The learning rate determines how large the steps are that we take. Larger steps move toward the minimum faster, but they can overshoot it or even diverge. A smaller learning rate takes more careful steps and can settle closer to the minimum, but it needs more iterations. I think we're going to go with a learning rate of about 0.001, or maybe with one more zero (0.0001).

Now that we have all the math handled, we're going to take all this theory and turn it into Python code. For this we'll start by installing two libraries. However, since we're implementing linear regression from scratch, those two libraries are not related to the linear regression algorithm itself: we're just going to use pandas to load a data set from a CSV file and Matplotlib for visualization. The whole linear regression process will be implemented by us from scratch. If you
don't have pandas and Matplotlib, open cmd (your terminal) and run pip install pandas and pip install matplotlib. Once you have them, you can write import pandas as pd and import matplotlib.pyplot as plt.

Now you need some sort of data set. You can craft your own by just making up some values, or, if you're comfortable with that, take an actual data set and apply linear regression to it. For this video I've crafted my own sample CSV file with some random values that have a certain trend in them: we have x and y and then a bunch of randomly generated values. As I said, you can go with a real data set instead. We can also interpret those values as study time and score, if you want to. Some values are above 100, as you can see here (117), so it's not entirely realistic, but we can still call the columns study time and score. This is just a basic comma-separated values (CSV) file with some random values.

We can load it with data = pd.read_csv('data.csv') and then take a look at it with print(data) to see the structure of the data frame. Nothing too fancy: two columns, study time and score, and their values. We can also visualize them by calling plt.scatter with the study time column and the score column of data, followed by plt.show(), and there you go: those are the data points we're going to use for this regression example. You can pick any data points; the data is not the focus here, we just want a properly working algorithm. So I'm going to delete the visualization and start with the loss function.

We define a function loss_function (you can also call it mean_squared_error), and we pass it the m value, the b value, and the actual data points, in our case data. Inside it, we have a total error which
starts at zero; we add all the individual squared errors to it, and in the end we divide by the number of points. So we write for i in range(len(points)), and for each i, x is the study time of the row at position i, which we get with points.iloc[i], and y is the score of that same row. The error is just what we already had. If we look at my Paint drawing, here is the error function; we're just going to write it in a Pythonic way now. The loop is the sum, and what we write inside it is one term of the sum (the sigma symbol). So total_error += (y - (m * x + b)) ** 2: the actual y minus what we thought y should be based on m and b, namely m * x + b, and the whole thing squared, because it's the mean squared error. In the end we return total_error / float(len(points)). That is the basic loss function; it tells us how far off we are from the actual result.

What you need to know about this loss function is that we're actually not going to use it in the optimization. What we're interested in is minimizing it, and that is already built into the gradient descent step: plain Python can't take the derivative of this function for us, so we had to do it manually, and we already have. So this function is more of a helper you can use to calculate the loss manually, but it's not part of the final optimization process.

So let's go ahead and implement gradient descent. We call this function gradient_descent, and we pass it m_now and b_now, the current values, the points, and a learning rate L. Inside this function, we're basically
going to start with a gradient of 0 for m and for b, and set n = len(points), the number of points. Then we perform the gradient descent step. Again, we're not going to use the loss function here; that's a separate function you can use manually. Instead we use the two partial derivatives we derived, which already incorporate the loss function. So for i in range(n), for each point, x is the study time of points.iloc[i] and y is the score of points.iloc[i], and then we calculate the gradient based on the formulas. All I'm doing is translating these two formulas into code: -2 divided by n, and the loop is the sum symbol. If you want to understand what's happening here, go back to the mathematical explanation. So m_gradient += -(2 / n) * x * (y - (m_now * x + b_now)), that is, x times y minus (m at this particular moment times x plus b at this moment), and b_gradient is the same but without the x. Once these iterations are done, we know which direction to move in, or rather away from. So the new m is m_now - m_gradient * L: in the opposite direction, with the learning rate L determining how far we move. Same for b. In the end, of course, we return both m and b. That is the gradient descent function. We can actually get rid of the loss function if you're not interested in it,
but this is all we need to perform linear regression. Now we just need to execute everything. We start with m = 0 and b = 0; we can pick any starting values we want. We use a learning rate L = 0.0001 and 100 iterations, also called epochs; actually, let's go with 1000. Then for i in range(epochs), we set m, b = gradient_descent(m, b, data, L). This way we keep getting better and better at estimating the best m and b. In the end we print m and b (print, not plt) and plot the results: plt.scatter with the study time and score columns and color="black", and then plt.plot for the trend line. For the line we just need a list of x values. Looking at the CSV file, all the x values are greater than 20 and less than 80, so we can use list(range(20, 80)) as the x values and [m * x + b for x in range(20, 80)] as the y values.
There you go, and the color of the regression line will be red. Then plt.show(), and that's it. You can now run this. Maybe you also want to print the epoch, so let's say if i % 50 == 0: print(f"Epoch: {i}"), inside an f-string maybe. Let's see if that works or if we made mistakes: epoch 0, 50, 100... Let's turn the number down so we get the results faster; let's go with 300. Epoch 0, 50, 100, and we should see a pretty solid trend line, or regression line. There you go, that seems about right. As you can see, this is a pretty good fit, although gradient descent only approximates the optimal line: with this small learning rate and only 300 iterations, the intercept b in particular changes very slowly, so the result likely still differs from the exact least-squares solution. Those are the values for m and b. So this is how you implement linear regression from scratch in Python.

That's it for today's video. I hope you enjoyed it and learned something. If so, let me know by hitting the like button and leaving a comment in the comment section down below, and of course don't forget to subscribe to this channel and hit the notification bell so you don't miss a single future video for free. Other than that, thank you very much for watching, see you in the next video, and bye. [Music]
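The code walkthrough above can be condensed into a short, self-contained sketch. It is not the video's exact code: to stay runnable without a CSV file, it assumes the data is held in two plain Python lists (xs and ys, hypothetical sample values) instead of a pandas DataFrame read with pd.read_csv, and it leaves out the Matplotlib plotting. The function names follow the narration; the list-based signatures are an assumption.

```python
# Sketch of the video's approach, assuming plain Python lists instead of a
# pandas DataFrame (the video reads rows with points.iloc[i]).

def loss_function(m, b, xs, ys):
    """Mean squared error of the line y = m*x + b over all points."""
    total_error = 0.0
    for x, y in zip(xs, ys):
        total_error += (y - (m * x + b)) ** 2
    return total_error / float(len(xs))


def gradient_descent(m_now, b_now, xs, ys, L):
    """Take one gradient descent step on the MSE and return the new (m, b)."""
    n = len(xs)
    m_gradient = 0.0
    b_gradient = 0.0
    for x, y in zip(xs, ys):
        error = y - (m_now * x + b_now)
        # Accumulate the partial derivatives dE/dm and dE/db.
        m_gradient += -(2 / n) * x * error
        b_gradient += -(2 / n) * error
    # Step against the gradient, i.e. in the direction of steepest descent.
    m = m_now - L * m_gradient
    b = b_now - L * b_gradient
    return m, b


# Hypothetical sample data: study time (x) and score (y).
xs = [20, 30, 40, 50, 60, 70, 80]
ys = [30, 40, 55, 65, 78, 88, 100]

m, b = 0.0, 0.0
L = 0.0001     # learning rate used in the video
epochs = 300   # epoch count of the video's final run

initial_loss = loss_function(m, b, xs, ys)
for i in range(epochs):
    if i % 50 == 0:
        print(f"Epoch: {i}, loss: {loss_function(m, b, xs, ys):.3f}")
    m, b = gradient_descent(m, b, xs, ys, L)

print(m, b)
```

With x values between 20 and 80, the gradient with respect to m is far larger than the one with respect to b, so at L = 0.0001 the slope settles quickly while b barely moves. With this sample data, b is still below 1 after 300 epochs, whereas the least-squares intercept for these points is roughly 6.4; getting there with this learning rate would take many more epochs.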

Original Description

In this video we implement the linear regression algorithm from scratch. This episode is highly mathematical.
◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾
📚 Programming Books & Merch 📚
🐍 The Python Bible Book: https://www.neuralnine.com/books/
💻 The Algorithm Bible Book: https://www.neuralnine.com/books/
👕 Programming Merch: https://www.neuralnine.com/shop
🌐 Social Media & Contact 🌐
📱 Website: https://www.neuralnine.com/
📷 Instagram: https://www.instagram.com/neuralnine
🐦 Twitter: https://twitter.com/neuralnine
🤵 LinkedIn: https://www.linkedin.com/company/neuralnine/
📁 GitHub: https://github.com/NeuralNine
🎙 Discord: https://discord.gg/JU4xr8U3dm
🎵 Outro Music From: https://www.bensound.com/
Timestamps:
(0:00) Intro
(0:19) Mathematical Theory
(12:48) Implementation From Scratch
(24:05) Outro

Playlist

Uploads from NeuralNine

1 Visualizing Stock Data With Candlestick Charts in Python
2 Python Beginner Tutorial #1 - Installation and First Program
3 Python Beginner Tutorial #2 - Variables and Data Types
4 Python Beginner Tutorial #3 - Operators and User Input
5 Python Beginner Tutorial #4 - If Statements and Conditions
6 Python Beginner Tutorial #5 - Loops
7 Python Beginner Tutorial #6 - Sequences and Collections
8 Python Beginner Tutorial #7 - Functions
9 Python Beginner Tutorial #8 - Exception Handling
10 Python Beginner Tutorial #9 - File Operations
11 Python Beginner Tutorial #10 - String Functions
12 Python Intermediate Tutorial #1 - Classes and Objects
13 Python Intermediate Tutorial #2 - Inheritance
14 Python Intermediate Tutorial #3 - Multithreading
15 Python Intermediate Tutorial #4 - Synchronizing Threads
16 Python Intermediate Tutorial #5 - Events and Daemon Threads
17 Python Intermediate Tutorial #6 - Queues
18 Python Intermediate Tutorial #7 - Sockets and Network Programming
19 Python Intermediate Tutorial #8 - Database Programming
20 Python Intermediate Tutorial #9 - Recursion
21 Python Intermediate Tutorial #10 - XML Processing
22 Python Intermediate Tutorial #11 - Logging
23 Python Data Science Tutorial #1 - Anaconda and PyCharm Setup
24 Python Data Science Tutorial #2 - NumPy Arrays
25 Python Data Science Tutorial #3 - Numpy Functions
26 Python Data Science Tutorial #4 - Plotting Functions With Matplotlib
27 Python Data Science Tutorial #5 - Subplots and Multiple Windows
28 Python Data Science Tutorial #6 - Matplotlib Styling
29 Python Data Science Tutorial #7 - Bar Charts with Matplotlib
30 Python Data Science Tutorial #8 - Pie Charts with Matplotlib
31 Python Data Science Tutorial #9 - Plotting Histograms with Matplotlib
32 Python Data Science Tutorial #10 - Scatter Plots with Matplotlib
33 Python Data Science Tutorial #11 - 3D Plotting with Matplotlib
34 Python Data Science Tutorial #12 - Pandas Series
35 Python Data Science Tutorial #13 - Pandas Data Frames
36 Python Data Science Tutorial #14 - Pandas Statistics
37 Python Data Science Tutorial #15 - Pandas Sorting and Functions
38 Python Data Science Tutorial #16 - Pandas Merging Data Frames
39 Python Data Science Tutorial #17 - Pandas Queries
40 Python Machine Learning Tutorial #1 - What is Machine Learning?
41 Python Machine Learning Tutorial #2 - Linear Regression
42 Python Machine Learning Tutorial #3 - K-Nearest Neighbors Classification
43 Python Machine Learning #4 - Support Vector Machines
44 Python Machine Learning Tutorial #5 - Decision Trees and Random Forest Classification
45 Python Machine Learning Tutorial #6 - K-Means Clustering
46 Python Machine Learning Tutorial #7 - Neural Networks
47 Python Machine Learning Tutorial #8 - Handwritten Digit Recognition with Tensorflow
48 Generating Poetic Texts with Recurrent Neural Networks in Python
49 Stock Portfolio Visualization with Matplotlib in Python
50 Analyzing Coronavirus with Python (COVID-19)
51 Making Text Images Readable Again with Python and OpenCV
52 Neural Networks Simply Explained (Theory)
53 Motion Filtering with OpenCV in Python
54 Top 5 Programming Languages To Learn in 2020
55 Simple TCP Chat Room in Python
56 Image Classification with Neural Networks in Python
57 Edge Detection with OpenCV in Python
58 S&P 500 Web Scraping with Python
59 Simple Sentiment Text Analysis in Python
60 Introduction - Algorithms & Data Structures #1

This video teaches how to implement linear regression from scratch in Python, covering mathematical concepts and practical implementation. It provides a step-by-step guide on how to load data, visualize it, and optimize the model using gradient descent. By watching this video, viewers can learn how to implement linear regression and improve their machine learning skills.
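The learning-rate trade-off mentioned in the video can be seen directly by running the same small number of epochs with different rates. In this sketch (hypothetical data, rates, and helper names), the smallest rate makes slow progress, the middle one does much better, and the largest one overshoots so badly that the loss explodes.

```python
# Sketch: effect of the learning rate L on plain gradient descent for the MSE.
# The data and the three rates are hypothetical.

def mse(m, b, xs, ys):
    """Mean squared error of y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def step(m, b, xs, ys, lr):
    """One gradient descent update of m and b."""
    n = len(xs)
    dm = -(2 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
    db = -(2 / n) * sum(y - (m * x + b) for x, y in zip(xs, ys))
    return m - lr * dm, b - lr * db


xs = [20, 35, 50, 65, 80]
ys = [31, 47, 62, 81, 95]
initial = mse(0.0, 0.0, xs, ys)

results = {}
for lr in (0.00001, 0.0001, 0.001):
    m, b = 0.0, 0.0
    for _ in range(20):  # same small number of epochs for every rate
        m, b = step(m, b, xs, ys, lr)
    results[lr] = mse(m, b, xs, ys)
    print(f"L={lr}: loss after 20 epochs = {results[lr]:.4g} (start: {initial:.4g})")
```

For x values between 20 and 80, the MSE is far steeper along m than along b, so the usable range of L is narrow: 0.001 already diverges here, while 0.0001 converges for m but still moves b only slowly.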

Key Takeaways
  1. Define the mean squared error function
  2. Find the partial derivatives of the error function with respect to m and b
  3. Minimize the error function using calculus
  4. Implement the loss function using mean squared error
  5. Update m and b using gradient descent
  6. Plot the data and the trend line
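The steps listed above can be strung together in one small script. The sketch below, on made-up data, runs gradient descent and then compares the result with the closed-form least-squares line, which is the exact minimizer of the mean squared error. The helper names (mse, step, least_squares), the data, the learning rate, and the epoch count are all assumptions, and plotting is omitted.

```python
# Sketch: gradient descent on the MSE, checked against the closed-form
# least-squares solution. Data and hyperparameters are hypothetical.

def mse(m, b, xs, ys):
    """Mean squared error of y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def step(m, b, xs, ys, lr):
    """One gradient descent update using the partial derivatives of the MSE."""
    n = len(xs)
    dm = -(2 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
    db = -(2 / n) * sum(y - (m * x + b) for x, y in zip(xs, ys))
    return m - lr * dm, b - lr * db


def least_squares(xs, ys):
    """Closed-form slope and intercept that minimize the MSE."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_mean - m * x_mean


xs = [0, 1, 2, 3, 4, 5]
ys = [1.1, 2.9, 5.2, 7.1, 8.8, 11.0]

m, b = 0.0, 0.0
for epoch in range(5000):
    m, b = step(m, b, xs, ys, lr=0.01)

m_exact, b_exact = least_squares(xs, ys)
print(f"gradient descent: m={m:.4f}, b={b:.4f}, mse={mse(m, b, xs, ys):.4f}")
print(f"least squares:    m={m_exact:.4f}, b={b_exact:.4f}")
```

To draw the result as in the video, you would pass the points to plt.scatter and the fitted line to plt.plot, then call plt.show().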
💡 The video highlights the importance of mathematical optimization in machine learning, particularly in linear regression, and provides a practical guide on how to implement it using Python.
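Because the whole optimization rests on the hand-derived partial derivatives, one way to check them is to compare them with central finite differences of the mean squared error. This is a sketch with hypothetical data and helper names; since the MSE is quadratic in m and b, the central difference should match the analytic gradient up to floating-point rounding.

```python
# Sketch: check the hand-derived MSE gradients against central finite
# differences. Data values and helper names are hypothetical.

def mse(m, b, xs, ys):
    """Mean squared error of y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def analytic_gradient(m, b, xs, ys):
    """Partial derivatives dE/dm and dE/db from the derivation."""
    n = len(xs)
    dm = -(2 / n) * sum(x * (y - (m * x + b)) for x, y in zip(xs, ys))
    db = -(2 / n) * sum(y - (m * x + b) for x, y in zip(xs, ys))
    return dm, db


def numeric_gradient(m, b, xs, ys, h=1e-6):
    """Central-difference estimate of the same partial derivatives."""
    dm = (mse(m + h, b, xs, ys) - mse(m - h, b, xs, ys)) / (2 * h)
    db = (mse(m, b + h, xs, ys) - mse(m, b - h, xs, ys)) / (2 * h)
    return dm, db


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.5, 4.1, 6.2, 7.9]
print(analytic_gradient(0.5, 0.1, xs, ys))
print(numeric_gradient(0.5, 0.1, xs, ys))
```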
