Linear Regression Mathematical Intuition

Krish Naik · Beginner ·🔢 Mathematical Foundations ·7y ago

Key Takeaways

Provides mathematical intuition for linear regression

Full Transcript

Hello all. Today we will be discussing the maths intuition behind a regression problem statement. In my previous videos I have already shown a lot of practical examples of simple linear regression and multiple linear regression, but in this video I am going to explain the maths of regression in detail.

Everybody remembers that simple linear regression is given by the equation y = mx + c, and this is my best fit line. Using this equation I try to find the best fit line. Here m is the slope and c is the intercept. Let me first discuss what this means. Suppose I have a problem statement where, with respect to the size of a house, I have some prices in my data set, so the points will be populated somewhere like this. What we are trying to do with a regression algorithm is create a best fit line, such that for some future size I can go to that point on the line, read off the price value, and determine the price of the house. So this is my best fit line, and as discussed, this best fit line is given by the equation y = mx + c.

Now, what do the m value and the c value mean? Always remember: whenever your size is 0 (size, not price; I'll just cut this), when I consider the size as 0, I am basically considering that my x
value is 0. If my x value is 0 and I substitute it into this equation, my y will be c, which is the constant, or the intercept. This indicates where on the y-axis the price points when my size is zero. So this point in the equation is c, my intercept: when the size is zero, the point I have on the y-axis for the price is called c.

The other thing I want to discuss is m. Suppose here I have something like 1,000 square feet, and here I have 1,100 square feet. Suppose I consider this 100 as my unit change on the x-axis. With this unit change, what is the change in the price on the best fit line? That is what we consider as the slope. So the slope indicates, for a unit change on my x-axis (the size), what the change on the y-axis will be, and that is what we are trying to find.

Now you may be wondering how we arrive at a best fit line for this equation. One way is that I can draw multiple candidate lines like this, and what I should try to do is minimize this distance, this error, such that if I do the summation of all the errors, it is minimal. So we find the summation of all the errors, and whichever line gives us the minimal error gives me the slope value m and some value c, which will be my
intercept. Now I'll just clear this diagram and try to explain it properly. Suppose I have these x and y values: x is my size and y is my price, and I have a lot of points populated. From this I found that the best fit line is this one. Now, what should we consider while selecting the best fit line? I will define a function called the cost function. Again guys, this cost function is very, very important. It is also a basic thing that we'll be learning in deep learning, so it is very important to understand this. Make sure that you watch this video till the end.

As we said, the distance between the point on the best fit line and my real point should be minimal. So I can write the cost function as 1/(2m) × Σ from i = 1 to m of (y hat - y)², where m here means the number of points, not the slope. You know that y hat is given by y hat = mx + c: the points that we predict on the best fit line are y hat, and y indicates the real points. We should try to minimize this error, and whichever line gives the minimum error is selected as the best fit line.

But the next question that arises is: just by using this equation, how do I find the best fit line? I can have a lot of candidate lines, and from those I would have to compute all the summations and then try
to find out which one gives the minimum value. That will not really work: it will take a lot of time and will unnecessarily waste your processing power and computing resources. You can't just select a million lines and compute the cost function for each of them. Instead I'll show you a more efficient way.

To begin with, I'm going to give you an example. Suppose this is my x-axis and this is my y-axis, with the values 1, 2, 3 and 4 on each. Let me consider some points: when my x value is 1, my y value is also 1, so I draw my first point here; when my x value is 2, my y value is also 2, which is my second point; when my x value is 3, my y value is also 3. Suppose these are my three points. This is my real data, and it is given by the equation y = x, because when my x value is 1 my y value is 1.

Next, I need to find my best fit line for these points. I'll write the equation as y hat = mx + c. Let me consider that my c value is 0, meaning that the line passes through the origin, where my x value is 0 and my y value is 0. There is a reason why I am making c equal to 0: I am going to draw a diagram, and with c = 0 I can draw a 2D diagram. If I let c take some other value, I would have to draw a 3D diagram, and that would definitely be very difficult for me to draw here. So I am considering the c value as 0, indicating that the line passes through the origin, where my x value is 0 and my y value is 0.
So when I make the c value 0, my new equation is something like y hat = mx, and this y hat indicates my best fit line. Let me put in x = 1 and find my y hat value, and let me initialize my slope m as 1. When m is 1, y hat is 1 multiplied by 1, so y hat will be 1. So after 0, my best fit line passes through this y hat; this particular point is both y hat and y. Now for x = 2, what will y hat be? My slope is 1 and my x value is 2, so it will be 2, and the line gets extended and passes through this point. Similarly, when my slope is 1 and my x value is 3, y hat will be 3, so the line passes through this point too. So this is my best fit line when my slope m is 1. Very important.

Now, after I get this, I will find my cost function. Remember, I have already discussed the cost function, and the formula is 1/(2m) × Σ from i = 1 to m of (y hat - y)², where m in the formula is the number of points. I have to find this error and try to reduce it. So I'm going to substitute. For the summation, first of all, when my x value was 1, what was y hat? It was 1. What was the real y value when x is 1? It is 1. So I write (1 - 1)². Then when x is 2, y hat was 2, so this will be (2 - 2)², plus (3 - 3)². When I put all
these things in, the number of points m is 3, so 1/(2 × 3) is 1/6, and 1/6 multiplied by 0 is nothing but 0. So this is very clear: when my slope was 1, for these points I got the cost function as 0.

Now I will draw one more diagram, and this is the most important diagram, guys, so please focus on this. Here on the x-axis I have 0.5, 1, 1.5, 2.0, 2.5, and this axis is my slope, my m value. The y-axis indicates my cost function, which I write as J(m), and here also I will write 0.5, 1, 1.5, 2. Everybody please focus on this: what I am trying to do is, for every m value, plot the cost function that I got. Initially, with m as 1, I got my cost function as 0. So where my m value is 1 and my cost function is 0, this is the point I get. I hope it is pretty clear.

In my next step I change this m. Suppose I take my m value as 0.5. With m = 0.5, if I substitute into this equation, my y hat for x = 1 will be 0.5, my y hat for x = 2 will be 1, and my y hat for x = 3 will be 1.5. See how I'm getting this, guys, you just have to substitute: when my slope is 0.5 and x is 1, 0.5 × 1 is 0.5; when x is 2, 0.5 × 2 is 1; and when x is 3, 3 × 0.5 is 1.5. So I will be getting new points somewhere like this, at 0.5, 1 and 1.5, and when I draw my best fit line it will look like this. Now when I find the cost function for m = 0.5, you just substitute into the formula: it will be 1/(2m) times the
summation from i = 1 to m. For x = 1 my y hat was 0.5, and what was my y? It was 1, so that is (0.5 - 1)²; similarly (1 - 2)², plus (1.5 - 3)². Just substitute into the equation and put m as 3. Here m is not the slope; you can write it as n if you are getting confused, where n is the number of points. If you compute this you will get around 0.58. So 0.58 is your cost function when your slope m is 0.5, and the point will come somewhere here. Similarly, for different m values you will get points that form this kind of curve, and when you draw it you get a diagram that looks somewhere like this. This is the curve we use for gradient descent. Gradient descent plays a very important role, guys, which I'm going to explain on the next screen. Once you have this curve, when should you stop and select an m value that looks good for the best fit line? That is the next thing I'm going to discuss.

Before that, I'm going to clear this diagram and focus on gradient descent. As I said, this is my m value and this is my cost function J(m). Here I will write 0.5, 1, 1.5, 2, 2.5, and similarly here 0.5, 1, up to 2.5. You can see that my graph looks something like this. Suppose I'm getting these points populated like this, and this point
populated like this. I'm just going to draw the curve again; it may not be exactly correct, but I'm trying to draw the diagram properly for you. So here it is, it looks somewhere like this. Okay guys, this is the curve for gradient descent, and I'll label it as gradient descent. In my previous diagram I showed you that for different m values you get different points, which finally follow this structure. The next question is how to arrive at this particular region, and this region is called the global minimum.

So I need to arrive at this position. Suppose that initially, based on some m value, I got my initial point somewhere over here. That means I have to move downwards. In order to move downwards I will write a theorem, which is called the convergence theorem. The convergence theorem says that m should be updated as m := m - α × dJ/dm: you subtract from m the derivative of the cost function with respect to m, multiplied by one more value, called the learning rate, which is indicated by alpha. This derivative is the slope of the curve at that point. Now let me show you why this equation works. Suppose for some m value I got to this particular position, and then I apply my convergence theorem. It says I have to subtract the slope at this particular point, so
if I want to find the slope, I just draw a straight line like this, a tangent, and this straight line helps me find the derivative. When I draw it, the next thing is to find out whether this is a positive slope or a negative slope; that is important. How do you find out? Look at the right-hand side and the left-hand side of the line. If the right-hand side is pointing downwards, you can say that this is a negative slope.

Now suppose at this point my m value was somewhere like -0.5, while the m value you want is somewhere around 1. Since this point has a negative slope, the derivative will be a negative value, and the learning rate alpha will be a small value; I'll tell you why we select a small value. When I say small, this value can be somewhere like 0.00... So when I take the derivative at a negative slope, I get a negative value, and multiplied by alpha it becomes a very small value. So I'll write the update as m plus some small positive value, because minus into minus is plus. That indicates my m value should increase from -0.5 and come nearer to 1. This step will be very, very small, and as the iterations continue and different m values get selected, m keeps moving towards this global minimum
point. Now, if I select the learning rate as a larger value like 1.0, what will happen is that instead of taking this smaller step, the point may jump to some other point like this. It may take a longer jump, and it may not reach the global minimum even after many iterations. For that reason we usually select the learning rate as a very small value.

Now suppose I selected a random m value and got the point somewhere here. If I find the slope, the derivative, at this point, I see that the right-hand side is pointing upwards and the left-hand side is pointing downwards, so this is a positive slope. The derivative will be a positive value, and I multiply it by my learning rate, so the update is m minus some small value. Suppose initially my m value was 2 and I have to reach 1: I subtract a small value each time. So it is very important to understand why the learning rate should be very small, and this convergence theorem is very important for reaching the global minimum point. As soon as m reaches this point, if I find the slope, the slope will be 0, and when the slope is 0, that m value is the slope of the best fit line. Until then I have to follow this convergence theorem. So once I get to this particular point, this
particular location where the slope is zero, my algorithm will consider this m value as the slope of the best fit line. That is the point where I have to stop training, and that is the point that determines the final value of m. This covers the whole explanation, the theoretical concepts along with the maths, of the linear regression algorithm.

The next thing is, if I don't have just one independent feature but multiple independent features, then my gradient descent will look like a 3D diagram or a 4D diagram, based on the number of features, and the slope for each and every feature will try to move towards the global minimum point.

I hope you liked this discussion, guys, the step-by-step process and how we derived it. Please go through this video once again from the start. As for how to implement it after understanding this, I'll be attaching a link that you can see in the top right corner, and there you can see the implementation part of simple linear regression and multiple linear regression. I hope you liked this video. Please do subscribe to the channel if you have not subscribed, and yes, I'll be coming up again with some new videos where I will discuss all these mathematical concepts and derive all the things in front of you. So God bless you all, keep learning, you're doing a great job, and thank you for supporting my channel. Thank you one and all, have a great day.
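The cost calculation walked through in the transcript can be reproduced in a few lines. This is a minimal sketch, not code from the video: the function name `cost`, the list of candidate slopes, and the use of `n` for the number of points (to avoid the clash with the slope m) are assumptions made for illustration. The data are the three points (1, 1), (2, 2), (3, 3) from the example, with the intercept fixed at 0.

```python
def cost(slope, xs, ys, intercept=0.0):
    """Cost from the transcript: (1 / 2n) * sum((y_hat - y)^2), n = number of points."""
    n = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = slope * x + intercept  # prediction on the candidate line
        total += (y_hat - y) ** 2      # squared error for this point
    return total / (2 * n)


# Toy data from the transcript: the real points lie on y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Hypothetical set of slopes to try; the video only works through 1 and 0.5 by hand.
candidate_slopes = [0.0, 0.5, 1.0, 1.5, 2.0]
costs = {m: cost(m, xs, ys) for m in candidate_slopes}

for m, j in costs.items():
    print(f"slope={m:.1f}  cost={j:.3f}")
```

With slope 1 the cost is exactly 0, and with slope 0.5 it is 3.5 / 6, about 0.58, which matches the two hand calculations in the transcript. Plotting the cost against the slope gives the bowl-shaped curve that the transcript draws.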

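The update rule that the transcript calls the convergence theorem, m := m - α × dJ/dm, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the author's implementation: the function names, the learning rate of 0.01, the starting slopes, and the fixed number of steps (used here instead of stopping when the slope of the curve reaches zero) are all choices made for this example. The derivative used is dJ/dm = (1/n) × Σ (m·x - y)·x, which follows from the cost function with the intercept fixed at 0.

```python
def gradient(slope, xs, ys):
    """dJ/dm for J(m) = (1 / 2n) * sum((m * x - y)^2), with the intercept fixed at 0."""
    n = len(xs)
    return sum((slope * x - y) * x for x, y in zip(xs, ys)) / n


def gradient_descent(xs, ys, start_slope, learning_rate, steps):
    """Apply m := m - alpha * dJ/dm repeatedly and return every slope visited."""
    slope = start_slope
    history = [slope]
    for _ in range(steps):
        slope = slope - learning_rate * gradient(slope, xs, ys)
        history.append(slope)
    return history


# Toy data from the transcript: the real points lie on y = x, so the best slope is 1.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Small learning rate (assumed value), starting on either side of the minimum.
from_left = gradient_descent(xs, ys, start_slope=-0.5, learning_rate=0.01, steps=200)
from_right = gradient_descent(xs, ys, start_slope=2.0, learning_rate=0.01, steps=200)

# Large learning rate, as in the transcript's warning about a rate like 1.0.
too_large = gradient_descent(xs, ys, start_slope=2.0, learning_rate=1.0, steps=10)

print(f"start -0.5, alpha 0.01 -> {from_left[-1]:.4f}")
print(f"start  2.0, alpha 0.01 -> {from_right[-1]:.4f}")
print(f"start  2.0, alpha 1.0  -> {too_large[-1]:.1f}")
```

Starting at -0.5 the derivative is negative, so each step increases the slope; starting at 2.0 the derivative is positive, so each step decreases it. Both runs with the small learning rate end close to 1, while the run with a learning rate of 1.0 overshoots on every step and moves further away from the minimum on this data.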
Original Description

Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables: One variable, denoted x, is regarded as the predictor, explanatory, or independent variable. #LinearRegressionMaths You can buy my Book on Finance with ML and DL from the below Link https://www.amazon.in/gp/product/B07Q5W7GB1?pf_rd_p=f2b20090-067d-415f-953d-b8dcecc9109f&pf_rd_r=DT87C838ZN6DYRNTE7QR

