Linear Regression From Scratch - Part 3
Key Takeaways
Linear Regression is implemented from scratch by setting the gradient of the squared-error loss to zero to obtain the normal equation, which is then solved with QR decomposition, with a focus on mathematical deductions and linear algebra.
Full Transcript
Hey everyone, my name is Imaad, and welcome to a series of videos where we will be implementing linear regression from scratch. This is the third video in the series, so let's go right into it.

We need to minimize $J$ with respect to $\theta$, where $J(\theta) = \|y - X\theta\|^2$. This is a continuous, convex, quadratic function, and therefore it has a global minimum. By a continuous, convex, quadratic function I mean a function that looks somewhat like a bowl: it is continuous everywhere, it is convex because it curves upward like a bowl, and it is quadratic because it contains a squared term. If you look at the figure here, the global minimum is clearly evident, and that is the point we want to find: the point where this function is minimized.

We find the minimum by finding the one place where the gradient is zero in all directions. The gradient is a concept from calculus. We will not get into what gradients or differentiation are, but if you want a refresher, you can search for gradients in calculus. We will use some differentiation here without going into much detail.

Now let's get started: how does one take the gradient of the squared norm of a vector? We have $J(\theta) = \|y - X\theta\|^2$. I can write this as the product of $y - X\theta$ with itself, and in linear algebra this is how I would write it:

$$\nabla J = \nabla \left[ (y - X\theta)^T (y - X\theta) \right]$$

Going further, we multiply these two expressions out:

$$\nabla J = \nabla \left[ y^T y - \theta^T X^T y - y^T X\theta + \theta^T X^T X\theta \right]$$

Now, $y^T y$ is a constant with respect to $\theta$, and the derivative of a constant is zero, so this term disappears. The next two terms are equal, because each of them is a scalar and the transpose of a scalar is the same scalar: $\theta^T X^T y = (y^T X\theta)^T = y^T X\theta$. Together they contribute $-2y^T X\theta$, whose gradient is $-2X^T y$. The final term is a quadratic form, and the general rule is $\nabla_x (x^T A x) = (A^T + A)x$. Here $A = X^T X$ and $x = \theta$. Because the product of a matrix's transpose with the matrix itself, $X^T X$, is always symmetric, this simplifies to $2X^T X\theta$.

Putting these results together, we can write the gradient as:

$$\nabla J = -2X^T y + 2X^T X\theta$$

As we discussed earlier, at the minimum the gradient in all directions is zero. Since that is the point we want to find, we set:

$$0 = -2X^T y + 2X^T X\theta$$

Rearranging gives $2X^T X\theta = 2X^T y$, that is:

$$X^T X\theta = X^T y$$

This is the so-called normal equation. It is not something we set out to derive initially, but it is what allows us to find $\theta$. It is a very important result, because it is expressed only in terms of the known $X$ and $y$ and the unknown $\theta$. We have almost reached the end: we now have a way to find $\theta$.

But how do we actually solve it? The simplest way is to plug in $X$ and $y$ and solve for $\theta$ directly. There is, however, a more numerically stable way using linear algebra techniques, one that avoids forming and inverting $X^T X$, and that is what we will discuss in the next few slides. It is called QR decomposition.

In linear algebra, a matrix $A$ can be decomposed, or factorized, as $A = QR$, where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix. So a matrix $A$ can be represented as the product of two matrices. This is very similar to the factorization we learned in middle school or high school: we are finding the factors of the matrix $A$. $Q$ being orthogonal means that its columns are orthonormal, so multiplying $Q$ by its transpose gives the identity matrix, $Q^T Q = I$; for a square $Q$, this means $Q^{-1} = Q^T$. $R$ is upper triangular, which we will look at shortly.

Why do we use this technique? Because it makes computation easier. If you have an equation $Ax = b$ where you would like to find $x$, you can first compute $A = QR$ and then rewrite the problem as $Rx = Q^{-1}b$. This is easy to solve: $Q$ is orthogonal, so $Q^{-1} = Q^T$, and $R$ is upper triangular, so the equation can be solved with a technique called back substitution.

Consider the following matrix. The matrix on the left, with rows (4, 2, 6), (0, 2, 2), and (0, 0, 8), is an upper triangular matrix: all the elements below the diagonal are 0. Here we have $Rx = Q^T b$, and we are trying to find $x$. Multiplying out the matrices gives:

$$4x_1 + 2x_2 + 6x_3 = 4$$
$$2x_2 + 2x_3 = 4$$
$$8x_3 = 16$$

We solve the third equation first and get $x_3 = 2$. That is very simple, and it is why it is advantageous to have $R$ as an upper triangular matrix. We then substitute $x_3$ into the second equation, $2x_2 + 4 = 4$, and get $x_2 = 0$. Similarly, we substitute $x_2$ and $x_3$ into the first equation, $4x_1 + 0 + 12 = 4$, and get $x_1 = -2$. Now we have $x_1$, $x_2$, and $x_3$.

Now we come to the linear regression part of what we wanted to solve, and we are almost at the end. Recall that we wanted to solve $X^T X\theta = X^T y$, and that we can decompose a matrix as $A = QR$ using QR decomposition. In this case, let $X = QR$ be the QR decomposition of $X$. (For a tall data matrix this is the reduced QR decomposition: $Q$ has orthonormal columns and $R$ is a square upper triangular matrix.) Essentially, we are replacing $X$ with $QR$, because we used $X$ to compute $Q$ and $R$:

$$(QR)^T (QR)\theta = (QR)^T y$$
$$R^T Q^T Q R\theta = R^T Q^T y$$

Since $Q$ has orthonormal columns, $Q^T Q = I$, the identity matrix, and therefore we get:

$$R^T R\theta = R^T Q^T y$$

When $X$ has full column rank, $R^T$ is invertible, so we can cancel it:

$$R\theta = Q^T y$$

We already have the values of $R$, $Q$, and $y$, so we plug them into this equation and solve for $\theta$ by back substitution. That's it, we've found it, and with this we come to the end. That's it, folks: we've successfully managed to find the coefficients $\theta$ given a certain $(X, y)$, using statistical deductions and linear algebra.
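A minimal NumPy sketch (not from the video) that checks the gradient derived in the transcript, $\nabla J = -2X^T y + 2X^T X\theta$, against central finite differences, and then confirms that solving the normal equation $X^T X\theta = X^T y$ makes that gradient vanish. The helper names `loss`, `analytic_grad`, and `numerical_grad` are illustrative, and the data is random synthetic data rather than anything used in the series.

```python
import numpy as np

def loss(theta, X, y):
    """J(theta) = ||y - X theta||^2."""
    r = y - X @ theta
    return r @ r

def analytic_grad(theta, X, y):
    """Gradient from the derivation: -2 X^T y + 2 X^T X theta."""
    return -2 * X.T @ y + 2 * X.T @ X @ theta

def numerical_grad(theta, X, y, eps=1e-6):
    """Central finite differences, perturbing one coordinate of theta at a time."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (loss(theta + step, X, y) - loss(theta - step, X, y)) / (2 * eps)
    return g

# Synthetic data: 50 samples, 3 features (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
theta = rng.normal(size=3)

# The analytic and numerical gradients should agree at an arbitrary theta.
grad_matches = np.allclose(analytic_grad(theta, X, y), numerical_grad(theta, X, y), rtol=1e-5, atol=1e-4)
print(grad_matches)

# Setting the gradient to zero gives the normal equation X^T X theta = X^T y.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
grad_at_min_is_zero = np.allclose(analytic_grad(theta_hat, X, y), 0, atol=1e-8)
print(grad_at_min_is_zero)
```

Because $J$ is quadratic, central differences have no truncation error here, so any mismatch beyond rounding would point to a mistake in the derived gradient.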
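The back substitution and QR steps described in the transcript can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the author's code: `back_substitution` and `fit_qr` are hypothetical helper names, the regression data is synthetic, and the sketch relies on `np.linalg.qr`, whose default `mode='reduced'` returns an n×p `Q` with orthonormal columns and a p×p upper-triangular `R`. It assumes `X` has full column rank, so `R` has no zeros on its diagonal.

```python
import numpy as np

def back_substitution(R, b):
    """Solve R x = b for a nonsingular upper-triangular R, starting from the last row."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract contributions of the already-solved unknowns x[i+1:], then divide by the pivot.
        x[i] = (b[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

# Worked example from the video: rows (4, 2, 6), (0, 2, 2), (0, 0, 8); right-hand side (4, 4, 16).
R_example = np.array([[4.0, 2.0, 6.0],
                      [0.0, 2.0, 2.0],
                      [0.0, 0.0, 8.0]])
b_example = np.array([4.0, 4.0, 16.0])
x_example = back_substitution(R_example, b_example)
print(x_example)  # x1 = -2, x2 = 0, x3 = 2

def fit_qr(X, y):
    """Least-squares theta via reduced QR: X = QR, then solve R theta = Q^T y."""
    Q, R = np.linalg.qr(X)  # default mode='reduced': Q is n x p, R is p x p
    return back_substitution(R, Q.T @ y)

# Synthetic regression data: an intercept column plus two features (illustrative only).
rng = np.random.default_rng(42)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, 2))])
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + 0.01 * rng.normal(size=n)

theta_qr = fit_qr(X, y)
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)  # direct normal-equation solve, for comparison
print(theta_qr)
print(np.allclose(theta_qr, theta_normal))
```

On well-conditioned data like this, both routes give the same coefficients. The QR route is preferred in practice because it never forms $X^T X$, whose condition number is the square of that of $X$.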
Original Description
In this video, we will go further into one of the most common techniques used to solve regression problems in Machine Learning: Linear Regression. This is a continuation of the previous video. Please do watch the first two parts before you watch this one. Links to the previous videos:
Part I - https://www.youtube.com/watch?v=qZufGTepWqE
Part II - https://www.youtube.com/watch?v=wO_y-nTIBHU
This video will serve as an easy reference and an introduction to understanding the mathematical background for the Linear Regression technique. Please do give it a thumbs up if you found the video useful and stay tuned for the next part!
Uploaded by Imaad Mohamed Khan (10 of 34 in the channel's uploads)