Linear Regression From Scratch - Part 3

Imaad Mohamed Khan · Beginner · 📐 ML Fundamentals · 5y ago

Key Takeaways

Linear Regression is implemented from scratch by setting the gradient of the cost function to zero to obtain the normal equation, then solving that equation with QR decomposition, with a focus on mathematical deductions and linear algebra.

Full Transcript

Hey everyone, my name is Imaad, and welcome to a series of videos where we will be implementing linear regression from scratch. This is the third video of the series. Let's go right into it.

We need to minimize $J$ with respect to $\theta$, where $J(\theta) = \lVert y - X\theta \rVert^2$. This is a continuous, convex, quadratic function, and therefore it has a global minimum. By that I mean a function that looks somewhat like this: it is continuous everywhere, it is convex because it bulges towards the bottom, so it looks like a bowl, and it is quadratic because it has a quadratic part. If you look at the figure here, the global minimum is clearly evident, and that is what we want to find: the point where this equation is minimized.

We find the minimum by finding that one place where the gradient is 0 in all directions. Gradient is a concept in calculus. We will not get into what a gradient is or what differentiation is, but if you want a refresher you can search for gradients in calculus. We will see some aspects of differentiation here, but we are not going to go into much detail.

Now let's get started. How does one take the gradient of the squared norm of a vector? We know $J(\theta)$ is given by $\lVert y - X\theta \rVert^2$. We can write this as the product of $y - X\theta$ with itself, and in linear algebra this is how I will write it. So, if you look at the first equation, $\nabla J = \nabla\left[(y - X\theta)^\top (y - X\theta)\right]$. Going further, we multiply these two expressions out term by term, so $\nabla J = \nabla\left[y^\top y - (X\theta)^\top y - y^\top X\theta + \theta^\top X^\top X\theta\right]$.

Now, $y^\top y$ is a constant with respect to $\theta$, so this term disappears. The next two terms are equal, because the transpose of a scalar is the same scalar. The final term has a quadratic form, and the general rule is that the gradient of $x^\top A x$ is $A^\top x + Ax$. But because the product of a matrix's transpose with the matrix itself is always symmetric, the gradient becomes $2Ax$. These results will help us simplify the above equation further.

If you're getting confused, what I'm essentially saying is this. $y^\top y$ is a constant with respect to $\theta$, so this term does not matter: the derivative of a constant is zero, and taking the gradient is essentially differentiating. The next two terms are equal, because the transpose of a scalar is a scalar. And the last term has a quadratic form that follows the general rule for $x^\top A x$; here $A$ is $X^\top X$ and $x$ is $\theta$, so its gradient is $2Ax$, which is $2X^\top X\theta$, as we will see next.

So, like we discussed, we can now write the gradient as $\nabla J = -2X^\top y + 2X^\top X\theta$. This comes from just simplifying the previous equation. At the minimum, as we've discussed earlier, the gradient in all directions is zero. Therefore, at the minimum, we have $0 = -2X^\top y + 2X^\top X\theta$. Doing some reshuffling, $2X^\top X\theta = 2X^\top y$, so $X^\top X\theta = X^\top y$, and this is the so-called normal equation.

This is not something we wrote down at the start, but it is the equation that will allow us to find $\theta$. It is a very important result to have reached from where we started, because apart from $\theta$, the unknown we want to find, this equation is only in terms of $X$ and $y$. We've almost reached the end: we have a way to find $\theta$ now.

But how are we actually going to solve this? The simple way is that, since we already know $X$ and $y$, we can just plug them in and solve for $\theta$. But there's a faster and more numerically stable way using linear algebra techniques, and this is what we will discuss in the next few slides. It is called QR decomposition.

In linear algebra, a matrix $A$ can be decomposed, or factorized, as $A = QR$, where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix. So if you have a matrix $A$, you can represent it as a product of two matrices. This is very similar to the factorization we learn in middle school or high school: we are trying to find the factors of this matrix $A$. $Q$ being an orthogonal matrix means that its inverse is its transpose, so if you multiply $Q$ by its transpose you get the identity matrix. $R$ is an upper triangular matrix; we will see what that is as we go.

Why do we use this technique? Because it's easier for computation. If you have an equation like $Ax = b$ where you would like to find $x$, you can first compute $A = QR$ and then write the problem as $Rx = Q^{-1}b$, which is very easy to solve: $Q$ is orthogonal, therefore $Q^{-1} = Q^\top$, and $R$ is an upper triangular matrix, which lets us solve the equation using a technique called back substitution.

Consider the following matrix. Like I said, we will now look at what an upper triangular matrix is: the matrix on the left, with rows $(4, 2, 6)$, $(0, 2, 2)$ and $(0, 0, 8)$, is an upper triangular matrix, where all the elements below the diagonal are 0. What you see here is $Rx = Q^{-1}b$, and using this you are trying to find $x$. How do we do this? On multiplying the matrices we get $4x_1 + 2x_2 + 6x_3 = 4$, $2x_2 + 2x_3 = 4$, and $8x_3 = 16$. We solve the third equation first and get $x_3 = 2$. Very simple, and this is why it is advantageous to have $R$ as an upper triangular matrix. After we find $x_3$, we substitute its value into the second equation and get $2x_2 = 0$, so $x_2 = 0$. Similarly, we substitute $x_2$ and $x_3$ into the first equation to get $x_1 = -2$. Now we have $x_1$, $x_2$ and $x_3$.

Now we're coming to the linear regression part of what we wanted to solve, almost at the end. Recall that we wanted to solve the equation $X^\top X\theta = X^\top y$, and we discussed earlier that we can decompose a matrix as $A = QR$ using QR decomposition. Let's now say that $QR$ is the QR decomposition of $X$. In that case we get $(QR)^\top (QR)\theta = (QR)^\top y$. Essentially, what we are doing is replacing $X$ with $QR$, because we used the value of $X$ to find $Q$ and $R$. So $R^\top Q^\top QR\theta = R^\top Q^\top y$. Since $Q$ is orthogonal, $Q^\top Q = I$, the identity matrix. Therefore we get $R^\top R\theta = R^\top Q^\top y$ and, cancelling $R^\top$ on both sides (valid when $R$ is invertible), $R\theta = Q^\top y$. We already have the values of $R$, $Q$ and $y$, so we plug them into the above equation to find $\theta$.

Yes, that's it, we found it. And with this we come to an end. That's it, folks: we've successfully managed to find the coefficients $\theta$ given a certain $(X, y)$, using statistical deductions and linear algebra.
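
The back-substitution example from the transcript can be reproduced in a few lines. This is a minimal sketch, not code from the video: the function name `back_substitute` and the variable names are assumptions made for illustration, and the function assumes a square upper triangular matrix with a nonzero diagonal.

```python
import numpy as np

def back_substitute(R, c):
    """Solve R x = c, assuming R is square, upper triangular, with nonzero diagonal."""
    n = R.shape[0]
    x = np.zeros(n)
    # The last row has a single unknown, so start there and work upwards.
    for i in range(n - 1, -1, -1):
        # Contribution of the unknowns that have already been solved.
        known = R[i, i + 1:] @ x[i + 1:]
        x[i] = (c[i] - known) / R[i, i]
    return x

# The 3x3 system from the transcript:
#   4*x1 + 2*x2 + 6*x3 = 4
#          2*x2 + 2*x3 = 4
#                 8*x3 = 16
R = np.array([[4.0, 2.0, 6.0],
              [0.0, 2.0, 2.0],
              [0.0, 0.0, 8.0]])
c = np.array([4.0, 4.0, 16.0])

x = back_substitute(R, c)
print(x)  # x1 = -2, x2 = 0, x3 = 2, matching the values worked out in the transcript
```

Each row is solved with one dot product and one division, which is why a triangular system is so much cheaper to solve than a general one.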

Original Description

In this video, we will go further into one of the most common techniques used to solve regression problems in Machine Learning - Linear Regression. This is a continuation from the previous video. Please do watch the first two parts before you watch this. Links to the previous videos: Part I - https://www.youtube.com/watch?v=qZufGTepWqE Part II - https://www.youtube.com/watch?v=wO_y-nTIBHU This video will serve as an easy reference and an introduction to understanding the mathematical background for the Linear Regression technique. Please do give it a thumbs up if you found the video useful and stay tuned for the next part!

This video teaches how to implement Linear Regression from scratch by deriving the normal equation from the gradient of the cost function and solving it with QR decomposition, covering the mathematical deductions and linear algebra involved. It's a continuation of the previous videos in the series. By watching this video, viewers will gain a deeper understanding of how the Linear Regression coefficients are computed.
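
The gradient stated in the transcript, minus two X transpose y plus two X transpose X theta, can be sanity-checked numerically. The sketch below is an illustration under assumptions, not code from the video: the data is random and hypothetical, and the helper names `cost` and `gradient` are made up for this example. It compares the formula against central finite differences and confirms that the gradient vanishes at the solution of the normal equation.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = ||y - X theta||^2."""
    residual = y - X @ theta
    return residual @ residual

def gradient(theta, X, y):
    """Analytic gradient from the transcript: -2 X^T y + 2 X^T X theta."""
    return -2.0 * X.T @ y + 2.0 * X.T @ X @ theta

# Hypothetical random data, used only to exercise the formulas.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
theta = rng.normal(size=3)

# Central finite differences, one coordinate at a time.
eps = 1e-6
numeric = np.zeros(3)
for k in range(3):
    step = np.zeros(3)
    step[k] = eps
    numeric[k] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)

analytic = gradient(theta, X, y)
print(np.max(np.abs(numeric - analytic)))  # should be tiny

# At the solution of the normal equation X^T X theta = X^T y the gradient is zero.
theta_star = np.linalg.solve(X.T @ X, X.T @ y)
print(gradient(theta_star, X, y))  # entries close to zero
```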

Key Takeaways
  1. Find the minimum of the function by finding the point where the gradient is 0 in all directions
  2. Decompose matrix X into Q and R using QR decomposition
  3. Replace X with QR in the normal equation X transpose X theta = X transpose Y
  4. Solve R theta = Q transpose Y for theta using back substitution
  5. Find coefficients theta using statistical deductions and linear algebra
💡 Using QR decomposition can provide a more efficient and stable way to solve linear regression equations, especially for large datasets.
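
The takeaways above can be put together as one small routine. This is a sketch under stated assumptions rather than the video's own code: it uses NumPy's `np.linalg.qr` for the decomposition instead of computing Q and R by hand, the function name `fit_linear_regression_qr` is hypothetical, the data is synthetic, and X is assumed to have full column rank so that R is invertible.

```python
import numpy as np

def fit_linear_regression_qr(X, y):
    """Find theta by solving R theta = Q^T y, where X = QR."""
    # Reduced QR: Q is (m, n) with orthonormal columns, R is (n, n) upper triangular.
    Q, R = np.linalg.qr(X)
    rhs = Q.T @ y
    n = R.shape[0]
    theta = np.zeros(n)
    # Back substitution, starting from the last row of R.
    for i in range(n - 1, -1, -1):
        theta[i] = (rhs[i] - R[i, i + 1:] @ theta[i + 1:]) / R[i, i]
    return theta

# Synthetic data (an assumption for illustration): intercept 1.5, slope -3.0, small noise.
rng = np.random.default_rng(0)
m = 50
feature = rng.uniform(-1.0, 1.0, size=m)
X = np.column_stack([np.ones(m), feature])  # column of ones for the intercept
y = X @ np.array([1.5, -3.0]) + 0.1 * rng.normal(size=m)

theta_qr = fit_linear_regression_qr(X, y)

# Cross-checks: the normal equation solved directly, and NumPy's least-squares solver.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)
theta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print(theta_qr)
print(theta_normal)
print(theta_lstsq)
```

On well-conditioned data like this, all three estimates agree to floating-point precision. The QR route avoids forming X transpose X, whose condition number is the square of that of X, which is the reason it is usually described as more numerically stable.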
