Linear Regression From Scratch - Part 3

Imaad Mohamed Khan · Beginner ·📐 ML Fundamentals ·5y ago

Skills: ML Maths Basics 90% · Supervised Learning 80%

Key Takeaways

Linear Regression is implemented from scratch by setting the gradient of the squared-error cost to zero (the normal equation) and solving the result with QR decomposition, with a focus on mathematical deductions and linear algebra.

Full Transcript

Hey everyone, my name is Imaad, and welcome to a series of videos where we will be implementing linear regression from scratch. This is the third video in the series. Let's go right into it.

We need to minimize J with respect to theta in this equation: J(theta) = ||y - X theta||^2. This is a continuous, convex, and quadratic function, and therefore it will have a global minimum. When I say continuous, convex, and quadratic, I mean a function that looks somewhat like this: it is continuous everywhere, it is convex because it bulges downward and looks like a bowl, and it is quadratic because it has a quadratic part. If you look at the figure here, the global minimum is clearly evident, and that is what we want to find: the point where this equation is minimized.

We find the minimum by finding that one place where the gradient is 0 in all directions. Gradient is a concept in calculus. We will not get into what a gradient is or what differentiation is, but if you want a refresher you can search for gradients in calculus. We will see some aspects of differentiation here, but we are not going to go into much detail.

Now let's get started. How does one take the gradient of the squared norm of a vector? J(theta) is given by ||y - X theta||^2. I can write this as the product of y - X theta with itself, and in linear algebra this is how I will write it:

grad J = grad [ (y - X theta)^T (y - X theta) ]

Going further, we multiply these two expressions out term by term:

grad J = grad [ y^T y - (X theta)^T y - y^T X theta + theta^T X^T X theta ]

Now, y^T y is a constant with respect to theta, so this term disappears: the gradient is essentially differentiation, and the derivative of a constant is zero. The next two terms are equal, because each of them is a scalar and the transpose of a scalar is the same scalar. The final term has a quadratic form. The general rule is that the gradient of x^T A x with respect to x is A^T x + A x, and when A is symmetric this becomes 2 A x. Here x is theta and A is X^T X. The product of a matrix's transpose with the matrix itself is always symmetric, so the gradient of the last term is 2 X^T X theta. These results help us simplify the equation above.

So, as we discussed, we can now write the gradient as

grad J = -2 X^T y + 2 X^T X theta

This comes from simplifying the previous equation. At the minimum, as we discussed earlier, the gradient in all directions is zero. Therefore, at the minimum we have

0 = -2 X^T y + 2 X^T X theta

Doing some reshuffling here:

2 X^T X theta = 2 X^T y
X^T X theta = X^T y

This is the so-called normal equation. It is not something we set out with initially, but it is the equation that will allow us to find theta. This is a very important result, because the only quantities in it are X, y, and theta: we already know X and y, and theta is what we want to find.

We have almost reached the end. We have a way to find theta now, but how are we actually going to solve this? Since we already know X and y, we could just plug them in and solve for theta. But there is a faster and more numerically stable way using linear algebra techniques, and this is what we will discuss in the next few slides. It is called QR decomposition.

In linear algebra, a matrix A can be decomposed, or factorized, as A = QR, where Q is an orthogonal matrix and R is an upper triangular matrix. So if you have a matrix A, you can represent it as a product of two matrices. This is very similar to the factorization we learn in middle school or high school: we are trying to find the factors of the matrix A. Q being an orthogonal matrix means that its transpose is its inverse, so if you multiply Q by its transpose you get the identity matrix. R is an upper triangular matrix; we will see what an upper triangular matrix is as we go.

Why do we use this technique? Because it makes the computation easier. If you have an equation like A x = b where you would like to find x, you can compute A = QR first and then write the problem as R x = Q^-1 b. This is very easy to solve: Q is orthogonal, so Q^-1 = Q^T, and R is an upper triangular matrix, which lets us solve the equation using a technique called back substitution.

Consider the following matrix. The matrix on the left, with rows (4, 2, 6), (0, 2, 2), and (0, 0, 8), is an upper triangular matrix: all the elements below the diagonal are 0. What is happening here is that this matrix R times x is equal to Q^-1 b, and using this you are trying to find x. How do we do this? On multiplying the matrices we get

4 x1 + 2 x2 + 6 x3 = 4
2 x2 + 2 x3 = 4
8 x3 = 16

We solve the third equation first and get x3 = 2. Very simple, and this is why it is advantageous to have R as an upper triangular matrix. After we find x3, we substitute its value into the second equation and get 2 x2 = 0, so x2 = 0. Similarly, we substitute x2 and x3 into the first equation to get x1 = -2. Now we have x1, x2, and x3.

Now we come to the linear regression part of what we wanted to solve, and we are almost at the end. Recall that we wanted to solve the equation X^T X theta = X^T y, and we discussed earlier that we can decompose a matrix as A = QR using QR decomposition. Let's now say that QR is the QR decomposition of X. Replacing X with QR, we get

(QR)^T (QR) theta = (QR)^T y

We can do this because we used the value of X to find Q and R. Expanding the transposes:

R^T Q^T Q R theta = R^T Q^T y

Since Q is orthogonal, Q^T Q = I, the identity matrix. Therefore we get

R^T R theta = R^T Q^T y

and, cancelling R^T from both sides (which is valid when R is invertible),

R theta = Q^T y

We already have the values of R, Q, and y, so we plug them into the above equation to find theta. Yes, that's it, we found it.

With this we come to an end. That's it, folks: we have successfully managed to find the coefficients theta, given a certain X and y, using statistical deductions and linear algebra.
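The two computations described in the transcript, back substitution on an upper triangular system and solving R theta = Q^T y, can be sketched in Python with NumPy. This is only an illustrative sketch, not code from the video: the function names `back_substitution` and `fit_linear_regression_qr` and the synthetic data are assumptions made for the example. Note that `np.linalg.qr` returns a reduced factorization by default, so Q is rectangular, but Q^T Q = I still holds and the derivation carries over.

```python
import numpy as np


def back_substitution(R, b):
    """Solve R x = b for an upper triangular R, starting from the last row."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Remove the contribution of the unknowns that are already solved,
        # then divide by the diagonal entry.
        x[i] = (b[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x


# Worked example from the transcript: rows (4, 2, 6), (0, 2, 2), (0, 0, 8).
R_example = np.array([[4.0, 2.0, 6.0],
                      [0.0, 2.0, 2.0],
                      [0.0, 0.0, 8.0]])
b_example = np.array([4.0, 4.0, 16.0])
x_example = back_substitution(R_example, b_example)


def fit_linear_regression_qr(X, y):
    """Find theta from R theta = Q^T y, where X = QR (reduced QR)."""
    Q, R = np.linalg.qr(X)
    return back_substitution(R, Q.T @ y)


# Synthetic data (an assumption for illustration): intercept plus two features.
rng = np.random.default_rng(0)
n_samples = 50
features = rng.normal(size=(n_samples, 2))
X = np.column_stack([np.ones(n_samples), features])
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + 0.01 * rng.normal(size=n_samples)

theta_qr = fit_linear_regression_qr(X, y)

# For comparison, solve the normal equation X^T X theta = X^T y directly.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

print("back substitution example:", x_example)
print("theta via QR:             ", theta_qr)
print("theta via normal equation:", theta_normal)
```

On well-conditioned data like this, both routes should agree to within floating-point error; the QR route simply never forms X^T X.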

Original Description

In this video, we will go further into one of the most common techniques used to solve regression problems in Machine Learning: Linear Regression. This is a continuation of the previous video. Please do watch the first two parts before you watch this. Links to the previous videos: Part I - https://www.youtube.com/watch?v=qZufGTepWqE Part II - https://www.youtube.com/watch?v=wO_y-nTIBHU This video will serve as an easy reference and an introduction to understanding the mathematical background of the Linear Regression technique. Please do give it a thumbs up if you found the video useful and stay tuned for the next part!

This video teaches how to implement Linear Regression from scratch using the normal equation and QR decomposition, covering the mathematical deductions and linear algebra involved. It is a continuation of the previous videos and works through the derivation step by step. By watching this video, viewers will gain a deeper understanding of how the Linear Regression coefficients are computed.

Key Takeaways

Find the minimum of the function by finding the point where the gradient is 0 in all directions
Decompose matrix X into Q and R using QR decomposition
Replace X with QR in the normal equation X^T X theta = X^T y
Simplify to R theta = Q^T y and solve for theta by back substitution
Find coefficients theta using statistical deductions and linear algebra
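The steps above can be written out as one chain of equations. This is a compact restatement of the derivation in the transcript, under the assumption that X has full column rank so that R is invertible:

```latex
\begin{aligned}
J(\theta) &= \lVert y - X\theta \rVert^2 = (y - X\theta)^\top (y - X\theta) \\
          &= y^\top y - 2\,\theta^\top X^\top y + \theta^\top X^\top X\,\theta \\
\nabla_\theta J &= -2\,X^\top y + 2\,X^\top X\,\theta = 0
  \;\Longrightarrow\; X^\top X\,\theta = X^\top y \\
X = QR:\quad & R^\top Q^\top Q R\,\theta = R^\top Q^\top y
  \;\Longrightarrow\; R^\top R\,\theta = R^\top Q^\top y
  \;\Longrightarrow\; R\,\theta = Q^\top y
\end{aligned}
```

The second line merges the two equal cross terms into a single term, which is where the factor of 2 in the gradient comes from.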

💡 Using QR decomposition avoids forming X^T X explicitly, which gives a more numerically stable way to solve linear regression equations, especially when the columns of X are nearly collinear.
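One way to see where the stability benefit comes from is to compare condition numbers: in the 2-norm, the condition number of X^T X is the square of the condition number of X, so forming the normal equation amplifies any ill-conditioning. The sketch below is illustrative only; the nearly collinear synthetic columns are an assumption chosen to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(size=100)

# Two nearly identical feature columns make X close to rank deficient.
X = np.column_stack([
    np.ones(100),
    base,
    base + 1e-3 * rng.normal(size=100),
])

cond_X = np.linalg.cond(X)          # 2-norm condition number of X
cond_XtX = np.linalg.cond(X.T @ X)  # 2-norm condition number of X^T X

print("cond(X)     =", cond_X)
print("cond(X^T X) =", cond_XtX)
print("cond(X)^2   =", cond_X ** 2)
```

Solving R theta = Q^T y works with a matrix whose conditioning matches that of X, whereas solving X^T X theta = X^T y works with the squared value.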
