Linear Regression, Clearly Explained!!!

StatQuest with Josh Starmer · Beginner · Research Papers Explained · 8y ago

Skills: Research Methods 90%, Reading ML Papers 80%, ML Maths Basics 70%

Key Takeaways

The video explains linear regression in three steps: use least squares to fit a line to the data, calculate R-squared to measure how much of the variation the fit explains, and calculate a p-value for R-squared using the F-distribution and degrees of freedom.

Full Transcript

Sailing on a boat headed towards StatQuest. Join me on this boat, let's go to StatQuest. It's super cool!

Hello and welcome to StatQuest. StatQuest is brought to you by the friendly folks in the genetics department at the University of North Carolina at Chapel Hill. Today we're going to be talking about linear regression, a.k.a. general linear models, part one. There's a lot of parts to linear models, but it's a really cool and powerful concept, so let's get right down to it.

I promise you I have lots and lots of slides that talk about all the nitty-gritty details behind linear regression, but first let's talk about the main ideas behind it. The first thing you do in linear regression is use least squares to fit a line to the data. The second thing you do is calculate R-squared. Lastly, calculate a p-value for R-squared. There are lots of other little things that come up along the way, but these are the three most important concepts behind linear regression.

In the StatQuest "Fitting a Line to Data" we talked about fitting a line to data (duh), but let's do a quick review. I'm going to introduce some new terminology in this part of the video, so it's worth watching even if you've already seen the earlier StatQuest. That said, if you need more details, check that StatQuest out.

For this review, we're going to be talking about a data set where we took a bunch of mice and we measured their size and we measured their weight. Our goal is to use mouse weight as a way to predict mouse size. First, draw a line through the data. Second, measure the distance from the line to the data, square each distance, and then add them up. Terminology alert: the distance from the line to the data point is called a residual. Third, rotate the line a little bit. With the new line, measure the residuals, square them, and then sum up the squares. Now rotate the line a little bit more and sum up the squared residuals, etc., etc., etc. We rotate and then sum up the squared residuals, rotate then sum up the squared residuals. Just keep doing that.
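The rotate-and-measure loop described above can be sketched in a few lines of Python. This is only an illustration, not the method used in the video: the mouse data are made up, the candidate slopes are an arbitrary grid, and the line is pivoted through the point of means (an assumption that is safe because the least squares line always passes through that point). The closed-form slope is computed at the end for comparison.

```python
# Hypothetical mouse data (weight, size); these are NOT the values from the video.
weights = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
sizes = [1.2, 1.9, 2.1, 3.6, 3.9, 5.2, 5.1, 6.8, 7.1]


def sum_squared_residuals(intercept, slope, xs, ys):
    """Square each residual (data minus line) and add them up."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def fit_by_rotation(xs, ys, candidate_slopes):
    """Try each slope ("rotation") and keep the one with the least sum of squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    best = None
    for slope in candidate_slopes:
        # Pivot the line around the point of means.
        intercept = mean_y - slope * mean_x
        ssr = sum_squared_residuals(intercept, slope, xs, ys)
        if best is None or ssr < best[2]:
            best = (intercept, slope, ssr)
    return best


# Rotations from slope -2.00 to 2.00 in steps of 0.01 (an arbitrary grid).
candidate_slopes = [i / 100 for i in range(-200, 201)]
intercept, slope, ssr = fit_by_rotation(weights, sizes, candidate_slopes)

# Closed-form least squares slope, for comparison with the grid search.
mean_w = sum(weights) / len(weights)
mean_s = sum(sizes) / len(sizes)
closed_form_slope = (
    sum((w - mean_w) * (s - mean_s) for w, s in zip(weights, sizes))
    / sum((w - mean_w) ** 2 for w in weights)
)

print(f"grid search: size = {intercept:.3f} + {slope:.2f} * weight (SSR = {ssr:.4f})")
print(f"closed-form slope: {closed_form_slope:.4f}")
```

In practice nobody searches a grid of rotations; the closed-form solution gives the same answer directly, and the grid search is shown only because it mirrors the explanation.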
After a bunch of rotations, you can plot the sum of squared residuals and corresponding rotation. So in this graph we have the sum of squared residuals on the y-axis and the different rotations on the x-axis. Lastly, you find the rotation that has the least sum of squares. More details about how this is actually done in practice are provided in the StatQuest on fitting a line to data. So we see that this rotation is the one with the least squares, so it will be the one to fit to the data. This is our least squares rotation superimposed on the original data. Bam! Now we know why the method for fitting a line is called least squares.

Now we've fit a line to the data. This is awesome. Here's the equation for the line. Least squares estimated two parameters: a y-axis intercept and a slope. Since the slope is not zero, it means that knowing a mouse's weight will help us make a guess about that mouse's size. How good is that guess? Calculating R-squared is the first step in determining how good that guess will be.

The StatQuest "R-squared explained" talks about, you got it, R-squared. Let's do a quick review. I'm also going to introduce some additional terminology, so it's worth watching this part of the video even if you've seen the original StatQuest on R-squared. First, calculate the average mouse size. Okay, I've just shifted all the data points to the y-axis to emphasize that at this point we are only interested in mouse size. Here I've drawn a black line to show the average mouse size. Bam!

Now sum the squared residuals. Just like in least squares, we measure the distance from the mean to the data point and square it, and then add those squares together. Terminology alert: we'll call this SS(mean), for sum of squares around the mean. Note, the sum of squares around the mean equals (data - mean) squared. The variation around the mean equals (data - mean) squared divided by n. n is the sample size; in this case n equals 9. The shorthand notation is: the variation around the mean equals the sum of squares around the
mean divided by n, the sample size. Another way to think about variance is as the average sum of squares per mouse.

Now go back to the original plot and sum up the squared residuals around our least squares fit. We'll call this SS(fit), for the sum of squares around the least squares fit. The sum of squares around the least squares fit is the sum of the squared distances between the data and the line. Just like with the mean, the variance around the fit is the sum of the squared distances between the line and the data, divided by n, the sample size. The shorthand is: the variation around the fitted line equals the sum of squares around the fitted line divided by n, the sample size. Again, we can think of the variation around the fit as the average of the sum of squares around the fit for each mouse.

In general, the variance of something equals the sum of squares divided by the number of those things. In other words, it's an average of sum of squares. I mention this because it's going to come in handy in a little bit, so keep it in the back of your mind.

Okay, let's step back a little bit. This is the raw variation in mouse size, and this is the variation around the least squares line. There is less variation around the line that we fit by least squares; that is to say, the residuals are smaller. As a result, we say that some of the variation in mouse size is explained by taking mouse weight into account. In other words, heavier mice are bigger, lighter mice are smaller. R-squared tells us how much of the variation in mouse size can be explained by taking mouse weight into account.

This is the formula for R-squared: it's the variation around the mean minus the variation around the fit, divided by the variation around the mean. Let's look at an example. In this example, the variation around the mean equals 11.1 and the variation around the fit equals 4.4, so we plug those numbers into the equation. The result is that R-squared equals 0.6, which is the same thing as saying 60%. This means there is a 60% reduction in the variance when we
take the mouse weight into account. Alternatively, we can say that mouse weight explains 60% of the variation in mouse size.

We can also use the sum of squares to make the same calculation. This is because when we're talking about variation, everything's divided by n, the sample size. Since everything's scaled by n, we can pull that term out and just use the raw sum of squares. In this case, the sum of squares around the mean equals 100 and the sum of squares around the fit equals 40. Plugging those numbers into the equation gives us the same value we had before: R-squared = 0.6, which equals 60%. 60% of the sums of squares of the mouse size can be explained by mouse weight.

Here's another example. We're also going to go back to using variation in the calculation since that's more common. In this case, knowing mouse weight means you can make a perfect prediction of mouse size. The variation around the mean is the same as it was before, 11.1, but now the variation around the fitted line equals zero because there are no residuals. Plugging the numbers in gives us an R-squared equal to 1, which equals 100%. In this case, mouse weight explains 100% of the variation in mouse size.

Okay, one last example. In this case, knowing mouse weight doesn't help us predict mouse size. If someone tells us they have a heavy mouse, well, that mouse could either be small or large with equal probability. Similarly, if someone said they had a light mouse, well, again, we wouldn't know if it was a big mouse or a small mouse because each of those options is equally likely. Just like the other two examples, the variation around the mean is equal to 11.1. However, in this case the variation around the fit is also equal to 11.1. So we plug those numbers in and we get R-squared = 0, which equals 0%. In this case, mouse weight doesn't explain any of the variation around the mean.

When calculating the sum of squares around the mean, we collapsed the points onto the y-axis just to emphasize the fact that we were ignoring mouse weight. But we could just as easily draw a
line, y equals the mean mouse size, and calculate the sum of squares around the mean around that.

In this example we applied R-squared to a simple equation for a line, y = 0.1 + 0.78 * x. This gave us an R-squared of 60%, meaning 60% of the variation in mouse size could be explained by mouse weight. But the concept applies to any equation, no matter how complicated. First you measure, square, and sum the distance from the data to the mean. Then measure, square, and sum the distance from the data to the complicated equation. Once you've got those two sums of squares, just plug them in and you've got R-squared.

Let's look at a slightly more complicated example. Imagine we wanted to know if mouse weight and tail length did a good job predicting the length of the mouse's body. So we measure a bunch of mice. To plot this data, we need a three-dimensional graph. We want to know how well weight and tail length predict body length. The first mouse we measured had weight equals 2.1, tail length equals 1.3, and body length equals 2.5, so that's how we plot this data on this 3D graph. Here's all the data in the graph. The larger circles are points that are closer to us and represent mice that have shorter tails. The smaller circles are points that are further from us and represent mice with longer tails.

Now we do a least squares fit. Since we have the extra term in the equation representing an extra dimension, we fit a plane instead of a line. Here's the equation for the plane. The y value represents body length. Least squares estimates three different parameters. The first is the y-intercept; that's when both tail length and mouse weight are equal to zero. The second parameter, 0.7, is for the mouse weight. The last term, 0.5, is for the tail length. If we know a mouse's weight and tail length, we can use the equation to guess the body length. For example, given the weight and tail length for this mouse, the equation predicts this body length. Just like before, we can measure the residuals, square them, and then add them up to calculate R-squared.

Now, if the tail length, or the z-axis, is useless and doesn't make the sum of squares fit any smaller, then least squares will ignore it by making that parameter equal to zero. In this case, plugging the tail length into the equation would have no effect on predicting the mouse size. This means equations with more parameters will never make the sum of squares around the fit worse than equations with fewer parameters. In other words, this equation: mouse size = 0.3 + mouse weight + flip of a coin + favorite color + astrological sign + extra stuff, will never perform worse than this equation: mouse size = 0.3 + mouse weight. This is because least squares will cause any term that makes the sum of squares around the fit worse to be multiplied by zero and, in a sense, no longer exist.

Now, due to random chance, there is a small probability that the small mice in the data set might get heads more frequently than large mice. If this happened, then we'd get a smaller sum of squares fit and a better R-squared. Here's the frowny face of sad times. The more silly parameters we add to the equation, the more opportunities we have for random events to reduce the sum of squares fit and result in a better R-squared. Thus, people report an adjusted R-squared value that, in essence, scales R-squared by the number of parameters.

R-squared is awesome, but it's missing something. What if all we had were two measurements? We'd calculate the sum of squares around the mean; in this case that would be 10. Then we'd calculate the sum of squares around the fit, which equals zero. The sum of squares around the fit equals zero because you can always draw a straight line to connect any two points. What this means is when we calculate R-squared by plugging the numbers in, we're going to get 100%. 100% is a great number, we've explained all the variation. But any two random points will give us the exact same thing; it doesn't actually mean anything. We need a way to determine if the R-squared value is statistically significant. We need
a p-value.

Before we calculate the p-value, let's review the main concepts behind R-squared one last time. The general equation for R-squared is the variance around the mean minus the variance around the fit, divided by the variance around the mean. In our example, this means the variation in the mouse size minus the variation after taking weight into account, divided by the variation in mouse size. In other words, R-squared equals the variation in mouse size explained by weight divided by the variation in mouse size without taking weight into account. In this particular example, R-squared equals 0.6, meaning we saw a 60% reduction in variation once we took mouse weight into account.

Now that we have a thorough understanding of the ideas behind R-squared, let's talk about the main ideas behind calculating a p-value for it. The p-value for R-squared comes from something called F. F is equal to the variation in mouse size explained by weight divided by the variation in mouse size not explained by weight. The numerators for R-squared and for F are the same; that is to say, it's the reduction in variance when we take the weight into account. The denominator is a little different. These dotted lines, the residuals, represent the variation that remains after fitting the line. This is the variation that is not explained by weight. So together we have the variation in mouse size explained by weight divided by the variation in mouse size not explained by weight.

Now let's look at the underlying mathematics. Just as a reminder, here's the equation for R-squared. And this is the general equation for F that will tell us if R-squared is significant. The meat of these two equations is very similar and relies on the same sums of squares. Like we said before, their numerators are the same. In our mouse size and weight example, the numerator is the variation in mouse size explained by weight, and the sum of squares around the fit is just the residuals squared and summed up around the fitted line, so that's the variation that the fit does not explain. These numbers over here
are the degrees of freedom. They turn the sums of squares into variances. I'm going to dedicate a whole StatQuest to degrees of freedom, but for now let's see if we can get an intuitive feel for what they're doing here.

Let's start with these. Pfit is the number of parameters in the fit line. Here's the equation for the fit line in a general format: we just have the y-intercept plus the slope times x. The y-intercept and the slope are two separate parameters. That means Pfit equals 2. Pmean is the number of parameters in the mean line. In general, that equation is y equals the y-intercept; that's what gives us a horizontal line that cuts through the data. In this case, the y-intercept is the mean value. This equation just has one parameter, thus Pmean equals 1. Both equations have a parameter for the y-intercept; however, the fit line has one extra parameter, the slope. In our example, this slope is the relationship between weight and size. In this example, Pfit minus Pmean = 2 - 1, which equals 1. The fit has one extra parameter, mouse weight. Thus the numerator is the variance explained by the extra parameter. In our example, that's the variance in mouse size explained by mouse weight.

If we had used mouse weight and tail length to explain variation in size, then we would end up with an equation that had three parameters, and Pfit would equal 3. Thus Pfit minus Pmean would equal 3 - 1, which equals 2. Now the fit has two extra parameters, mouse weight and tail length. With the fancier equation for the fit, the numerator is the variance in mouse size explained by mouse weight and tail length.

Now let's talk about the denominator for our equation for F. The denominator is the variation in mouse size not explained by the fit; that is to say, it's the sum of squares of the residuals that remain after we fit our new line to the data. Why divide the sum of squares fit by n minus Pfit instead of just n? Intuitively, the more parameters you have in your equation, the more data you need to estimate them. For
example, you only need two points to estimate a line, but you need three points to estimate a plane. If the fit is good, then the variation explained by the extra parameters in the fit will be a large number and the variation not explained by the extra parameters in the fit will be a small number. That makes F a really large number.

Now, that question we've all been dying to know the answer to: how do we turn this number into a p-value? Conceptually, generate a set of random data, calculate the mean and the sum of squares around the mean, calculate the fit and the sum of squares around the fit. Now plug all those values into our equation for F and that will give us a number; in this case that number is 2. Now plot that number in a histogram. Now generate another set of random data, calculate the mean and the sum of squares around the mean, then calculate the fit and the sum of squares around the fit. Plug those values into our equation for F, and in this case we get F equals 3. So we then plug that value into our histogram. And then we repeat with yet another set of random data. In this case we got F equals 1; that's plotted on our histogram. And we just keep generating more and more random data sets, calculating the sums of squares, plugging them into our equation for F, and plotting the results on our histogram. Now imagine we did that hundreds, if not millions, of times.

When we're all done with our random data sets, we return to our original data set. We then plug the numbers into our equation for F; in this case we got F equals 6. The p-value is the number of more extreme values divided by all of the values. So in this case we have the value at F equals 6 and the value at F equals 7 divided by all the other randomizations that we created originally. If this concept is confusing to you, I have a StatQuest that explains p-values, so check that one out. Bam!

You can approximate the histogram with a line. In practice, rather than generating tons of random data sets, people use the line to calculate the p-value. Here's
an example of one standard F-distribution that people use to calculate p-values. The degrees of freedom determine the shape. The red line represents another standard F-distribution that people use to calculate p-values. In this case, the sample size used to draw the red line is smaller than the sample size used to draw the blue line. Notice that when n minus Pfit equals 10, the distribution tapers off faster. This means that the p-value will be smaller when there are more samples relative to the number of parameters in the fit equation. Triple bam! Hooray, we finally got our p-value.

Now let's review the main ideas. Given some data that you think are related, linear regression quantifies the relationship in the data (this is R-squared; this needs to be large). It also determines how reliable that relationship is (this is the p-value that we calculated with F; this needs to be small). You need both to have an interesting result.

Hooray! We've made it to the end of another exciting StatQuest. Wow, this was a long one. I hope you had a good time. If you like this and want to see more StatQuests like it, why don't you subscribe to my channel? It's real easy, just click the red button. And if you have any ideas of StatQuests that you'd like me to create, just put them in the comments below. That's all there is to it. All right, tune in next time for another really exciting StatQuest.
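The conceptual recipe from the transcript (compute F for the real data, compute F for many random data sets, then count how many random values are at least as extreme) can be sketched as follows. This is a rough illustration under stated assumptions, not the procedure from the video: the mouse data are hypothetical, the "random data" are drawn from a standard normal distribution that ignores weight, and the number of simulations and the seed are arbitrary choices.

```python
import random


def fit_line(xs, ys):
    """Closed-form least squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def f_statistic(xs, ys, p_fit=2, p_mean=1):
    """F = variation explained by the fit / variation not explained by the fit."""
    n = len(ys)
    mean_y = sum(ys) / n
    intercept, slope = fit_line(xs, ys)
    ss_mean = sum((y - mean_y) ** 2 for y in ys)
    ss_fit = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    explained = (ss_mean - ss_fit) / (p_fit - p_mean)
    unexplained = ss_fit / (n - p_fit)
    return explained / unexplained


def simulated_p_value(xs, ys, n_sims=2000, seed=0):
    """Fraction of random data sets whose F is at least as large as the observed F."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = f_statistic(xs, ys)
    as_extreme = 0
    for _ in range(n_sims):
        # Assumption: random sizes are standard normal and unrelated to weight.
        random_ys = [rng.gauss(0.0, 1.0) for _ in xs]
        if f_statistic(xs, random_ys) >= observed:
            as_extreme += 1
    return observed, as_extreme / n_sims


# Hypothetical mouse data (weight, size); these are NOT the values from the video.
weights = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
sizes = [1.2, 1.9, 2.1, 3.6, 3.9, 5.2, 5.1, 6.8, 7.1]

observed_f, p_value = simulated_p_value(weights, sizes)
print(f"F = {observed_f:.1f}, simulated p-value = {p_value:.4f}")
```

With a finite number of simulations the estimate can come out as exactly zero when no random data set beats the observed F, which is one reason people use the F-distribution curve instead of the histogram in practice.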

Original Description

The concepts behind linear regression, fitting a line to data with least squares and R-squared, are pretty darn simple, so let's get down to it!

NOTE: This StatQuest comes with a companion video for how to do linear regression in R: https://youtu.be/u1cc1r_Y7M0
You can also find example code at the StatQuest github: https://github.com/StatQuest/linear_regression_demo

For a complete index of all the StatQuest videos, check out: https://statquest.org/video-index/

If you'd like to support StatQuest, please consider...
Patreon: https://www.patreon.com/statquest
...or...
YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join
...buying one of my books, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/
...or just donating to StatQuest!
https://www.paypal.me/statquest

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter: https://twitter.com/joshuastarmer

0:00 Awesome song and introduction
0:37 The Main Ideas!!!
1:12 Review of fitting a line to data
4:00 Review of R-squared
12:13 R-squared for a multivariable model
14:16 Why adding variables will never reduce R-squared
16:08 Calculating a p-value for R-squared
25:26 The F-distribution

Correction:
25:39 I should have (Pfit - Pmean) instead of the other way around.

#statquest #regression

This video explains linear regression using least squares to fit a line to data, with a focus on calculating and interpreting R-squared and P-values, and demonstrates the process with steps and examples.

Key Takeaways

Use least squares to fit a line to data
Calculate R-squared
Calculate P-value for R-squared
Draw a line through the data
Measure the distance from the line to the data
Calculate the sum of squares around the mean and the predicted line or plane
Compare the two sums of squares to calculate R^2
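As a small worked check of the last takeaway, the snippet below plugs the numbers quoted in the transcript into the R-squared formula. The function name `r_squared` is just an illustrative choice; the inputs can be either variations or raw sums of squares, since the division by n cancels out.

```python
def r_squared(spread_mean, spread_fit):
    """Fraction of the variation around the mean that the fit explains.

    Works with variations or with raw sums of squares, as long as both
    arguments use the same one.
    """
    return (spread_mean - spread_fit) / spread_mean


# Variations from the transcript: 11.1 around the mean, 4.4 around the fit.
print(round(r_squared(11.1, 4.4), 2))  # 0.6

# Sums of squares from the transcript: 100 around the mean, 40 around the fit.
print(r_squared(100, 40))  # 0.6

# Perfect prediction: no residuals around the fit.
print(r_squared(11.1, 0.0))  # 1.0

# Weight does not help: the fit is no better than the mean.
print(r_squared(11.1, 11.1))  # 0.0
```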

💡 The P-value is calculated using the F-distribution with degrees of freedom that determine the shape, and a smaller P-value indicates a more significant result.
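For reference, the two quantities described in the transcript can be written out as follows. The notation is a reconstruction from the spoken description rather than a copy of the slides: SS(mean) and SS(fit) are the sums of squares around the mean and around the fit, Pfit and Pmean are the parameter counts, and n is the sample size.

```latex
R^2 = \frac{SS(\text{mean}) - SS(\text{fit})}{SS(\text{mean})}
\qquad
F = \frac{\bigl(SS(\text{mean}) - SS(\text{fit})\bigr) / (P_{\text{fit}} - P_{\text{mean}})}
         {SS(\text{fit}) / (n - P_{\text{fit}})}
```

For a straight-line fit, Pfit = 2 and Pmean = 1, so F is compared against an F-distribution with 1 and n - 2 degrees of freedom.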

Chapters (9)

0:00 Awesome song and introduction

0:37 The Main Ideas!!!

1:12 Review of fitting a line to data

4:00 Review of R-squared

12:13 R-squared for a multivariable model

14:16 Why adding variables will never reduce R-squared

16:08 Calculating a p-value for R-squared

25:26 The F-distribution

Correction: 25:39 I should have (Pfit - Pmean) instead of the other way around.
