The Main Ideas of Fitting a Line to Data (The Main Ideas of Least Squares and Linear Regression.)

StatQuest with Josh Starmer · Beginner ·📄 Research Papers Explained ·9y ago

Key Takeaways

The video discusses the concept of fitting a line to data, also known as least squares and linear regression, using the sum of squared residuals to measure the fit of the line, and finding the optimal values for the slope and intercept using derivatives.

Full Transcript

When you go on a quest and that quest is really awesome, it's a StatQuest! Yeah, yeah, yeah!

Hello and welcome to StatQuest. StatQuest is brought to you by the friendly folks in the genetics department at the University of North Carolina at Chapel Hill. Today we're going to talk about fitting a line to data, a.k.a. least squares, a.k.a. linear regression. Now let's get to it.

Okay, you worked really hard, you did the experiment, and now you've got some data. Here it is plotted on an XY graph. We usually like to add a line to our data so we can see what the trend is. But is this the best line we should use? Or does this new line fit the data even better? Or what about this line? Is it better or worse than the other options?

A horizontal line that cuts through the average y value of our data is probably the worst fit of all. However, it gives us a good starting point for talking about how to find the optimal line to fit our data. So now let's focus on this horizontal line. It cuts through the average y value, which is 3.5. Let's just call this point b, because different data sets will have different average values on the y-axis. That is to say, the y value for this line is b, and for this particular data set, b = 3.5.

We can measure how well this line fits the data by seeing how close it is to the data points. We'll start with the point in the lower left-hand corner of the graph, with coordinates (x1, y1). We can now draw a line from this point up to the line that cuts across the average y value for this data set. The distance between the line and the first data point equals b - y1. The distance between the line and the second data point is b - y2. So far, the total distance between the data points and the line is the sum of the two distances. And we can calculate the distance between the line and the third point; that equals b - y3. Now we've added the third distance to our total sum. The distance for the fourth point is b - y4. Note: y4 is greater than b because it's above the horizontal line, so this value will be negative. That's no good, since it will subtract from the total and make the overall fit appear better than it really is. The fifth data point is even higher relative to the horizontal line. This distance is going to be very negative.

Back in the day, when they were first working this out, they probably tried taking the absolute value of everything and then discovered that it made the math pretty tricky. So they ended up squaring each term. Squaring ensures that each term is positive. Here's the equation that shows the total distance the data points have from the horizontal line. In this specific example, 24.62 is our measure of how well this line fits the data. It's called the sum of squared residuals, because the residuals are the differences between the real data and the line, and we are summing the square of these values.

Now let's see how good the fit is if we rotate the line a little bit. In this case, the sum of squared residuals equals 18.72. That value keeps going down the more we rotate the line. What if we rotate the line a whole lot? Well, as you can see, the fit gets worse. In this case, the sum of squared residuals is 31.71. So there's a sweet spot in between horizontal and too vertical.

To find that sweet spot, let's start with the generic line equation. This is y = a * x + b. Here a is the slope of the line and b is the y-intercept of the line; that's the location on the y-axis that the line crosses when x equals 0. We want to find the optimal values for a and b so that we minimize the sum of squared residuals. In more general math terms, the sum of squared residuals is this complicated mathematical equation. But it's actually not that complicated. This first part is the value of the line at x1, and this second part is the observed value at x1. So really, all we're doing in this part of the equation is calculating the distance between the line and the observed value. So this is no big deal. Since we want the line that will give us the smallest sum of squares, this method for finding the best values for a and b is called least squares.

If we plotted the sum of squared residuals versus each rotation, we'd get something like this, where on the y-axis we have the sum of squared residuals and on the x-axis we've got each different rotation of the line. We see that the sum of squared residuals goes down when we start rotating the line, but that it's possible to rotate the line too far, and the sum of squared residuals starts going back up again. How do we find the optimal rotation for the line? Well, we take the derivative of this function. The derivative tells us the slope of the function at every point. The slope at the point on the far left side is pretty steep. As we move to the right, we see that the slope isn't as steep. The slope at the best point, where we have the least squares, is zero. After that, the slope starts getting steep again.

Let's go back to that middle point where we have the least squares value and the slope is zero. Remember, the different rotations are just different values for a, the slope, and b, the intercept. We can use a 3D graph to show how different values for the slope and intercept result in different sums of squares. In this graph, the intercept is the z-axis, so it's going back sort of deep into your computer screen. And if we select one value for the intercept, for example, assume we set the intercept value to be 3, then we could change values for the slope and see how an intercept of 3 plus different values for the slope would affect the sum of squared residuals. Anyways, we do that for bunches of different intercepts and slopes. Taking the derivatives with respect to both the slope and the intercept tells us where the optimal values are for the best fit.

Note: no one ever solves this problem by hand. This is done on a computer, so for most people it's not essential to know how to take these derivatives. However, it's essential to understand the concepts.

Big important concept number one: we want to minimize the square of the distance between the observed values and the line. Big important concept number two: we do this by taking the derivative and finding where it is equal to zero. The final line minimizes the sums of squares; it gives the least squares between it and the real data. In this case, the line is defined by the following equation: y = 0.77 * x + 0.66.

Hooray! We've made it to the end of another StatQuest. Tune in next time for another exciting adventure in statistics land.
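
As a rough sketch of the ideas in the transcript, the snippet below computes the sum of squared residuals for a horizontal line through the average y value and for a least squares line. The data are made up for illustration (they are not the data set shown in the video), and the names `sum_squared_residuals` and `least_squares_fit` are hypothetical helpers. The slope and intercept formulas used here are the standard closed-form result of setting both derivatives to zero.

```python
# Hypothetical data, NOT the data set from the video.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]


def sum_squared_residuals(a, b, xs, ys):
    """Sum of (line value - observed value)^2 for the line y = a * x + b."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


def least_squares_fit(xs, ys):
    """Return the slope a and intercept b that minimize the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b


# Starting point: a horizontal line (slope 0) through the average y value.
y_mean = sum(ys) / len(ys)
ssr_horizontal = sum_squared_residuals(0.0, y_mean, xs, ys)

# The least squares line for the same data.
a_best, b_best = least_squares_fit(xs, ys)
ssr_best = sum_squared_residuals(a_best, b_best, xs, ys)

print(f"horizontal line: y = {y_mean:.2f}, SSR = {ssr_horizontal:.2f}")
print(f"least squares:   y = {a_best:.2f} * x + {b_best:.2f}, SSR = {ssr_best:.2f}")
```

On this made-up data the horizontal line has a sum of squared residuals of 10.00, and the fitted line y = 0.80 * x + 0.60 brings it down to 3.60.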

Original Description

Fitting a line to data is actually pretty straightforward. For a complete index of all the StatQuest videos, check out: https://statquest.org/video-index/ If you'd like to support StatQuest, please consider... Patreon: https://www.patreon.com/statquest ...or... YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join ...buying one of my books, a study guide, a t-shirt or hoodie, or a song from the StatQuest store... https://statquest.org/statquest-store/ ...or just donating to StatQuest! https://www.paypal.me/statquest Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter: https://twitter.com/joshuastarmer #statquest #regression


The video explains how to fit a line to data using least squares and linear regression, and how to find the optimal values for the slope and intercept using derivatives. The concept of sum of squared residuals is used to measure the fit of the line, and the video provides a step-by-step guide on how to calculate it.
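
The "rotate the line and watch the sum of squared residuals" idea can be sketched numerically. The example below uses made-up data (not the video's) and assumes, purely for illustration, that the line pivots around the point (mean x, mean y) as it rotates; the video does not say where the pivot is. It sweeps a grid of slopes and estimates the derivative of the sum of squared residuals with a central difference, so the sign change around the minimum is visible.

```python
# Hypothetical data, NOT taken from the video.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)


def ssr_for_slope(a):
    """SSR for a line with slope a that pivots around (x_mean, y_mean) (an assumption)."""
    b = y_mean - a * x_mean
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


def ssr_derivative(a, h=1e-6):
    """Central-difference estimate of d(SSR)/da at slope a."""
    return (ssr_for_slope(a + h) - ssr_for_slope(a - h)) / (2 * h)


# Sweep slopes 0.0, 0.1, ..., 2.0 and keep the one with the smallest SSR.
slopes = [i / 10 for i in range(0, 21)]
curve = [(a, ssr_for_slope(a)) for a in slopes]
best_slope, best_ssr = min(curve, key=lambda pair: pair[1])

for a, ssr in curve[::4]:
    print(f"slope = {a:.1f}  SSR = {ssr:6.2f}  dSSR/da = {ssr_derivative(a):7.2f}")
print(f"best slope on the grid: {best_slope:.1f} (SSR = {best_ssr:.2f})")
```

The derivative is negative for slopes that are too flat, positive for slopes that are too steep, and close to zero at the best slope on the grid, which is the pattern the video describes.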

Key Takeaways
  1. Plot data on an XY graph
  2. Calculate the sum of squared residuals for a horizontal line
  3. Rotate the line to find the optimal fit
  4. Use derivatives to find the optimal values for the slope and intercept
  5. Calculate the final line equation using the optimal values
💡 The sum of squared residuals is a measure of how well a line fits the data, and finding the optimal values for the slope and intercept using derivatives is a key concept in linear regression.
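
The video shows the equations on screen without reading them out, so the following is a reconstruction in standard notation rather than a copy of the slides. It assumes n data points (x_i, y_i) and the line y = a x + b, and shows what "take the derivative and find where it is equal to zero" looks like for both the slope and the intercept.

```latex
% Sum of squared residuals for the line y = a x + b
\mathrm{SSR}(a, b) = \sum_{i=1}^{n} \bigl( (a x_i + b) - y_i \bigr)^2

% Set the derivative with respect to each parameter to zero
\frac{\partial\, \mathrm{SSR}}{\partial a} = 2 \sum_{i=1}^{n} \bigl( (a x_i + b) - y_i \bigr)\, x_i = 0
\qquad
\frac{\partial\, \mathrm{SSR}}{\partial b} = 2 \sum_{i=1}^{n} \bigl( (a x_i + b) - y_i \bigr) = 0

% Solving the two equations together, with \bar{x} and \bar{y} the averages
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\qquad
b = \bar{y} - a\, \bar{x}
```

The second condition implies that the least squares line always passes through the point of averages, which is why the horizontal line through the average y value is a natural starting point.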
