R-squared, Clearly Explained!!!

StatQuest with Josh Starmer · Beginner ·📄 Research Papers Explained ·11y ago

Key Takeaways

The video explains R-squared, a metric of correlation, and its interpretation, highlighting its advantages over plain old R correlation values, and demonstrates how to calculate it with examples.

Full Transcript

StatQuest, StatQuest, StatQuest, StatQuest! StatQuest is brought to you by the friendly people in the Genetics Department at the University of North Carolina at Chapel Hill. Hello, and welcome to StatQuest. In this video we're going to talk about R-squared.

R-squared is a metric of correlation that is easy to compute and intuitive to interpret. Most of us are already familiar with correlation and its standard metric, plain old R. Correlation values that are close to 1 or -1 are good and tell you that two quantitative variables, for example weight and size, are strongly related. Correlation values close to 0 are lame.

Some of you may be asking, "Why should we care about R-squared? We already have regular R." Some of you might just be asking, "What is R-squared?" R-squared is very similar to its hipper cousin R, but it is easier to interpret. For example, it's not obvious that R = 0.7 is twice as good a correlation as R = 0.5. However, R² = 0.7 is what it looks like: it's 1.4 times as good as R² = 0.5. The other thing I like about R-squared is that it's easy and intuitive to calculate.

Let's start with an example. Here we're plotting mouse weight on the y-axis, with high weights towards the top and low weights towards the bottom, and mouse identification numbers on the x-axis, with ID numbers 1 through 7. We can calculate the mean, or average, of the mouse weights and plot it as a line that spans the graph. We can calculate the variation of the data around this mean as the sum of the squared differences between the weight of each mouse i (where i is an individual mouse, represented by a red dot) and the mean. Each difference is squared so that the points below the mean don't cancel out the points above the mean.

Now, what if instead of ordering our mice by their identification number, we ordered them by their size? Instead of identification number, the x-axis now shows mouse size, with the smallest mice on the left and the largest on the right. All we have done is reorder the data on the x-axis, so the mean and the variation are exactly the same as before. Here we show the mean again as a black bar that spans the graph in exactly the same location as before. The distances between the dots and the line have not changed either; only the order of the dots has.

Here's a question for you: given that we know an individual mouse's size, is the mean, or average, weight the best way to predict that mouse's weight? Well, the answer is no. We can do way better. All we have to do is fit a line to the data. Now we can predict weight with our line: you tell me you have a large mouse, and I can look at my line and make a good guess about its weight.

Here's another question: does the blue line that we just drew fit the data better than the mean? If so, how much better? By eye, it looks like the blue line fits the data better than the mean. How do we quantify that difference? R-squared!

At the bottom of the graph I've drawn the equation for R-squared:

R² = (variation around the mean - variation around the line) / variation around the mean

We're going to walk through it one step at a time. The first part of the equation is just the variation around the mean. We already calculated that: it's the sum of the squared differences of the actual data values from the mean. The second part of the equation is the variation around our new blue line. This is calculated in a very similar way: here we want the sum of the squared differences between the actual data points and the blue line. The numerator, which is the difference between the variation around the mean and the variation around the blue line, is then divided by the variation around the mean. This makes R-squared range from 0 to 1, because the variation around the line will never be greater than the variation around the mean, and it will never be less than 0. This division also makes R-squared a percentage, and we'll talk more about that in just a second.

Now we'll walk through an example where we calculate things one step at a time. First, we start with the variation around the mean; in this case, it equals 32. The variation around the blue line is only 6, which is what we suspected, since the line appears to fit the data much better. Once we've calculated the variation around the mean and the variation around the blue line, we can plug these values into our formula. After plugging in our values, we get R² = (32 - 6) / 32. Subtracting 6 from 32 gives 26, and 26 / 32 gives 0.81, or 81%. This means there is 81% less variation around the line than around the mean. In other words, the size/weight relationship accounts for 81% of the total variation, so most of the variation in the data is explained by the size/weight relationship.

Here's another example. In this one we're comparing two possibly uncorrelated variables. On the y-axis we have mouse weight again, but on the x-axis we now have time spent sniffing a rock. Like before, we calculate the variation around the mean, and just like before, we get 32. However, this time the variation around the blue line is a much larger value: 30. Plugging those values into our formula gives R² = (32 - 30) / 32 = 0.06, or 6%. Thus, there is only 6% less variation around the line than around the mean. In other words, the sniff/weight relationship accounts for only 6% of the total variation, so hardly any of the variation in the data is explained by the sniff/weight relationship.

Now, when someone says, "The statistically significant R² was 0.9," you can think to yourself, "Very good! The relationship between the two variables explains 90% of the variation in the data." And when someone else says, "The statistically significant R² was 0.01," you can think to yourself, "Dag! Who cares if that relationship is significant? It only accounts for 1% of the variation in the data. Something else must explain the remaining 99%."

What about plain old R? How is it related to R-squared? R-squared is just the square of R. Now, when someone says, "The statistically significant R was 0.9," and we're talking about plain old R, you can think to yourself, "0.9 × 0.9 = 0.81. Very good! The relationship between the two variables explains 81% of the variation in the data." And when someone else says, "The statistically significant R (that's plain old R) was 0.5," you can think to yourself, "0.5 × 0.5 = 0.25. The relationship accounts for 25% of the variation in the data. That's good if there are a million other things accounting for the remaining 75%, and bad if there's only one thing."

I like R-squared more than plain old R because it's easier to interpret. Here's an example: how much better is R = 0.7 than R = 0.5? If we convert those numbers to R-squared, we see that when R = 0.7, R² = 0.7² = 0.49, which is roughly 0.5, so about 50% of the original variation is explained by the relationship. When R = 0.5, R² = 0.5² = 0.25, so only 25% of the original variation is explained by the relationship. With R-squared it's easy to see that the first correlation is about twice as good as the second: explaining 50% of the original variation is twice as good as explaining only 25%.

That said, R-squared does not indicate the direction of the correlation, because squared numbers are never negative. If the direction of the correlation isn't obvious, you can say, "The two variables were positively (or negatively) correlated with R² = ...," whatever that value may be.

These are the two main ideas for R-squared: R-squared is the percentage of variation explained by the relationship between two variables, and if someone gives you a value for plain old R, just square it in your head; you'll understand what's going on a whole lot better.

We've reached the end of our StatQuest. Tune in next time for another exciting adventure into the land of statistics!
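The arithmetic in the two worked examples, and the conversion from plain old R to R-squared, can be checked with a few lines of Python. This is a minimal sketch: `r_squared` and `r_to_r_squared` are hypothetical helper names, and the inputs are the variation values quoted in the video (32 and 6, then 32 and 30), not the underlying mouse measurements, which the transcript does not give.

```python
def r_squared(var_mean: float, var_line: float) -> float:
    """R^2 = (variation around the mean - variation around the line) / variation around the mean."""
    if var_mean <= 0:
        raise ValueError("variation around the mean must be positive")
    return (var_mean - var_line) / var_mean


def r_to_r_squared(r: float) -> float:
    """Square plain old R (the correlation coefficient) to get R^2."""
    return r * r


# Variation values quoted in the two examples (sums of squared differences).
size_weight = r_squared(32, 6)    # (32 - 6) / 32 = 26 / 32
sniff_weight = r_squared(32, 30)  # (32 - 30) / 32 = 2 / 32

print(f"size/weight:  R^2 = {size_weight:.2f}")   # prints 0.81
print(f"sniff/weight: R^2 = {sniff_weight:.2f}")  # prints 0.06

# Converting plain old R values from the video into R^2.
for r in (0.9, 0.7, 0.5):
    print(f"R = {r} -> R^2 = {r_to_r_squared(r):.2f}")  # 0.81, 0.49, 0.25
```

Note that the ratio R² = 0.49 / 0.25 = 1.96 is where "about twice as good" comes from, whereas the raw R values only differ by a factor of 1.4.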

Original Description

R-squared is one of the most useful metrics in statistics. It can give you a sense of how good your model is. For a complete index of all the StatQuest videos, check out: https://statquest.org/video-index/ If you'd like to support StatQuest, please consider... Patreon: https://www.patreon.com/statquest ...or... YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join ...buying one of my books, a study guide, a t-shirt or hoodie, or a song from the StatQuest store... https://statquest.org/statquest-store/ ...or just donating to StatQuest! https://www.paypal.me/statquest Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter: https://twitter.com/joshuastarmer #statquest #statistics #rsquared


R-squared is a useful metric in statistics that measures the percentage of variation explained by the relationship between two variables, and is easier to interpret than plain old R correlation values.

Key Takeaways
  1. Calculate the mean of the data
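  2. Calculate the variation around the mean (the sum of squared differences between each data point and the mean)
  3. Fit a regression line to the data
  4. Calculate the variation around the regression line (the sum of squared differences between each data point and the line)
  5. Calculate R-squared = (variation around the mean - variation around the line) / variation around the mean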
💡 R-squared is a percentage that represents the proportion of variation in the data that is explained by the relationship between the variables, making it easier to interpret than plain old R correlation values.
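The five steps above can be sketched end to end in Python. This is a minimal illustration, assuming an ordinary least-squares line with an intercept; the helper names and the `size`/`weight` values are hypothetical, not data from the video. It also squares plain old R for comparison, since for a least-squares line with an intercept the two numbers agree.

```python
import math
from statistics import mean


def sum_of_squares(observed, predicted):
    """Sum of squared differences between observed values and predictions."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x (x must not be constant)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y):
    y_bar = mean(y)                                 # 1. mean of the data
    var_mean = sum_of_squares(y, [y_bar] * len(y))  # 2. variation around the mean
    intercept, slope = fit_line(x, y)               # 3. fit a regression line
    fitted = [intercept + slope * xi for xi in x]
    var_line = sum_of_squares(y, fitted)            # 4. variation around the line
    return (var_mean - var_line) / var_mean         # 5. R-squared


def pearson_r(x, y):
    """Plain old R: the Pearson correlation coefficient."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mouse sizes and weights, for illustration only.
size = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
weight = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2]

print(f"R^2 from the five steps: {r_squared(size, weight):.4f}")
print(f"plain old R, squared:    {pearson_r(size, weight) ** 2:.4f}")
```

Each step is written as an explicit sum of squares so the code mirrors the formula; in practice a statistics library routine would normally do the fitting.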
