R-squared, Clearly Explained!!!

StatQuest with Josh Starmer · Beginner · 📄 Research Papers Explained · 11y ago

Key Takeaways

The video explains R-squared, a metric of correlation, and how to interpret it. It highlights the advantages of R-squared over plain old R correlation values and demonstrates how to calculate it with two worked examples.

Full Transcript

StatQuest, StatQuest, StatQuest, StatQuest! StatQuest is brought to you by the friendly people in the genetics department at the University of North Carolina at Chapel Hill.

Hello, and welcome to StatQuest. In this video we're going to talk about R-squared. R-squared is a metric of correlation that is easy to compute and intuitive to interpret. Most of us are already familiar with correlation and the standard metric of it, plain old R. Correlation values that are close to 1 or -1 are good and tell you that two quantitative variables, for example weight and size, are strongly related. Correlation values close to 0 are lame.

Some of you may be asking, why should we care about R-squared? We already have regular R. Some of you might just be asking, what is R-squared? R-squared is very similar to its hipper cousin R, but interpretation is easier. For example, it's not obvious that when R = 0.7, that's twice as good a correlation as when R = 0.5. However, R-squared = 0.7 is what it looks like: it's 1.4 times as good as R-squared = 0.5. The other thing that I like about R-squared is that it's easy and intuitive to calculate.

Let's start with an example. Here we're plotting mouse weight on the y-axis, with high weights towards the top and low weights towards the bottom, and mouse identification numbers on the x-axis, with ID numbers 1 through 7. We can calculate the mean, or average, of the mouse weights and plot it as a line that spans the graph. We can calculate the variation of the data around this mean as the sum of the squared differences between the weight for each mouse i, where i is an individual mouse represented by a red dot, and the mean. The difference between each data point and the mean is squared so that the points below the mean don't cancel out the points above the mean.

Now, what if instead of ordering our mice by their identification number, we ordered them by their size? Instead of using identification number on the x-axis, we have mouse size, with the smallest size on the left side and the largest size on the right side. All we have done is reorder the data on the x-axis. The mean and variation are the exact same as before. Here we show the mean again as a black bar that spans the graph in the exact same location as it was before. Also, the distances between the dots and the line have not changed, just the order of the dots.

Here's a question for you: given that we know an individual mouse's size, is the mean, or average weight, the best way to predict that individual mouse's weight? Well, the answer is no. We can do way better. All we have to do is fit a line to the data. Now we can predict weight with our line. You tell me you have a large mouse, and I can look at my line and make a good guess about the weight.

Here's another question: does the blue line that we just drew fit the data better than the mean? If so, how much better? By eye, it looks like the blue line fits the data better than the mean. How do we quantify that difference? R-squared!

In the bottom of the graph I've drawn the equation for R-squared. We're going to walk through it one step at a time. The first part of the equation is just the variation around the mean. We already calculated that: it's just the sum of the squared differences of the actual data values from the mean. The second part of the equation is the variation around our new blue line. This is calculated in a very similar way. Here we just want the sum of the squared differences between the actual data points and our new blue line. The numerator, which is the difference between the variation around the mean and the variation around the blue line, is then divided by the variation around the mean. This makes R-squared range from 0 to 1, because the variation around the line will never be greater than the variation around the mean, and it will never be less than zero. This division also makes R-squared a percentage, and we'll talk more about that in just a second.

Now we'll walk through an example where we calculate things one step at a time. First, we'll start with the variation around the mean. In this case, that equals 32. The variation around the blue line is only 6, which is what we suspected, since it appears to fit the data much better. Once we've calculated the variation around the mean and the variation around our blue line, we can plug these values into our formula for R-squared. After plugging in our values, we get R-squared = (32 - 6) / 32. After subtracting 6 from 32, we get 26. Doing the division, 26 / 32 gives us 0.81, or 81%. This means that there is 81% less variation around the line than the mean. In other words, the size/weight relationship accounts for 81% of the total variation. This means that most of the variation in the data is explained by the size/weight relationship.

Here's another example. In this example we're comparing two possibly uncorrelated variables. On the y-axis we have mouse weight again, but on the x-axis we now have time spent sniffing a rock. Like before, we calculate the variation around the mean, and just like before, we got 32. However, this time, when we calculated the variation around the blue line, we got a much larger value: 30. Now we just plug those values into our formula for R-squared. By doing the math, we see that R-squared = (32 - 30) / 32 = 0.06, or 6%. Thus, there is only 6% less variation around the line than the mean. In other words, the sniff/weight relationship accounts for only 6% of the total variation. This means that hardly any of the variation in the data is explained by the sniff/weight relationship.

Now, when someone says, "The statistically significant R-squared was 0.9," you can think to yourself, "Very good! The relationship between the two variables explains 90% of the variation in the data." And when someone else says, "The statistically significant R-squared was 0.01," you can think to yourself, "Dag! Who cares if that relationship is significant? It only accounts for 1% of the variation in the data. Something else must explain the remaining 99%."

What about plain old R? How is it related to R-squared? R-squared is just the square of R. Now, when someone says, "The statistically significant R was 0.9," and we're talking about just plain old R, you can think to yourself, "0.9 * 0.9 = 0.81. Very good! The relationship between the two variables explains 81% of the variation in the data." And when someone else says, "The statistically significant R (that's plain old R) was 0.5," you can think to yourself, "0.5 * 0.5 = 0.25. The relationship accounts for 25% of the variation in the data. That's good if there are a million other things accounting for the remaining 75%, and bad if there's only one thing."

I like R-squared more than just plain old R because it's easier to interpret. Here's an example: how much better is R = 0.7 than R = 0.5? Well, if we convert those numbers to R-squared, we see that when R = 0.7, R-squared is 0.7 squared, which equals roughly 0.5, which means 50% of the original variation is explained by the relationship. When R = 0.5, R-squared is 0.5 squared, which equals 0.25, and we see that only 25% of the original variation is explained by the relationship. With R-squared, it's easy to see that the first correlation is twice as good as the second. Explaining 50% of the original variation is twice as good as only explaining 25% of the original variation.

That said, R-squared does not indicate the direction of the correlation, because squared numbers are never negative. If the direction of the correlation isn't obvious, you can say, "The two variables were positively (or negatively) correlated, with R-squared equals..." whatever that value may be.

These are the two main ideas for R-squared. R-squared is the percentage of variation explained by the relationship between two variables. And also, if someone gives you a value for plain old R, just square it in your head. You'll understand what's going on a whole lot better.

We've reached the end of our StatQuest. Tune in next time for an exciting adventure into the land of statistics!
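
The arithmetic in the two worked examples from the transcript can be checked with a few lines of Python. This is only a minimal sketch: the function name `r_squared` and its parameter names are made up for illustration, and the only inputs are the variation values quoted in the video (32 around the mean, then 6 and 30 around the fitted line).

```python
def r_squared(var_mean: float, var_line: float) -> float:
    """R-squared = (variation around mean - variation around line) / variation around mean."""
    if var_mean <= 0:
        raise ValueError("variation around the mean must be positive")
    return (var_mean - var_line) / var_mean


# Size vs. weight example: Var(mean) = 32, Var(line) = 6.
size_weight = r_squared(32, 6)    # 26 / 32

# Sniffing-a-rock vs. weight example: Var(mean) = 32, Var(line) = 30.
sniff_weight = r_squared(32, 30)  # 2 / 32

# Converting plain old R to R-squared: just square it.
r_values = [0.9, 0.7, 0.5]
squared = [round(r * r, 2) for r in r_values]

print(f"size/weight  R-squared = {size_weight:.4f}")
print(f"sniff/weight R-squared = {sniff_weight:.4f}")
for r, r2 in zip(r_values, squared):
    print(f"R = {r} -> R-squared = {r2}")
```

The unrounded results are 0.8125 and 0.0625, which the video reports as 0.81 and 0.06. Likewise, 0.7 squared is 0.49, which the video rounds to 0.5 when it calls that correlation twice as good as R = 0.5.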

Original Description

R-squared is one of the most useful metrics in statistics. It can give you a sense of how good your model is. For a complete index of all the StatQuest videos, check out: https://statquest.org/video-index/ If you'd like to support StatQuest, please consider... Patreon: https://www.patreon.com/statquest ...or... YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join ...buying one of my books, a study guide, a t-shirt or hoodie, or a song from the StatQuest store... https://statquest.org/statquest-store/ ...or just donating to StatQuest! https://www.paypal.me/statquest Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter: https://twitter.com/joshuastarmer #statquest #statistics #rsquared

R-squared is a useful metric in statistics that measures the percentage of variation explained by the relationship between two variables, and is easier to interpret than plain old R correlation values.

Key Takeaways

Calculate the mean of the data
Calculate the variation around the mean
Fit a regression line to the data
Calculate the variation around the regression line
Calculate R-squared using the formula
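
The five steps above can be sketched end to end in plain Python. The data below are invented for illustration (the video does not list its data points), and the line is fitted by ordinary least squares, which is an assumption about how the blue line in the video was obtained. All variable names are hypothetical.

```python
from math import sqrt

# Hypothetical mouse data, made up for this sketch (not from the video).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [2.0, 4.0, 5.0, 4.0, 5.0]


def mean(values):
    return sum(values) / len(values)


# Step 1: calculate the mean of the data (the weights).
mean_weight = mean(weights)

# Step 2: variation around the mean = sum of squared differences from the mean.
var_mean = sum((w - mean_weight) ** 2 for w in weights)

# Step 3: fit a line, weight = intercept + slope * size (ordinary least squares).
mean_size = mean(sizes)
sxy = sum((s - mean_size) * (w - mean_weight) for s, w in zip(sizes, weights))
sxx = sum((s - mean_size) ** 2 for s in sizes)
slope = sxy / sxx
intercept = mean_weight - slope * mean_size

# Step 4: variation around the line = sum of squared differences from the line.
var_line = sum((w - (intercept + slope * s)) ** 2 for s, w in zip(sizes, weights))

# Step 5: R-squared = (Var(mean) - Var(line)) / Var(mean).
r_squared = (var_mean - var_line) / var_mean

# Cross-check: for a single predictor, R-squared is the square of plain old R.
r = sxy / sqrt(sxx * var_mean)

print(f"Var(mean) = {var_mean:.2f}, Var(line) = {var_line:.2f}")
print(f"R-squared = {r_squared:.2f}, R = {r:.2f}, R * R = {r * r:.2f}")
```

For this made-up data set, the variation drops from 6 around the mean to about 2.4 around the line, so R-squared comes out at about 0.6. Unlike R-squared, the sign of `r` (or of `slope`) shows the direction of the relationship.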

💡 R-squared is a percentage that represents the proportion of variation in the data that is explained by the relationship between the variables, making it easier to interpret than plain old R correlation values.
