Fisher's Exact Test and the Hypergeometric Distribution

StatQuest with Josh Starmer · Beginner ·🔢 Mathematical Foundations ·9y ago

Key Takeaways

Fisher's exact test, which is built on the Hypergeometric distribution, is used to determine whether a sample is enriched for some category. The video demonstrates this with M&M colors, calculating a p-value for a handful of seven blue and one red to assess its significance.

Full Transcript

Hello, and welcome to a StatQuickie. Today we're going to talk about Fisher's exact test and enrichment analysis. But first, let's eat some M&Ms.

I want to share some with my friends, so I just take one handful and get seven blue and one red. What does this say about the distribution of colors in the bag? Do I have more blues than normal? Lastly, can I calculate a p-value from this delicious sample?

This bag's supposed to have two servings, and I think a serving of M&Ms is 20 M&Ms, so there must be 40 M&Ms in the bag. I looked up the proportions of the different colors of M&Ms on the internet, and this is what I found. So on the right we have a histogram of an idealized bag of M&Ms. I'm going to use the histogram of the ideal bag of M&Ms, based on the proportions I got off the internet, and my sample, my handful of M&Ms, to determine if my bag is special.

In this example, I don't care about the order of how the M&Ms fell into my hand, so let's consider every possible ordering of seven blue and one red as legit.

Let's start by calculating the probability of getting seven blue M&Ms followed by a single red M&M. The probability that the first M&M is blue equals 8 divided by 40: 8 because there are 8 blue M&Ms, divided by 40 because there are 40 M&Ms total. Now that I've got one M&M in my hand, there are only 7 blue M&Ms left in the bag. The probability that the second M&M is blue equals 7 divided by 39: 7 because there are now only 7 blue M&Ms in the bag, divided by 39 because there are only 39 M&Ms. Now there are only 6 blue M&Ms left in the bag. The probability of getting a third blue M&M is 6 over 38, leaving 5 blue M&Ms left in the bag. And by now you've probably grasped the pattern for how we determine the probabilities for getting a sequence of blue M&Ms.

Once we have calculated the probabilities for getting 7 blue M&Ms in our hand, we can now calculate the probability of getting 1 red M&M. That's just 5 over 33: 5 because there are 5 red M&Ms, divided by 33 because there are 33 M&Ms left in the bag at this point. Now just multiply all those probabilities together to get the probability of getting 7 blues followed by 1 red, and that just equals a really small number. That's rare.

But remember, we don't care about order. There's more work to do to get the probability of seven blues and one red in any order. To calculate the probability of getting seven blues and one red, we have to add together the probabilities of each possible ordering. The good news is that the process of calculating the probabilities is the same as what we just did. Good thing we have computers, because they'll do the work for us. Anyways, the probability is still really small.

Now, what's the p-value? If you'll remember from the p-value StatQuest, sometimes you can have very small probabilities but really large p-values. And remember, a p-value is the sum of the probabilities of all things equally rare or rarer. This is all covered in the StatQuest on p-values. So that includes adding the probability of getting eight blues in a row, or seven oranges and one blue, because that's equally rare. And actually, there are a lot of different ways you can come up with things that are equally rare or rarer, too many to put on this StatQuickie, so we're just going to cut to the chase. Again, good thing we have computers. The p-value ends up being 0.01, so my bag is special. Hooray!

We just performed Fisher's exact test on the M&Ms. Enrichment for other things, like "does this list of genes have more involved in metabolism than normal?", is done the exact same way. And for any of you StatQuesters that use R, a programming language for doing statistics, I provided the R code for doing the Fisher's exact test that we performed on the M&Ms in the description below. Hooray! Tune in next week for another StatQuickie.
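The arithmetic walked through in the transcript can be sketched in a few lines of Python. This is only an illustration, not the R code mentioned in the video: the counts (40 M&Ms, 8 blue, 5 red) are taken from the transcript, while the function and variable names are made up for this example. It multiplies the step-by-step probabilities for "seven blues, then one red", scales by the number of orderings (every ordering has the same probability, because the same numerators and denominators appear in a different order), and cross-checks the result against the counting formula.

```python
from fractions import Fraction
from math import comb

# Counts taken from the transcript's idealized bag.
TOTAL = 40  # two servings of 20 M&Ms
BLUE = 8    # blue M&Ms in the bag
RED = 5     # red M&Ms in the bag


def sequence_probability(colors, counts, total):
    """Probability of drawing `colors` in exactly this order, without replacement.

    Hypothetical helper for illustration only.
    """
    remaining = dict(counts)  # copy so the caller's counts are untouched
    left = total
    p = Fraction(1)
    for color in colors:
        p *= Fraction(remaining[color], left)  # e.g. 8/40, then 7/39, ...
        remaining[color] -= 1
        left -= 1
    return p


counts = {"blue": BLUE, "red": RED}

# Seven blues followed by one red: 8/40 * 7/39 * ... * 2/34 * 5/33.
p_ordered = sequence_probability(["blue"] * 7 + ["red"], counts, TOTAL)

# The red M&M can sit in any of the 8 positions, and each ordering
# has the same probability, so the sum is just 8 * p_ordered.
n_orderings = comb(8, 1)
p_any_order = n_orderings * p_ordered

# Cross-check: ways to pick 7 of 8 blues and 1 of 5 reds,
# divided by the ways to pick any 8 of the 40 M&Ms.
p_counting = Fraction(comb(BLUE, 7) * comb(RED, 1), comb(TOTAL, 8))
assert p_any_order == p_counting

print(float(p_ordered), float(p_any_order))
```

Under these assumptions the any-order probability comes out to about 5.2e-07, which matches the transcript's "still really small". Exact fractions are used so the cross-check is an equality rather than a floating-point comparison.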

Original Description

Fisher's exact test to determine if something is enriched or not. In this case, I wonder if I got an overabundance of blue M&Ms. For a complete index of all the StatQuest videos, check out: https://statquest.org/video-index/ If you'd like to support StatQuest, please consider... Patreon: https://www.patreon.com/statquest ...or... YouTube Membership: https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw/join ...buying one of my books, a study guide, a t-shirt or hoodie, or a song from the StatQuest store... https://statquest.org/statquest-store/ ...or just donating to StatQuest! https://www.paypal.me/statquest Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter: https://twitter.com/joshuastarmer Correction: 4:15 I say getting 7 oranges and 1 blue is just as rare as getting 7 blues and 1 red. This is incorrect, since there are more blues and oranges in general than there are blues and reds. However, the idea that we add up rarer events is correct. #statquest #statistics

This video teaches viewers how to use Fisher's exact test and the Hypergeometric distribution to determine if a sample is enriched, using an example with M&M's colors. The test calculates a p-value to assess the significance of the sample.

Key Takeaways
  1. Calculate the probability of getting a specific sequence of colors
  2. Calculate the probability of getting a specific combination of colors
  3. Use the Hypergeometric distribution to calculate the p-value
  4. Interpret the p-value to determine if the sample is enriched
💡 Fisher's exact test can be used to determine if a sample is enriched, and the Hypergeometric distribution is used to calculate the p-value.
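As a rough illustration of takeaways 3 and 4, here is a minimal Python sketch of a one-sided Hypergeometric tail. It makes a simplifying assumption that is not stated in the video: every M&M is treated as either blue or not blue, and "as extreme or more extreme" means "at least as many blues as observed". The function names are hypothetical, and the sketch does not try to reproduce the 0.01 quoted in the transcript, which comes from R code that is not shown on this page.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing `draws` items without replacement."""
    return Fraction(
        comb(successes, k) * comb(population - successes, draws - k),
        comb(population, draws),
    )


def upper_tail(k_observed, population, successes, draws):
    """P(at least k_observed successes): a one-sided enrichment p-value."""
    k_max = min(successes, draws)
    return sum(
        hypergeom_pmf(k, population, successes, draws)
        for k in range(k_observed, k_max + 1)
    )


# Counts from the transcript: 40 M&Ms in the bag, 8 of them blue,
# a handful of 8, with 7 blues observed.
p_enriched = upper_tail(k_observed=7, population=40, successes=8, draws=8)

print(float(p_enriched))
```

A small value here says that a handful with seven or more blues would be unusual if the bag really matched the idealized proportions. A two-sided or multi-color definition of "equally rare or rarer", as described in the video, would sum a different set of outcomes and give a different number.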
