Gibbs Sampling : Data Science Concepts
Key Takeaways
Introduces Gibbs sampling for multivariate distributions using a two-dimensional normal distribution example
Full Transcript
Hey everyone, welcome back. Today we'll be talking about another MCMC method called Gibbs sampling. I think this video will be pretty short; I just have a couple of things to say on Gibbs sampling.

First off, why would you use Gibbs sampling? It only really makes sense when you're sampling from a multivariate distribution. In most of our past videos, just to keep the examples simple, we've been sampling from one-dimensional distributions where there's only one variable. Gibbs sampling is useful when the distribution you're trying to sample from has two or more dimensions. We're going to work with the easiest such case today, a two-dimensional distribution. To keep things concrete, our goal is to sample from the two-dimensional normal (Gaussian) distribution with mean (0, 0) and this pretty simple covariance matrix. I'll say off the bat that there are known ways to sample from this distribution that are not Gibbs sampling, but we're going to keep things simple and assume we're using Gibbs sampling today, just to show you how it actually works in practice.

If we were able to sample from this distribution, we would get some kind of plot like this. There's a high density around the mean, which is (0, 0) for x and y, and the distribution is tilted like this because of these one-halves in the covariance matrix. We can also show that the correlation between x and y is one half, so you get a distribution that looks like that.

So the first condition for using Gibbs sampling is that you want to sample from a multivariate distribution. What is the second condition for knowing you should use Gibbs sampling? This is the most important one. Consider the joint distribution p(x, y), which here is the joint PDF of the two-dimensional normal
distribution: we're going to say that sampling from it is difficult. You may have the equation for it or you might not, but either way, sampling from that joint distribution, getting a pair of x and y simultaneously, is difficult. What is easy is sampling from the conditional distributions. By conditional distributions I mean the distribution of x given a fixed value of y, and the distribution of y given a fixed value of x. As you ramp up the number of dimensions to three, four, ten, we're assuming that all of these conditional densities (the density of the first variable given the others, the density of the second variable given the others, and so on) are relatively easy to sample from. Each one is a single-variable distribution: that one variable, holding all the other variables fixed. So that is the first thing to get in your mind: we use Gibbs sampling for multivariate distributions exactly when sampling from the joint distribution is tricky or impossible, but we can easily sample from all the conditionals.

That raises the question: what are the conditional distributions, x given y and y given x, for this particular example? I won't derive it for you here, but we can show that if you're sampling x given some fixed value of y, then x follows a normal distribution whose mean is rho, the correlation between x and y, times that fixed value of y, and whose variance is 1 minus rho squared. For us, since rho is equal to one half, as we said before, this simplifies to a normal distribution with mean y/2 and variance 3/4. In simpler terms: if you have a fixed value of y and you want to sample x, you can sample from the single-variable normal distribution with mean y/2 and variance 3/4. And since this whole problem is symmetric, the conditional distribution of y given x looks exactly the same, just swapping the roles of x and y.

And so Gibbs
sampling proceeds as follows; it's an extremely simple algorithm. We start by initializing some (x0, y0). That can be anywhere on the x-y plane, preferably somewhere fairly close to the center of the distribution, but it could really be anywhere; it's just a matter of how fast it's going to converge. The next thing we do is change x, keeping the y variable fixed for now. So (x0, y0) was our first sample, and for our next sample we keep y fixed and sample the new value x1 from the conditional distribution of x given the existing value of y, which is y0. Then we sample a new value for the y variable, y1, given a fixed value of x, namely the x1 that we just sampled in step two. So basically, we get a new x by sampling given the existing value of y, then we get a new y by sampling given that new value of x, and then we just rinse and repeat for as many samples as you would like.

It's really nice because we can see this visually, at least for the 2D case, in this chart here. Let's say this is your first sample, (x0, y0). We said that we're going to sample a new value for x but keep the current value of y fixed; that's equivalent to just moving somewhere in the x direction. So this is our next sample. To get the next sample after that, we swap: we keep the value of x fixed and sample a new value for y. Then we swap again, sampling a new value for x while keeping y fixed, and we continue on like that as many times as we want.

I won't prove it here, but if you want to prove that Gibbs sampling works, it's actually even easier than proving that Metropolis-Hastings works; you can just use the detailed balance condition again. But what you'll find is that if you take enough of these samples,
it's going to be as if you're sampling from this multivariate distribution: you're going to get a lot of samples around the center and fewer samples around the tails of the distribution.

So that's Gibbs sampling in a nutshell, and you can extend it to as many variables as your distribution has. It's just that you don't have two steps; instead, you sample the first variable given fixed values for the others, then you sample the next variable given fixed values for the others, and you keep going. Gibbs sampling is pretty simple, and there are a lot of variants of it. Sometimes people do the sampling in order, sometimes people do it in random order, and sometimes people even sample blocks of variables given blocks of other variables. So there are a lot of directions you can go with this, but the general guiding principle of Gibbs sampling is that the conditional distributions are easy to sample from for the problem at hand, but the joint distribution is not.

The last thing I'll cover in this video is some pitfalls, places where Gibbs sampling doesn't work out the way you expect. The first one is this very contrived case where you have just 0 and 1 in the x direction and 0 and 1 in the y direction. There's a one-half probability at (0, 0), a one-half probability at (1, 1), and zero probability at the other two points. You can probably already see the issue. Let's say I start off at (0, 0). Because of the way Gibbs sampling works, I can only go in the x direction or in the y direction, because of this principle of alternating between x and y. If I go in the x direction, I can't move to (1, 0) because there's no probability there, so I have to stay where I am. If I go in the y direction, it's the same exact issue: I can't move to (0, 1), so I stay where I am. So I can never actually sample (1, 1), because I can't get there in one step. Okay, so that's one of the
shortcomings of Gibbs sampling. Another one is a phenomenon called probability spikes. That is totally a term I just made up, so please don't write it in any official report. What I mean is that you have a distribution with a spike in probability. For example, consider this 2D distribution. This little green dot is where there's a very high probability density, and everywhere else I've marked L's, which means there's a very low density there.

Let's think about the issues we get using Gibbs sampling here. Say we're currently in a low region. Again, we can only sample in the x direction or the y direction, which means we're probably going to land in a low region again. That's the first part of the problem: if we're in a low region, because we can only move in the x or y direction at any one time, we're going to stay in these low-probability regions for a long time. Conversely, if we are in the high-density bubble, think about moving in the x direction: you're probably going to stay in the high-density bubble, because in the x direction there are no other high-density areas. The same goes for the y direction. So although Gibbs sampling will work theoretically, it's going to take infeasibly long to converge to the actual distribution, because you're going to stay in lows and you're going to stay in highs. So this is one of the shortcomings too.

Anyway, that was Gibbs sampling in a nutshell. If you have any questions, please leave them in the comments below, please subscribe for more videos just like this, and I will see you next time.
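The first pitfall from the transcript (probability one half at (0, 0) and at (1, 1), zero elsewhere) can be reproduced in a few lines. This is only a minimal sketch: the array `p` and the helper `gibbs_discrete` are hypothetical names introduced for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Joint pmf indexed as p[x, y] on {0, 1} x {0, 1} (illustrative setup):
# mass 1/2 at (0, 0) and at (1, 1), zero at (0, 1) and (1, 0).
p = np.array([[0.5, 0.0],
              [0.0, 0.5]])

def gibbs_discrete(p, n_steps, x0=0, y0=0, seed=0):
    """Run a two-variable Gibbs sampler on a discrete joint pmf p[x, y]."""
    rng = np.random.default_rng(seed)
    x, y = x0, y0
    visited = []
    for _ in range(n_steps):
        # Conditional pmf of x given the current y, then draw a new x.
        px_given_y = p[:, y] / p[:, y].sum()
        x = rng.choice(2, p=px_given_y)
        # Conditional pmf of y given the new x, then draw a new y.
        py_given_x = p[x, :] / p[x, :].sum()
        y = rng.choice(2, p=py_given_x)
        visited.append((int(x), int(y)))
    return visited

visited = gibbs_discrete(p, 1_000)
print(set(visited))  # {(0, 0)}
```

Starting from (0, 0), every conditional puts all of its mass on the current value, so the chain never reaches (1, 1) even though that point carries half of the probability.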
Original Description
Another MCMC Method. Gibbs sampling is great for multivariate distributions where conditional densities are *easy* to sample from.
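For the example in the video, the conditionals can be written out explicitly. The covariance matrix is not reproduced in the transcript; the form below (unit variances, off-diagonal entries equal to the correlation) is an assumption that is consistent with the one-halves, the correlation of one half, and the conditional variance of 3/4 quoted in the video. The conditional formulas themselves are the standard result for a bivariate normal.

```latex
\begin{pmatrix} x \\ y \end{pmatrix}
\sim \mathcal{N}\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
\right)
\quad\Longrightarrow\quad
x \mid y \sim \mathcal{N}\!\left(\rho y,\; 1 - \rho^{2}\right),
\qquad
y \mid x \sim \mathcal{N}\!\left(\rho x,\; 1 - \rho^{2}\right)

\rho = \tfrac{1}{2}:
\qquad
x \mid y \sim \mathcal{N}\!\left(\tfrac{y}{2},\; \tfrac{3}{4}\right),
\qquad
y \mid x \sim \mathcal{N}\!\left(\tfrac{x}{2},\; \tfrac{3}{4}\right)
```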
To emphasize a point in the video:
- First sample is (x0,y0)
- Next Sample is (x1,y1)
- Next Sample is (x2,y2)
...
That is, we update *all* variables once to get a new sample.
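The point above can be sketched in code. The example below is a minimal illustration, not the author's implementation: it assumes the video's target is a two-dimensional normal with mean (0, 0), unit variances, and correlation 1/2, so that each conditional is normal with mean rho times the other variable and variance 1 - rho^2. The function name `gibbs_bivariate_normal`, the seed, and the burn-in length are hypothetical choices made for illustration.

```python
import numpy as np

def gibbs_bivariate_normal(n_samples, rho=0.5, x0=0.0, y0=0.0, seed=0):
    """Gibbs sampler for a 2D normal with zero mean, unit variances, correlation rho."""
    rng = np.random.default_rng(seed)
    cond_std = np.sqrt(1.0 - rho**2)  # sqrt(3/4) when rho = 1/2
    samples = np.empty((n_samples, 2))
    x, y = x0, y0
    for i in range(n_samples):
        x = rng.normal(rho * y, cond_std)  # draw x given the current y
        y = rng.normal(rho * x, cond_std)  # draw y given the new x
        samples[i] = (x, y)  # record a sample only after both updates
    return samples

samples = gibbs_bivariate_normal(20_000)
burned = samples[1_000:]  # discard early draws; this burn-in length is arbitrary
print(burned.mean(axis=0))            # should be close to (0, 0)
print(np.corrcoef(burned.T)[0, 1])    # should be close to 0.5
```

Note that `samples[i]` is written once per loop iteration, after x and y have both been redrawn, which matches the sequence (x0, y0), (x1, y1), (x2, y2) listed above.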
Intro MCMC Video : https://www.youtube.com/watch?v=yApmR-c_hKU