Maximum Likelihood - Explained
Key Takeaways
The video explains Maximum Likelihood Estimation (MLE) using the normal distribution, covering how to estimate model parameters like mean (μ) and standard deviation (σ) from data.
Full Transcript
You've collected some data, a handful of measurements scattered along a number line. You believe they came from a normal distribution. But which one? There are infinitely many normal distributions out there, each with a different center and spread. So, how do you find the right one?
Let's start with the normal distribution itself. It's this classic bell-shaped curve, and it's completely determined by just two numbers. The mean mu tells you where the center sits. Shift mu to the right and the whole curve slides right. Shift it left and it follows. Then there's sigma, the standard deviation. It controls the width. A large sigma gives you a wide, flat curve. A small sigma makes it tall and narrow. So really, picking a normal distribution just means choosing mu and sigma.
Now here's where it gets interesting. We've got our data points sitting on the axis and we place a normal distribution over them. If the curve is centered in the wrong spot, our data points fall in regions where the curve is low. The distribution thinks those values are unlikely. But as we slide the curve toward the data, the points land under taller parts of the bell. Those vertical lines from each point up to the curve show the density, and taller lines mean the distribution considers that data more plausible.
So let's turn this into something precise. Instead of asking what's the probability of the data given a distribution, we flip the question around: how likely are these parameters, given the data we actually observed? The likelihood is the product of all those individual densities. For a specific mu, we read off the height at each data point, multiply them together, and get one number. Now, if we do this for every possible mu, we trace out the likelihood function, and the peak of that curve, that's our maximum likelihood estimate.
There's a practical issue, though. We're multiplying several small numbers together and the product gets tiny fast. So instead, we take the logarithm.
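The steps just described (read off the height at each data point, multiply, repeat for every candidate mu, pick the peak) can be sketched in a few lines of Python. This is only an illustration: the measurements, the fixed sigma of 1.0, the grid of candidate mu values, and the helper names `normal_pdf` and `likelihood` are all assumptions made up for the example, not something shown in the video.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal density with mean mu and std sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood(data, mu, sigma):
    """Product of the density heights at every observed point."""
    product = 1.0
    for x in data:
        product *= normal_pdf(x, mu, sigma)
    return product

# Hypothetical measurements, invented for illustration.
data = [4.2, 5.1, 5.8, 4.9, 5.5]
sigma = 1.0  # assumption: sigma is held fixed so only mu varies

# Candidate centers from 0.00 to 10.00 in steps of 0.01.
mu_grid = [i / 100 for i in range(1001)]

# The maximum likelihood estimate is the grid value with the largest product.
best_mu = max(mu_grid, key=lambda mu: likelihood(data, mu, sigma))
print("peak of the likelihood:", best_mu)
print("likelihood at the peak:", likelihood(data, best_mu, sigma))

# The practical issue: with many points the raw product underflows.
many_points = data * 200  # 1000 values
print("raw product for 1000 points:", likelihood(many_points, best_mu, sigma))  # 0.0
```

With only five points the product is small but still representable. With a thousand points it drops below the smallest positive floating-point number and is reported as exactly 0.0, which makes it useless for comparing candidate values of mu.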
Since log turns products into sums, the math becomes much friendlier. And because the log function is monotonically increasing, it doesn't move the peak. The log likelihood has its maximum at the exact same place as the likelihood.
One last distinction worth making. Probability and likelihood use the same formula, but they ask different questions. With probability, you fix the distribution and ask about different data values. That's the area under the curve. With likelihood, you fix the data and ask which parameters make it most plausible. That's reading the curve's height at your observed points. Same formula, different perspective.
And that's basically it: the core idea behind maximum likelihood estimation. Thanks for watching. See you next time.
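As a rough sketch of the last two ideas, the Python below sums log densities instead of multiplying densities, then contrasts a probability (an area) with a likelihood (a height). The data, the fixed sigma, the grid, and the helper names are hypothetical choices for illustration. The comparison with the sample mean relies on a standard result for the normal distribution that the video does not derive.

```python
import math

def log_normal_pdf(x, mu, sigma):
    """Logarithm of the normal density at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def log_likelihood(data, mu, sigma):
    """Sum of log densities: the log of the product of densities."""
    return sum(log_normal_pdf(x, mu, sigma) for x in data)

def normal_cdf(x, mu, sigma):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical data: 1000 values, enough to underflow a raw product.
data = [4.2, 5.1, 5.8, 4.9, 5.5] * 200
sigma = 1.0  # assumption: sigma held fixed so only mu varies
mu_grid = [i / 100 for i in range(1001)]

# The sum of logs stays a moderate negative number, so the peak is still found.
best_mu = max(mu_grid, key=lambda mu: log_likelihood(data, mu, sigma))
sample_mean = sum(data) / len(data)
print("peak of the log likelihood:", best_mu)
print("sample mean:", sample_mean)

# Probability: fix the distribution (mu=5, sigma=1), ask about data values.
# This is an area under the curve, here between 4 and 6.
prob = normal_cdf(6.0, 5.0, 1.0) - normal_cdf(4.0, 5.0, 1.0)
print("P(4 < X < 6):", prob)

# Likelihood: fix an observed value (5.1), compare parameter choices.
# This is the height of the curve at that point.
height_near = math.exp(log_normal_pdf(5.1, 5.0, 1.0))
height_far = math.exp(log_normal_pdf(5.1, 8.0, 1.0))
print("height with mu=5:", height_near, "height with mu=8:", height_far)
```

The grid peak lands on the sample mean, which is what the standard closed-form result for a normal distribution with fixed sigma predicts. The area and the two heights come from the same density formula and differ only in what is held fixed.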
Original Description
Maximum Likelihood Estimation (MLE) is a fundamental concept in statistics and machine learning used to estimate model parameters from data. This video explains maximum likelihood estimation intuitively using the normal distribution, showing how parameters like the mean (μ) and standard deviation (σ) are chosen to best fit observed data. Learn how likelihood works, why the log-likelihood is used in practice, and the key difference between probability and likelihood in statistical modeling.
*Related Videos*
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
K-Means Clustering: https://youtu.be/dyG9cj5RKL0
Support Vector Machines: https://youtu.be/K1EcCjDD_q4
The Hessian Matrix: https://youtu.be/9tp1kULwU2w
The Jacobian Matrix: https://youtu.be/6FesMicc844
Bayesian Optimization: https://youtu.be/Kq6_kzlwSUQ
Hyperparameters Tuning: Grid Search vs Random Search: https://youtu.be/G-fXV-o9QV8
The Kernel Trick: https://youtu.be/N_RQj4OL1mg
Cross-Entropy - Explained: https://youtu.be/Fv98vtitmiA
Dropout - Explained: https://youtu.be/FDF_Q3_98GQ
Overfitting vs Underfitting: https://youtu.be/B9rhzg6_LLw
Why Models Overfit and Underfit - The Bias Variance Trade-off: https://youtu.be/5mbX6ITznHk
Least Squares vs Maximum Likelihood: https://youtu.be/WCP98USBZ0w
*Follow Me*
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
🐦 X: @datamlistic https://x.com/datamlistic
📸 Instagram: @datamlistic https://www.instagram.com/datamlistic
📱 TikTok: @datamlistic https://www.tiktok.com/@datamlistic
👔 Linkedin: https://www.linkedin.com/company/datamlistic
*Channel Support*
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
The best way to support the channel is to share the content. ;)
If you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
► Patreon: https://www.patreon.com/datamlistic
► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyun