Maximum Likelihood - Explained

DataMListic · Beginner · 🔢 Mathematical Foundations · 3mo ago

Skills: ML Maths Basics 90% · Supervised Learning 60%

Key Takeaways

The video explains Maximum Likelihood Estimation (MLE) using the normal distribution, covering how to estimate model parameters like mean (μ) and standard deviation (σ) from data.

Full Transcript

You've collected some data, a handful of measurements scattered along a number line. You believe they came from a normal distribution. But which one? There are infinitely many normal distributions out there, each with a different center and spread. So, how do you find the right one?

Let's start with the normal distribution itself. It's the classic bell-shaped curve, and it's completely determined by just two numbers. The mean μ (mu) tells you where the center sits. Shift μ to the right and the whole curve slides right; shift it left and the curve follows. Then there's σ (sigma), the standard deviation, which controls the width. A large σ gives you a wide, flat curve; a small σ makes it tall and narrow. So picking a normal distribution really just means choosing μ and σ.

Now here's where it gets interesting. We've got our data points sitting on the axis, and we place a normal distribution over them. If the curve is centered in the wrong spot, our data points fall in regions where the curve is low: the distribution thinks those values are unlikely. But as we slide the curve toward the data, the points land under taller parts of the bell. The vertical lines from each point up to the curve show the density, and taller lines mean the distribution considers that data more plausible.

So let's turn this into something precise. Instead of asking for the probability of the data given a distribution, we flip the question around: how plausible are these parameters, given the data we actually observed? The likelihood is the product of all those individual densities (this assumes the data points are independent). For a specific μ, we read off the height of the curve at each data point, multiply those heights together, and get one number. If we do this for every possible μ, we trace out the likelihood function, and the peak of that curve is our maximum likelihood estimate.

There's a practical issue, though. We're multiplying many small numbers together, and the product gets tiny fast. So instead, we take the logarithm.
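To make the likelihood curve and the underflow problem concrete, here is a minimal Python sketch. The data values, the fixed σ = 1.0, the grid of candidate means, and the helper names (`normal_pdf`, `normal_logpdf`, `likelihood`, `log_likelihood`) are illustrative assumptions, not taken from the video. It assumes independent data points, so the likelihood is a product of densities; it scans μ with σ held fixed, then shows that the raw product underflows to 0.0 for 2,000 points while the log-likelihood stays finite.

```python
import math

# Hypothetical measurements, invented for illustration (not from the video).
DATA = [4.2, 5.1, 5.9, 4.8, 6.3]
SIGMA = 1.0  # assumption: sigma is held fixed so only mu is scanned


def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma^2) bell curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_logpdf(x, mu, sigma):
    """Log of normal_pdf, computed directly so it never becomes log(0)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def likelihood(data, mu, sigma):
    """Product of per-point densities (assumes independent data points)."""
    prod = 1.0
    for x in data:
        prod *= normal_pdf(x, mu, sigma)
    return prod


def log_likelihood(data, mu, sigma):
    """Sum of per-point log-densities, i.e. the log of likelihood()."""
    return sum(normal_logpdf(x, mu, sigma) for x in data)


# Trace both curves over candidate means 3.00, 3.01, ..., 8.00 and take the peak.
mus = [3.0 + 0.01 * i for i in range(501)]
best_mu = max(mus, key=lambda m: likelihood(DATA, m, SIGMA))
best_mu_log = max(mus, key=lambda m: log_likelihood(DATA, m, SIGMA))
print(f"peak of likelihood:     mu = {best_mu:.2f}")
print(f"peak of log-likelihood: mu = {best_mu_log:.2f}")

# With many points the raw product underflows; the sum of logs does not.
big = DATA * 400  # 2,000 points
print(likelihood(big, 5.26, SIGMA))      # 0.0 (underflow)
print(log_likelihood(big, 5.26, SIGMA))  # a finite negative number
```

Both scans land on the same grid point, the sample mean 5.26: with σ fixed, the maximum likelihood estimate of μ for a normal model is the sample mean.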
Since the logarithm turns products into sums, the math becomes much friendlier. And because the log function is monotonically increasing, it doesn't move the peak: the log-likelihood has its maximum at exactly the same place as the likelihood.

One last distinction worth making. Probability and likelihood use the same formula, but they ask different questions. With probability, you fix the distribution and ask about different data values; that's the area under the curve. With likelihood, you fix the data and ask which parameters make it most plausible; that's reading the curve's height at your observed points. Same formula, different perspective.

And that's the core idea behind maximum likelihood estimation. Thanks for watching. See you next time.
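The probability-versus-likelihood distinction can be checked numerically. In this minimal Python sketch (all numbers and helper names are hypothetical), the same normal density formula is used two ways: with μ and σ fixed, the area between 4 and 6 (computed from the CDF via `math.erf`) is a probability; with three observed points fixed, the product of curve heights is compared across candidate values of μ.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma^2) curve at x (the shared formula)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Probability: fix the distribution, vary the data values -> area under the curve.
mu, sigma = 5.0, 1.0
p_between = normal_cdf(6.0, mu, sigma) - normal_cdf(4.0, mu, sigma)
print(f"P(4 <= X <= 6 | mu=5, sigma=1) = {p_between:.4f}")  # 0.6827

# Likelihood: fix the observed data, vary the parameter -> product of curve heights.
data = [4.2, 5.1, 5.9]  # hypothetical observations
likelihoods = {
    m: math.prod(normal_pdf(x, m, sigma) for x in data)
    for m in (4.0, 5.0, 6.0)
}
for m, value in likelihoods.items():
    print(f"L(mu={m}) = {value:.5f}")
```

μ = 5.0 scores highest because it sits closest to the data. The likelihood values are not probabilities of μ: they need not sum or integrate to 1 over the candidate parameters.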

Original Description

Maximum Likelihood Estimation (MLE) is a fundamental concept in statistics and machine learning used to estimate model parameters from data. This video explains maximum likelihood estimation intuitively using the normal distribution, showing how parameters like the mean (μ) and standard deviation (σ) are chosen to best fit observed data. Learn how likelihood works, why the log-likelihood is used in practice, and the key difference between probability and likelihood in statistical modeling.

Related Videos
K-Means Clustering: https://youtu.be/dyG9cj5RKL0
Support Vector Machines: https://youtu.be/K1EcCjDD_q4
The Hessian Matrix: https://youtu.be/9tp1kULwU2w
The Jacobian Matrix: https://youtu.be/6FesMicc844
Bayesian Optimization: https://youtu.be/Kq6_kzlwSUQ
Hyperparameters Tuning: Grid Search vs Random Search: https://youtu.be/G-fXV-o9QV8
The Kernel Trick: https://youtu.be/N_RQj4OL1mg
Cross-Entropy - Explained: https://youtu.be/Fv98vtitmiA
Dropout - Explained: https://youtu.be/FDF_Q3_98GQ
Overfitting vs Underfitting: https://youtu.be/B9rhzg6_LLw
Why Models Overfit and Underfit - The Bias Variance Trade-off: https://youtu.be/5mbX6ITznHk
Least Squares vs Maximum Likelihood: https://youtu.be/WCP98USBZ0w

Follow Me
🐦 X: @datamlistic https://x.com/datamlistic
📸 Instagram: @datamlistic https://www.instagram.com/datamlistic
📱 TikTok: @datamlistic https://www.tiktok.com/@datamlistic
👔 Linkedin: https://www.linkedin.com/company/datamlistic

Channel Support
The best way to support the channel is to share the content. ;) If you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
► Patreon: https://www.patreon.com/datamlistic
► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyun

This video teaches Maximum Likelihood Estimation (MLE) using the normal distribution, explaining how to estimate model parameters from data and introducing the likelihood function and the log-likelihood. MLE is a fundamental concept in statistics and machine learning, and understanding it is important for parameter estimation and model fitting. By watching this video, viewers will learn how the likelihood is built from the densities at each data point, why the log-likelihood is used in practice, and how likelihood differs from probability.

Key Takeaways

Collect data points
Assume a normal distribution
Define the mean (μ) and standard deviation (σ) parameters
Place a normal distribution over the data points
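Calculate the likelihood function (the product of the densities at each data point)
Take the logarithm of the likelihood function (this turns the product into a sum without moving the peak)
Find the maximum likelihood estimate (the parameter values at the peak)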
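The steps above can be sketched end to end in Python, fitting both μ and σ. This is a minimal illustration under assumptions: the data are hypothetical, the "place a distribution and keep the best one" step is done by brute-force grid search (a simple stand-in for a real optimizer, not the video's method), and the grid ranges and 0.01 spacing are arbitrary choices. The result is compared with the known closed-form normal MLE: the sample mean, and the standard deviation that divides by n, which `statistics.pstdev` computes.

```python
import math
import statistics

# Collect data (hypothetical values, for illustration only).
data = [4.2, 5.1, 5.9, 4.8, 6.3]


# Assume a normal model with parameters mu and sigma; this is its
# log-likelihood, i.e. the log of the product of densities at each point.
def log_likelihood(data, mu, sigma):
    n = len(data)
    squared_dev = sum((x - mu) ** 2 for x in data)
    return (-n * math.log(sigma)
            - 0.5 * n * math.log(2 * math.pi)
            - squared_dev / (2 * sigma ** 2))


# Place the curve at many (mu, sigma) settings and keep the highest one.
grid_mu = [4.0 + 0.01 * i for i in range(301)]       # 4.00 .. 7.00
grid_sigma = [0.10 + 0.01 * j for j in range(191)]   # 0.10 .. 2.00
mu_grid, sigma_grid = max(
    ((m, s) for m in grid_mu for s in grid_sigma),
    key=lambda p: log_likelihood(data, p[0], p[1]),
)

# Closed-form normal MLE: sample mean, and standard deviation dividing by n.
mu_hat = statistics.fmean(data)
sigma_hat = statistics.pstdev(data)

print(f"grid search: mu = {mu_grid:.2f}, sigma = {sigma_grid:.2f}")
print(f"closed form: mu = {mu_hat:.2f}, sigma = {sigma_hat:.3f}")
```

The grid search should agree with the closed form to within the 0.01 grid spacing. Because the MLE of σ divides by n, it is slightly smaller than the usual sample standard deviation (`statistics.stdev`, which divides by n - 1).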

💡 The likelihood function and log likelihood are used to estimate model parameters, and the maximum likelihood estimate is the peak of the likelihood function curve.
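For the normal distribution, that peak can also be found analytically. Assuming n independent observations x_1, ..., x_n, the log-likelihood and the values that set its partial derivatives to zero are:

```latex
\begin{aligned}
\ell(\mu, \sigma) &= \log \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) \\
&= -n\log\sigma - \frac{n}{2}\log(2\pi) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \\[4pt]
\frac{\partial \ell}{\partial \mu} &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0
  \;\Longrightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i \\[4pt]
\frac{\partial \ell}{\partial \sigma} &= -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i-\mu)^2 = 0
  \;\Longrightarrow\; \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2
\end{aligned}
```

So the maximum likelihood estimate of μ is the sample mean, and that of σ² is the average squared deviation from it.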

More on: ML Maths Basics

Coding the GARCH Model : Time Series Talk

Important Steps I Have Followed To Improve My Data Science Skills- Sharing My Experience

Learn Python FAST for Beginners 🚀#coding #conditionals #loops #functions

ChethanAIChronicles

“Hello, world” from scratch on a 6502 — Part 1

PCA (Principal Component Analysis) in Python - Machine Learning From Scratch 11 - Python Tutorial

ROC and AUC in R

StatQuest with Josh Starmer

Related AI Lessons

Super Mario is mathier than you think

Super Mario's world is full of mathematical concepts, making it a great example of how math is used in real-world problem-solving

MIT Technology Review

A Geometry Puzzle With 3 Circles

Solve a geometry puzzle involving 3 circles using mathematical reasoning and visualization techniques

Medium · Data Science

The Consecutive Integers Divisibility Trick

Learn the Consecutive Integers Divisibility Trick to simplify difficult proofs in mathematics and programming

Medium · Programming

The Mayans Invented Zero Before Most of the World — Here Is Their Number System in Python

Learn about the Mayan number system and its implementation in Python, highlighting the importance of zero in their base-20 system

Medium · Python

How to Open OSM Files (OpenStreetMap Data)

File Extension Geeks