Difference in Difference : Data Science Concepts

ritvikmath · Intermediate ·📐 ML Fundamentals ·4y ago

Key Takeaways

The video discusses the Difference in Difference statistical method, a technique for estimating the effect of an intervention or treatment on an outcome variable. Using an example of two pizzerias with different customer traffic, it explains how the method lets you run an experiment after the fact, without a controlled experiment, by assuming parallel trends between the treatment and control groups. The method is demonstrated using a chart showing the number of pizzas sold by

Full Transcript

[Music] Hey everyone, welcome back. Today we're going to be talking about a very cool statistical method called difference in difference. Let's dive right into the example.

Let's say there is a town called Pai Town, and Pai Town has two pizzerias: pizzeria A and pizzeria B. (A pizzeria is a place that sells pizza.) Let's say that these two pizzerias are very similar to each other. The main difference is the number of customers that live around them; those customers are represented by the little dots. B has a lot more customers around it, so it gets a lot more traffic, and A has fewer customers around it. Besides that, they're very similar types of pizzerias.

Now let's say that one day B drops the price of its pizza, and as the chief data scientist for the town of Pai Town you observe the following statistics. Following the drop in price of B's pizza, you find that A is selling 200 pizzas per day and B is selling 1,000 pizzas per day. This is all happening a little bit of time after the drop in price occurs for pizzeria B. So the naive conclusion is that this drop led to an increase of 800 pizzas per day for pizzeria B, right? Because now B is selling 1,000 and A is only selling 200, so that must mean that B is now selling 800 more pizzas than it was before.

Probably most of you are very skeptical. Is this actually true? Obviously there's some faulty logic going on, but let's try to explain it and see how we can fix it to get what this number should actually be. The flaw in the logic is that I mistakenly assumed that A and B were selling the exact same number of pizzas before the price drop, and therefore that the difference between how many pizzas per day they're selling after the price drop must be fully attributed to the price drop. Let's see where that breaks down.

This chart is going to be the main thing we're looking at in this video; the entire story is built around it. The chart shows two time periods: before the price drop and after the price drop. In general, when you're using difference in difference methods, you're going to need some kind of time-based data. So we have these two time periods, before B drops its price and after B drops its price.

Let's look at what happens before. Before B drops its price, A is selling 100 pizzas per day and B is actually selling five times that amount: 500 pizzas per day. The reason is that B has a lot more traffic around it, as I explained before. So B was already selling a lot more pizzas per day than A, even before the price drop.

Now the price drop occurs, and we observe how many pizzas they're selling afterwards. Let's say A is now selling 200 pizzas per day after the price drop. So the first main observation is that A is actually increasing the number of pizzas it's selling, irrespective of B. After the price drop, as we saw before, B is selling 1,000 pizzas per day. We mistakenly just subtracted this 200 from this 1,000 and said that the price drop led to an increase of 800 pizzas being sold per day for B.

Here's where difference in difference comes in. We assume parallel trends; we'll talk about that more at the end of this video. We assume that if B had not dropped its price, if it was just business as usual, then B's trajectory of the number of pizzas sold would follow this dotted line, which is parallel to the trajectory of A before and after the price drop. We see that there's an increase of 100 for A, so we would also assume an increase of 100 for B. So we would say that if there were no price drop, B would now be selling 600 pizzas. But it's actually selling 1,000 pizzas.

So let's quantify a couple of gaps in this diagram and lock in on the actual change in B's number of pizzas sold due to this price drop. The first one is this blue bracket here. This is the observed difference, that 800 pizzas per day. That's the number we mistakenly assumed was the correct difference attributed to the price drop. The second one is this green bracket here, which is the expected difference between A's and B's number of pizzas sold if there was no price drop at all, and that would be 400 pizzas per day.

This is where the term difference in difference comes from, because we subtract the green difference from the blue difference. Let's think about what that means intuitively. I'm taking the observed difference between these two and taking away how much difference I would have gotten naturally, without any intervention happening. Now I'm left with 400 pizzas per day, which is 800 minus 400, and that is the difference in the number of pizzas sold per day for B because of the price drop.

This is actually a pretty wild, cool method, because it lets us kind of run an experiment after the fact. Notice that in the best-case scenario we would have just run a controlled experiment: opened up two pizzerias right next to each other that are really similar, and changed the price of pizzas for one of them. But obviously that takes a lot of investment. You can't just open a pizzeria as a researcher, and you can't force existing pizzerias to change their prices. So a lot of times we have to use this after-the-fact observational data to come to conclusions, and that's what we're doing here.

Finally, let's talk about the assumptions of difference in difference. It has all the same assumptions as the ordinary least squares model, because we need all these things to be linear, with the added, very important assumption of parallel trends.

Let's think for a minute about why we need parallel trends. What if the diagram looked like this: A increased before and after like this, and B increased before and after according to the red line. If we assume parallel trends, that is, if we assume this blue line would have been what B did if there was no price change, then we get this difference here between the red and the blue dots. But what if B was just growing at a faster rate in general, even without the price drop, so that its actual trajectory without the change would have been this green line? Then we're going to get it wrong, because the actual change attributed to the price drop would have been the difference between the red and the green dots, but we don't see that.

So there are a lot of assumptions baked in, but it is a very important tool for the data scientist to be able to run an experiment after the fact and get some kind of indication about the change attributed to a possible intervention. Any questions are welcome in the comments below. Like and subscribe, and see you next time.
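
The arithmetic from the pizzeria example can be sketched in a few lines of Python. This is only an illustration: the four sales figures come from the transcript, while the dictionary layout and the function name `diff_in_diff` are made up for this sketch.

```python
# Daily pizza sales read off the chart described in the transcript.
sales = {
    ("A", "before"): 100,
    ("A", "after"): 200,
    ("B", "before"): 500,
    ("B", "after"): 1000,
}


def diff_in_diff(sales, treated="B", control="A"):
    """Observed gap after the intervention minus the gap that existed before it."""
    observed_gap = sales[(treated, "after")] - sales[(control, "after")]
    expected_gap = sales[(treated, "before")] - sales[(control, "before")]
    return observed_gap - expected_gap


# Naive estimate: compare the two pizzerias only after the price drop.
naive_estimate = sales[("B", "after")] - sales[("A", "after")]

# Counterfactual for B under parallel trends: B's old level plus A's change.
counterfactual_b = sales[("B", "before")] + (
    sales[("A", "after")] - sales[("A", "before")]
)

did_estimate = diff_in_diff(sales)

print("naive estimate:", naive_estimate)        # 800
print("B without price drop:", counterfactual_b)  # 600
print("difference in difference:", did_estimate)  # 400
```

The same 400 falls out either way: as the blue bracket minus the green bracket (800 - 400), or as B's actual sales minus its assumed counterfactual (1,000 - 600).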

Original Description

Running an experiment ... without running an experiment. My Patreon : https://www.patreon.com/user?u=49277905

The Difference in Difference method is a statistical technique used to estimate the effect of an intervention or treatment on an outcome variable, by comparing the difference in outcomes between a treatment group and a control group, before and after the intervention. This method assumes parallel trends between the two groups, and can be used to run an experiment after the fact, without the need for a controlled experiment.
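
Written as a formula, the comparison described above looks roughly like this. The notation is an assumption of this sketch and does not appear in the video: $\bar{Y}_{g,t}$ stands for the average outcome of group $g$ in period $t$, with pizzeria B as the treatment group and pizzeria A as the control group.

```latex
\[
\hat{\delta}_{\text{DiD}}
  = \left(\bar{Y}_{B,\text{after}} - \bar{Y}_{B,\text{before}}\right)
  - \left(\bar{Y}_{A,\text{after}} - \bar{Y}_{A,\text{before}}\right)
  = (1000 - 500) - (200 - 100)
  = 400
\]
```

Rearranging the same four terms gives the form used in the video, the gap after the intervention minus the gap before it: $(1000 - 200) - (500 - 100) = 800 - 400 = 400$.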

Key Takeaways

Identify the treatment and control groups
Collect data on the outcome variable before and after the intervention
Apply the Difference in Difference method to estimate the effect of the intervention
Evaluate the assumptions of the method, including linearity and parallel trends
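
Because the video ties the method's assumptions to ordinary least squares, the steps above can also be sketched as a small regression. This is a toy illustration using the four numbers from the pizzeria example, not the video's own code: the variable names are made up, and the fit uses NumPy's `np.linalg.lstsq`. The coefficient on the interaction of "treated" and "after the intervention" is the difference in difference estimate.

```python
import numpy as np

# One row per (pizzeria, period) cell. Columns:
# intercept, treated (1 for B), post (1 after the price drop), treated * post.
X = np.array(
    [
        [1, 0, 0, 0],  # A, before
        [1, 0, 1, 0],  # A, after
        [1, 1, 0, 0],  # B, before
        [1, 1, 1, 1],  # B, after
    ],
    dtype=float,
)
y = np.array([100, 200, 500, 1000], dtype=float)  # pizzas sold per day

# Least-squares fit; with four rows and four coefficients the fit is exact.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
baseline, group_gap, time_trend, did_effect = coef

print(f"control baseline:        {baseline:.0f}")
print(f"pre-existing group gap:  {group_gap:.0f}")
print(f"shared time trend:       {time_trend:.0f}")
print(f"difference in difference: {did_effect:.0f}")
```

With only four data points there are no residual degrees of freedom, so this sketch cannot produce standard errors. A real analysis would use many observations per group and period.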

💡 The Difference in Difference method can be used to estimate causal effects in observational data, without the need for a controlled experiment, by assuming parallel trends between the treatment and control groups.
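
How much the estimate leans on the parallel trends assumption can be seen with a small what-if sketch. The observed sales are from the video, but the alternative counterfactual below is purely hypothetical and is not something the video claims: it supposes that, without the price drop, B would have grown by the same factor as A rather than by the same number of pizzas.

```python
# Observed daily sales from the pizzeria example.
a_before, a_after = 100, 200
b_before, b_after = 500, 1000


def estimated_effect(b_counterfactual):
    """Effect of the price drop = actual B sales minus assumed counterfactual."""
    return b_after - b_counterfactual


# Parallel trends: B would have gained the same +100 pizzas as A.
parallel_counterfactual = b_before + (a_after - a_before)

# HYPOTHETICAL alternative: B would have grown by the same factor as A (x2).
proportional_counterfactual = b_before * (a_after / a_before)

did_effect = estimated_effect(parallel_counterfactual)
alt_effect = estimated_effect(proportional_counterfactual)

print("effect under parallel trends:", did_effect)
print("effect under proportional growth:", alt_effect)
```

Under the hypothetical proportional-growth story the same data would imply no effect at all, which is why the choice of counterfactual trend has to be justified rather than taken for granted.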
