Difference in Difference : Data Science Concepts

ritvikmath · Intermediate ·📐 ML Fundamentals ·4y ago

Key Takeaways

The video discusses the Difference in Difference statistical method, a technique for estimating the effect of an intervention or treatment on an outcome variable. Using an example of two pizzerias with different customer traffic, it explains how the method lets you run an experiment after the fact, without a controlled experiment, by assuming parallel trends between the treatment and control groups. The method is demonstrated using a chart showing the number of pizzas sold by

Full Transcript

[Music] Hey everyone, welcome back. Today we're going to be talking about a very cool statistical method called difference in difference. Let's dive right into the example.

Let's say there is a town called Pai Town, and Pai Town has two pizzerias: pizzeria A and pizzeria B. (A pizzeria is a place that sells pizza.) Let's say these two pizzerias are very similar to each other. The main difference is the number of customers that live around them; those customers are represented by the little dots. B has a lot more customers around it, so it gets a lot more traffic, and A has fewer customers around it. Besides that, they're very similar types of pizzerias.

Now let's say that one day B drops the price of its pizza, and as the chief data scientist for the town of Pai Town, you observe the following statistics. Following the drop in price of B's pizza, you find that A is selling 200 pizzas per day and B is selling 1,000 pizzas per day. This is all happening a little bit of time after the drop in price occurs for pizzeria B. So the naive conclusion is that this drop led to an increase of 800 pizzas per day for pizzeria B, right? Because now B is selling 1,000 and A is only selling 200, so that must mean that B is now selling 800 more pizzas than it was before.

Now, probably most of you are very skeptical. Is this actually true? Obviously there's some faulty logic going on, but let's try to explain it and see how we can fix it to get what this number should actually be. The flaw in logic that I made is that I mistakenly assumed that A and B were selling the exact same number of pizzas before the price drop, and therefore that after the price drop the difference between how many pizzas per day they're selling must be fully attributed to the price drop. Let's see where that breaks down.

This chart is going to be the main thing we're looking at in this video; the entire story is built around it. The chart shows two time periods: before the price drop and after the price drop. In general, when you're using difference in difference methods, you're going to need some kind of time-based data.

Let's look at what happens before. Before B drops its price, A is selling 100 pizzas per day and B is selling five times that amount, 500 pizzas per day. The reason is that B has a lot more traffic around it, as I explained before. So B was already selling a lot more pizzas per day than A, even before the price drop.

Now the price drop occurs, and we observe how many pizzas they're selling afterward. Let's say A is now selling 200 pizzas per day. So the first main observation is that A is increasing the number of pizzas it's selling, irrespective of B. After the price drop, as we saw before, B is selling 1,000 pizzas per day. We mistakenly just subtracted this 200 from this 1,000 and said that the price drop led to an increase of 800 pizzas sold per day for B.

But here's where difference in difference comes in. We assume parallel trends. We'll talk about that more at the end of this video, but we assume that if B had not dropped its price, if it was just business as usual, then B's trajectory of the number of pizzas sold would follow this dotted line, which is parallel to the trajectory of A before and after the price drop. We see that there's an increase of 100 for A, so we would assume an increase of 100 for B as well. So if there were no price drop, B would now be selling 600 pizzas, but it's actually selling 1,000.

So let's quantify a couple of gaps in this diagram and lock in on the actual change in B's number of pizzas sold due to the price drop. The first one is this blue bracket here. This is the observed difference, that 800 pizzas per day, the number we mistakenly assumed was the correct difference attributed to the price drop. The second one is this green bracket here, which is the expected difference between A's and B's number of pizzas sold if there was no price drop at all, and that would be 400 pizzas per day.

Now this is where the term difference in difference comes in, because we subtract this green difference from this blue difference. Let's think intuitively about what that means: I'm taking the observed difference between these two and taking away how much difference I would have gotten naturally without any intervention happening. Now I'm left with 400 pizzas per day, which is 800 minus 400, and that is the difference in the number of pizzas per day sold for B because of the price drop.

This is actually a pretty wild, cool method because it lets us run an experiment after the fact. Notice that in the best-case scenario, we would have just run a controlled experiment: open up two really similar pizzerias right next to each other and change the price of pizza for just one of them. But obviously that takes a lot of investment. You can't just open a pizzeria as a researcher, and you can't force existing pizzerias to change their prices. So a lot of times we have to use this after-the-fact observational data to come to conclusions, and that's what we're doing here.

Finally, let's talk about the assumptions of difference in difference. It has all the same assumptions as the ordinary least squares model, because we need all these things to be linear, with the added very important assumption of parallel trends. Let's think for a minute about why we need parallel trends. What if the diagram looked like this: A increased before and after like this, and B increased before and after according to the red line. If we assume parallel trends, so if we assume this blue line is what B would have done if there was no price change, then we get this difference here between the red and the blue dots. But what if B was just growing at a faster rate in general, even without the price drop, so that its actual trajectory without the change would have been this green line? Then we're going to get it wrong, because the actual change attributed to the price drop would have been the difference between the red and the green dots, but we don't see that.

So there are a lot of assumptions baked in, but it is a very important tool for the data scientist to be able to run an experiment after the fact and get some kind of indication of the change attributable to a possible intervention. Any questions are welcome in the comments below. Like and subscribe, and see you next time.
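
The arithmetic from the pizzeria example can be checked with a short Python sketch. It is only an illustration: the four sales figures come from the transcript, while the dictionary layout and the function names (`naive_effect`, `did_effect`) are assumptions made for this example, not something shown in the video.

```python
# Daily pizza sales as stated in the transcript (pizzas per day).
sales = {
    ("A", "before"): 100,
    ("A", "after"): 200,
    ("B", "before"): 500,
    ("B", "after"): 1000,
}


def naive_effect(sales):
    """The flawed estimate: compare B and A only after the price drop."""
    return sales[("B", "after")] - sales[("A", "after")]


def did_effect(sales):
    """Difference in difference, assuming parallel trends.

    B's counterfactual (no price drop) is its 'before' level plus the
    change that A experienced over the same period.
    """
    control_change = sales[("A", "after")] - sales[("A", "before")]
    counterfactual_b = sales[("B", "before")] + control_change
    observed_gap = sales[("B", "after")] - sales[("A", "after")]  # blue bracket
    expected_gap = counterfactual_b - sales[("A", "after")]       # green bracket
    return observed_gap - expected_gap


print("naive estimate:", naive_effect(sales))  # 800
print("DiD estimate:  ", did_effect(sales))    # 400
```

Subtracting the expected gap from the observed gap gives the same result as subtracting A's change (100) from B's change (500), which is why the method is called a difference of differences.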

Original Description

Running an experiment ... without running an experiment. My Patreon : https://www.patreon.com/user?u=49277905

The Difference in Difference method is a statistical technique used to estimate the effect of an intervention or treatment on an outcome variable, by comparing the difference in outcomes between a treatment group and a control group, before and after the intervention. This method assumes parallel trends between the two groups, and can be used to run an experiment after the fact, without the need for a controlled experiment.
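
As a sketch of the comparison described above, the estimate can be written as a difference of two before/after changes. The notation here is an assumption introduced for illustration ($T$ for the treatment group, $C$ for the control group, $\bar{Y}$ for the average outcome); the numbers are the pizzeria figures from the transcript, with pizzeria B as the treatment group and pizzeria A as the control. The second line shows the commonly used regression form, in which the coefficient on the interaction term plays the role of the difference in difference estimate.

```latex
\[
\hat{\delta}_{\text{DiD}}
  = \left(\bar{Y}_{T,\text{after}} - \bar{Y}_{T,\text{before}}\right)
  - \left(\bar{Y}_{C,\text{after}} - \bar{Y}_{C,\text{before}}\right)
  = (1000 - 500) - (200 - 100) = 400
\]

\[
Y = \beta_0 + \beta_1\,\text{Treated} + \beta_2\,\text{After}
  + \beta_3\,(\text{Treated} \times \text{After}) + \varepsilon,
\qquad \hat{\beta}_3 = \hat{\delta}_{\text{DiD}}
\]
```

With only the four group averages from the example, the regression fits exactly: $\beta_0 = 100$, $\beta_1 = 400$, $\beta_2 = 100$, and $\beta_3 = 400$.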

Key Takeaways
  1. Identify the treatment and control groups
  2. Collect data on the outcome variable before and after the intervention
  3. Apply the Difference in Difference method to estimate the effect of the intervention
  4. Evaluate the assumptions of the method, including linearity and parallel trends
💡 The Difference in Difference method can be used to estimate causal effects in observational data, without the need for a controlled experiment, by assuming parallel trends between the treatment and control groups.
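
The sensitivity to the parallel trends assumption can be sketched in a few lines of Python. The sales figures are the ones from the transcript, but the "faster growth" number (250) is purely hypothetical, chosen only to illustrate the scenario the video describes where the treated pizzeria would have grown faster even without the price drop. The function names are also assumptions made for this sketch.

```python
def did_estimate(treated_before, treated_after, control_before, control_after):
    """Difference in difference: treated change minus control change."""
    return (treated_after - treated_before) - (control_after - control_before)


def effect_given_true_trend(treated_before, treated_after, trend_without_intervention):
    """Effect we would measure if the treated group's own no-intervention
    growth were known (in practice it is never observed)."""
    counterfactual = treated_before + trend_without_intervention
    return treated_after - counterfactual


# Figures from the transcript: B is treated, A is the control.
estimate = did_estimate(500, 1000, 100, 200)

# HYPOTHETICAL: suppose B would have grown by 250 pizzas per day anyway,
# rather than the 100 implied by A's trend. This number is invented.
hypothetical_b_trend = 250
actual = effect_given_true_trend(500, 1000, hypothetical_b_trend)

bias = estimate - actual

print("DiD estimate under parallel trends:", estimate)  # 400
print("Effect under the hypothetical trend:", actual)   # 250
print("Overstatement from violated assumption:", bias)  # 150
```

In this made-up scenario the method credits the price drop with 150 pizzas per day that B would have gained regardless, which is the error the transcript warns about when the trends are not parallel.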
