Difference in Difference : Data Science Concepts
Key Takeaways
The video discusses difference in difference, a statistical method for estimating the effect of an intervention or treatment on an outcome variable. Using an example of two pizzerias with different customer traffic, it explains how the method lets you run an experiment after the fact, without a controlled experiment, by assuming parallel trends between the treatment and control groups. The method is demonstrated using a chart showing the number of pizzas sold by
Full Transcript
Hey everyone, welcome back. Today we're going to be talking about a very cool statistical method called difference in difference. Let's dive right into the example.

Let's say there is a town called Pai Town, and Pai Town has two pizzerias: pizzeria A and pizzeria B. (A pizzeria is a place that sells pizza.) Let's say these two pizzerias are very similar to each other. The one major difference is the number of customers that live around them; those customers are represented by the little dots. B has a lot more customers around it, so it gets a lot more traffic, and A has fewer customers around it. Besides that, they're very similar types of pizzerias.

Now let's say that one day B drops the price of its pizza, and as the chief data scientist for Pai Town you observe the following statistics. Some time after the price drop, you find that A is selling 200 pizzas per day and B is selling 1,000 pizzas per day. The naive conclusion is that the price drop led to an increase of 800 pizzas per day for pizzeria B: B is now selling 1,000 and A is only selling 200, so that must mean B is selling 800 more pizzas than it was before.

Probably most of you are very skeptical. Is this actually true? Obviously there's some faulty logic going on, so let's try to explain it and see how we can fix it to get what this number should actually be. The flaw in the logic is that I mistakenly assumed A and B were selling the exact same number of pizzas before the price drop, and that therefore the difference between their daily sales after the price drop must be fully attributed to the price drop. Let's see where that breaks down.

This chart is going to be the main thing we're looking at in this video; the entire story is built around it. It shows two time periods: before the price drop and after the price drop. In general, when you use difference in difference methods, you need some kind of time-based data.

Let's look at what happens before. Before B drops its price, A is selling 100 pizzas per day and B is selling five times that amount, 500 pizzas per day. The reason is that B has a lot more traffic around it, as I explained before. So B was already selling a lot more pizzas per day than A even before the price drop.

Now the price drop occurs, and we observe how many pizzas each pizzeria is selling afterwards. A is now selling 200 pizzas per day. So the first observation is that A's sales increased on their own, irrespective of B. After the price drop, as we saw before, B is selling 1,000 pizzas per day. We mistakenly subtracted this 200 from this 1,000 and said that the price drop led to an increase of 800 pizzas sold per day for B.

Here's where difference in difference comes in. We assume parallel trends. We'll talk about that more at the end of this video, but we assume that if B had not dropped its price, if it was just business as usual, then B's trajectory of pizzas sold would follow this dotted line, which is parallel to A's trajectory before and after the price drop. We see an increase of 100 for A, so we would also assume an increase of 100 for B. With no price drop, B would now be selling 600 pizzas, but it's actually selling 1,000.

So let's quantify a couple of gaps in this diagram and lock in on the actual change in B's number of pizzas sold due to this price drop. The first is this blue bracket: the observed difference, 800 pizzas per day. That's the number we mistakenly assumed was the difference attributable to the price drop. The second is this green bracket: the expected difference between A's and B's number of pizzas sold if there had been no price drop at all, which is 400 pizzas per day.

This is where the term difference in difference comes from, because we subtract the green difference from the blue difference. Think about what that means intuitively: I'm taking the observed difference between these two and taking away the difference I would have gotten naturally, without any intervention. I'm left with 400 pizzas per day, which is 800 minus 400, and that is the difference in pizzas sold per day for B because of the price drop.

This is actually a pretty wild, cool method because it lets us kind of run an experiment after the fact. In the best-case scenario we would have run a controlled experiment: open two really similar pizzerias right next to each other and change the price of pizza for just one of them. But obviously that takes a lot of investment. As a researcher you can't just open a pizzeria, and you can't force existing pizzerias to change their prices. So a lot of the time we have to use after-the-fact observational data to come to conclusions, and that's what we're doing here.

Finally, let's talk about the assumptions of difference in difference. It has all the same assumptions as the ordinary least squares model, because we need all these things to be linear, with the added, very important assumption of parallel trends.

Let's think for a minute about why we need parallel trends. What if the diagram looked like this: A increased from before to after like this, and B increased from before to after along the red line. If we assume parallel trends, so we assume this blue line is what B would have done with no price change, then we get this difference between the red and blue dots. But what if B was just growing at a faster rate in general, even without the price drop, so that its actual trajectory without the change would have been this green line? Then we're going to get it wrong, because the actual change attributable to the price drop is the difference between the red and green dots, and we don't see that.

So there are a lot of assumptions baked in, but it is a very important tool for the data scientist: it lets you run an experiment after the fact and get some indication of the change attributable to a possible intervention. Any questions are welcome in the comments below. Like and subscribe, see you next time.
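The arithmetic in the pizzeria example can be sketched in a few lines of Python. This is a minimal illustration, not code from the video: the function name `did_estimate` and its parameter names are made up here, and the final "hypothetical" section uses an assumed growth figure of 300 that does not appear in the video, purely to show how a violated parallel-trends assumption would bias the estimate.

```python
def did_estimate(control_before, control_after, treated_before, treated_after):
    """Two-period difference in difference under the parallel-trends assumption.

    Returns (counterfactual, effect), where the counterfactual is what the
    treated unit would have sold if it had simply followed the control's change.
    """
    control_change = control_after - control_before    # A's "natural" change
    counterfactual = treated_before + control_change   # B with no price drop
    effect = treated_after - counterfactual            # change due to the drop
    return counterfactual, effect


# Numbers from the example (pizzas per day): A is the control, B is treated.
counterfactual, effect = did_estimate(
    control_before=100, control_after=200,
    treated_before=500, treated_after=1000,
)
print("B without a price drop:", counterfactual)    # 600
print("Effect of the price drop:", effect)          # 400

# The same result as a literal "difference of differences":
observed_gap = 1000 - 200            # blue bracket: 800
expected_gap = counterfactual - 200  # green bracket: 400
assert observed_gap - expected_gap == effect

# Hypothetical check (the 300 is an assumed number, NOT from the video):
# if B would really have grown by 300 on its own, parallel trends is violated.
true_counterfactual = 500 + 300               # 800
true_effect = 1000 - true_counterfactual      # 200
bias = effect - true_effect                   # DiD overstates the effect by 200
print("Bias if B's own growth were 300:", bias)
```

The function only encodes the subtraction described above; it says nothing about uncertainty, and its answer is only as good as the assumption that B would have tracked A's change.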
Original Description
Running an experiment ... without running an experiment.
My Patreon : https://www.patreon.com/user?u=49277905