Causal Effects via Propensity Scores | Introduction & Python Code

Shaw Talebi · Beginner ·🛠️ AI Tools & Apps ·3y ago


Key Takeaways

This video introduces propensity scores and their use in estimating causal effects from observational data. The propensity score is computed with logistic regression, and the Python example uses libraries such as DoWhy, pandas, and numpy. The video covers three methods for computing causal effects via propensity scores: matching, stratification, and inverse probability of treatment weighting.

Full Transcript

Hey folks, welcome back. This is the second video in a series on causal effects. In the last video, we learned some theoretical concepts that underlie causal effects. However, there were questions surrounding how to translate this theory into practice. In this video, we will resolve these questions with a set of practical techniques for estimating causal effects. These techniques are all based on something called a propensity score. We will conclude the discussion with a concrete example with Python code and real-world data. So with that, let's get into the video.

In the last video of this series, we were talking about estimating causal effects, and to estimate causal effects we need data. But not all data are equal, so here I'm going to distinguish two ways we can obtain data. The first is data from what I'll call an observational study. An observational study consists of passively measuring data without intervening in the data-generating process. As an example, take the causal effect of taking a pill on headache status. What an observational study might look like is: we passively observe a population of people with headaches, some of them taking pills and some of them not, and do an analysis trying to quantify the causal effect of that pill on headache status.

The second way we can obtain data is from what I'll call an interventional study, which consists of intentionally manipulating the data-generating process for a particular goal. An example of this is a randomized controlled trial, which we talked about in the previous video. If we were interested in studying the same thing, the effect of a pill on headache status, what a randomized controlled trial would look like is: we pick a group of people from a larger population of people with headaches and split that group into two subgroups. We give everyone in the first subgroup a pill, we don't give the other subgroup a pill, and then we can evaluate the causal effect.

So,
like I mentioned in the previous video, interventional studies such as randomized controlled trials are one of the most common ways to quantify causal effects, but this comes at a cost. It takes a lot more effort and care to collect data through a randomized controlled trial than to passively observe people's natural behaviors. So it would be very advantageous if we could compute causal effects using observational data, because it is a lot easier to obtain.

But there's a problem here. In observational studies, there can be systematic differences between people who take the pill and people who don't, and these can bias your estimate. In this example, where we're just passively observing people with headaches and not controlling who takes the pill, there could be a variable that we might not be measuring, such as age, that acts as a confounder. Someone's age could drive whether they take a pill: kids probably won't be taking pills because their parents won't let them, while adults might be more likely to, because they don't need permission from anyone to take a Tylenol. Age could also affect headache status: kids might be less prone to headaches than adults. This introduces a systematic difference that can bias our causal effect estimate.

One solution to this problem is the propensity score. A propensity score aims to solve this problem of systematic differences by estimating the probability that a subject receives a treatment based on other characteristics. Essentially, we include additional variables alongside our treatment and outcome. Our treatment variable could be whether someone takes the pill, our outcome could be headache status, and then we can collect other variables that might influence treatment status or headache status, such as age, income, sex, or some other variables. A common way of computing the propensity score is a two-step process. First, we train a logistic
regression model. That consists of taking your covariates, basically any variable that's not your treatment or outcome (say age, income, and sex), and setting the treatment status as the target variable. In other words, we're using the covariates to predict treatment status, and logistic regression is just a way to connect a set of predictors to a binary target. Once we have the logistic regression model, the next step is to use it to generate the propensity score. We take a subject with a set of covariates (here age, income, and sex), pass them into the logistic regression model we just developed, and out comes a probability of treatment, or in other words, a propensity score. We can do this for all our subjects, so that we have a propensity score for every single subject in our data set.

With a propensity score for every subject in our observational data set, we can use it in different ways to help estimate unbiased causal effects. Here I'll be talking about three propensity-score-based methods for doing this: matching, stratification, and inverse probability of treatment weighting.

Okay, starting with matching. In the simplest case, matching consists of creating treated-untreated pairs with similar propensity scores. Say we have an observational data set, so we've just passively observed people's natural behaviors when they have a headache. Five people took a pill, seven people didn't, and we have propensity scores for each of these subjects. Notice that there are people who took the pill but have a relatively low probability of treatment, and conversely there are people who didn't take the pill but have a relatively high probability of treatment. This is good, because it helps us in the matching process. One way we can do this is called one-to-one matching
without replacement. Say we pick this subject with a 93 percent probability of treatment. We match this subject to the subject in the untreated population with the most similar propensity score. Looking at all these subjects, we see that 73 percent is the closest to 93 percent, so we match these two together. Then we take the subject with the 80 percent propensity score and do the same process, excluding the participant with 73 percent. Then we take the participant with 57 percent, pick the closest one remaining in the untreated population, and so on and so forth.

Now we have a so-called matched sample, and this is reminiscent of what we might see in a randomized controlled trial: we have two groups of equal size, one group took the treatment and the other didn't. With this matched sample, we can compute the average treatment effect just like we did in the previous video. Conceivably, there are two ways we could go about this. In the first, ATE stands for average treatment effect, defined in the previous video; E is the expectation value, which is essentially an average; Y denotes the outcome variable; 1 indicates a treatment status of one, meaning they took the pill; 0 indicates a treatment status of zero, meaning they didn't take the pill; and i just indexes the pairs in the matched sample. Walking through this, let's say i equals one corresponds to this matched pair, so this person corresponds to this term and this person corresponds to this term, and we take their difference. Then we look at the next pair and take the difference in their outcomes, and so on and so forth. Now we have five values, one difference in outcomes for each pair, and we can just take their average, which gives us an average treatment effect.

Alternatively, we can use the expression we had for a randomized
controlled trial, which is what RCT means here: this is the average treatment effect in a randomized controlled trial. It's just slightly different. Instead of taking five differences and computing their average, here we look at the outcomes of all the participants in the treated population and compute the average, do the same for the untreated population, and then take the difference. So these are two alternative ways we can compute an average treatment effect with a matched sample.

There are a lot of details here that I glossed over, and I don't want to spend too much time on them. I'll just refer you to this nice paper by Austin, where he dives into details on optimizing the matched sample (basically, how we match the took-pill population with the didn't-take-pill population) and also on alternatives in matching. Here we did one-to-one matching, but you can also do one-to-many matching, where instead of pairing each individual in the treated population with one subject in the untreated population, you match many untreated subjects to a single treated subject. This is all in the paper for anyone who is interested.

We'll move on to stratification. In stratification, we split subjects into groups with similar propensity scores. Again, let's say we have our observational data set: five people took the pill and seven people didn't. What we do first is called rank ordering: basically, we order the subjects from lowest to highest propensity score, like what you do in elementary school math. Then we split them into groups; here we split the subjects into four equal-sized groups. Then we compute the average treatment effect in each of these groups. Don't worry too much about the notation here. G is just indexing each group, so we can have group one, group two,
group three, group four. Then p is just indexing the people that took the pill, and a second index marks the people that didn't take the pill. We can compute the average treatment effect for each group like this. Looking at this term, this is the expectation value, the average outcome, for the people that took the pill in, say, group one, so G equals one. Then, still for G equals one, we look at the average outcome for the people that didn't take the pill; since it's just one person, we just look at this one person's outcome. Then we take the difference, and we have the average treatment effect for group one. We do the same thing for group two, then for group three and group four, so we have four average treatment effects. Then we can go a step further, compute the average of these four averages, and get an overall average treatment effect.

All right, the last method I'll talk about is inverse probability of treatment weighting, or IPTW for short. The first two methods were kind of similar, in that we were clumping together people with similar propensity scores and comparing average outcomes between people that took the pill and people that didn't. In IPTW we do things a little differently: we use the propensity score to derive weights from which the average treatment effect can be computed directly. We have our same observational data set, five people took the pill and seven didn't, with propensity scores for all 12 of these individuals. Then we convert these propensity scores into weights. How we define the weight depends on whether the subject took the pill or not. If they took the pill, we just do one divided by the propensity score, so 1 divided by 0.23 is about 4.35, 1 over 0.93 is about 1.08, and so on and so forth. But if they didn't take the pill, we do one divided by one minus their propensity score. That gives us weights for each of the subjects. Okay, and
then the next step is to use these weights to aggregate the outcomes for the treated and untreated populations. We weight each outcome: p is again indexing the people that took the pill, and for every subject in the treated population we multiply the weight and the outcome together, add them all up, and divide by n, which is the total number of subjects, so in this case 12. That gives us the aggregated outcome for the treated population. Then we do the exact same thing for the untreated population, which gives us the aggregated outcome for the untreated. To estimate our average treatment effect, we simply take the difference between the aggregated outcome of the treated and the aggregated outcome of the untreated.

Now we're going to do a concrete example in Python. Here we'll compute the average treatment effect of going to grad school on income. For those of you keeping up with the causality videos I've been putting up, this example will be very similar to what we saw in the causal inference video. This code and much more is available at the GitHub repository, which I will link in the description below.

First, we import our libraries: pickle, just to import our data; DoWhy, which is going to help us do the causal effect estimation; and numpy, to do some math. Then we load in the data. We have this pickle file, which is a pandas data frame, also available at the GitHub repository. This data comes from the UCI Machine Learning Repository, which I will link in the description as well.

Okay, next we define the causal model. This isn't really necessary for the propensity-score-based techniques, but it is a standard procedure in the DoWhy library. We'll get a better sense of why the library is set up this way in future videos of this series, but for now we can just view this as picking out, or labeling, which variables in our data set are the treatments, the outcomes, and the covariates.
So our treatment is the variable has graduate degree, a Boolean variable indicating whether the subject went to grad school. Our outcome variable is greater than 50k, indicating whether the participant makes more than fifty thousand dollars or not. Then we have the common causes, which here we can just view as the covariates, and we have age as the single covariate. What this causal model looks like is this, and it is identical to what we saw in the causal inference video.

Then we can compute the causal effects using these three propensity-score-based methods. Here we're just creating the estimand, which isn't so important here, but we will see why it is very important in future videos of this series. Then I create a list of the names of each of the propensity-score-based methods: matching, stratification, and weighting. I initialize two data structures, a dictionary and a list, to store the causal estimates for each of these approaches, and then we compute each causal effect in a for loop. One by one, for each element in the list, we compute the causal estimate and store it in both the dictionary and the list that I created earlier.

The result of all this is that for matching we had an average treatment effect of 0.136, for stratification we had 0.25, and for inverse probability of treatment weighting we had 0.331. It's interesting to see that all three methods, even though they're all using the same data, give different causal estimates. One thing we can do is aggregate these and take their average, and the average of all three methods is 0.24. Some of you may recall the causal inference video, where we did an identical analysis, but instead of propensity-score-based methods we used something called a meta-learner, and we got a similar result: there the average treatment effect was 0.2, and here we have 0.24, so very
similar. How we can interpret this result is that going to grad school will increase the probability that someone makes more than fifty thousand dollars a year by 24 percent.

Okay, so before we jump with joy and say that we can compute causal effects from any observational data set, I would share a word of caution. The whole reason people go through the trouble of a randomized controlled trial is that it can handle both measured and unmeasured confounders: through the randomization process, you can mitigate systematic differences between your treated and untreated populations. With these propensity-score-based methods, we can only hope to handle measured confounders. In the example we just ran through, we had only a single measured confounder, which was age, but there are conceivably other variables that could impact both someone's probability of going to grad school and their probability of making more than fifty thousand dollars a year: things like parental income, field of study, work ethic, and so on. These propensity-score-based methods won't be able to account for confounders that are not included in the propensity score model.

This may not be such a big deal when you know what you need to measure and you have the ability to measure it. But there are situations where you know what you need to measure and it's very tricky to measure. For example, let's say we're considering work ethic as a confounder of both someone's probability of going to grad school and their probability of making more than fifty thousand dollars a year. It could be challenging to quantify work ethic and measure it for each of your subjects. That brings up the problem of unmeasured confounders, and in the next video of this series we will see what we can do about them.

Okay, and lastly, there is a blog associated with this video, linked here, and I'll also link it in the description. There are some details in the blog that I didn't discuss here, so for
those interested, feel free to check that out. If you enjoyed this video, please consider liking, subscribing, or sharing the content. If you have any questions, please share those in the comments section below; I definitely learn a lot from the feedback I get in the comments section. As always, thanks for your time, and thanks for watching.
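
The three estimators described in the transcript can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code from the video (which uses DoWhy on census data): the twelve propensity scores and outcomes below are hypothetical toy values, the function names are made up for this sketch, and the matching is a simple greedy pass that processes treated subjects from highest to lowest propensity score.

```python
from statistics import mean

# Hypothetical toy data: (propensity score, outcome) for each subject.
# Five subjects took the pill (treated), seven did not (untreated).
treated = [(0.93, 1), (0.80, 1), (0.57, 0), (0.41, 1), (0.23, 0)]
untreated = [(0.73, 0), (0.62, 0), (0.55, 0), (0.38, 1),
             (0.30, 0), (0.19, 0), (0.12, 0)]


def ate_matching(treated, untreated):
    """Greedy one-to-one matching without replacement."""
    pool = list(untreated)
    diffs = []
    for score, outcome in sorted(treated, reverse=True):
        # Closest remaining untreated subject by propensity score.
        match = min(pool, key=lambda u: abs(u[0] - score))
        pool.remove(match)  # without replacement
        diffs.append(outcome - match[1])
    return mean(diffs)


def ate_stratification(treated, untreated, n_groups=4):
    """Rank-order subjects, split into equal groups, average the group ATEs.

    Assumes the sample divides evenly into n_groups and that every group
    contains at least one treated and one untreated subject.
    """
    subjects = [(s, y, 1) for s, y in treated]
    subjects += [(s, y, 0) for s, y in untreated]
    subjects.sort()  # lowest to highest propensity score
    size = len(subjects) // n_groups
    effects = []
    for g in range(n_groups):
        group = subjects[g * size:(g + 1) * size]
        y_treated = [y for _, y, t in group if t == 1]
        y_untreated = [y for _, y, t in group if t == 0]
        effects.append(mean(y_treated) - mean(y_untreated))
    return mean(effects)


def ate_iptw(treated, untreated):
    """Inverse probability of treatment weighting."""
    n = len(treated) + len(untreated)  # total number of subjects
    # Weight 1/e for treated subjects, 1/(1 - e) for untreated subjects.
    agg_treated = sum(y / e for e, y in treated) / n
    agg_untreated = sum(y / (1 - e) for e, y in untreated) / n
    return agg_treated - agg_untreated


estimates = {
    "matching": ate_matching(treated, untreated),
    "stratification": ate_stratification(treated, untreated),
    "weighting": ate_iptw(treated, untreated),
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

Even on this tiny toy sample the three methods return different numbers, which mirrors the observation in the video that the methods disagree on the same data.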

Original Description

🤝 Work with me: https://aibuilder.academy/yt/dm-BWjyYQpw
🚀 Ship AI apps in weeks, not months: https://aibuilder.academy/courses/yt/dm-BWjyYQpw

This is the 2nd video in a series on causal effects. Here I introduce the Propensity Score and discuss 3 ways we can use it to compute causal effects from observational data. At the end, I share a concrete example with code of what using these methods might look like in practice.

👉 Series Playlist: https://www.youtube.com/playlist?list=PLz-ep5RbHosVVTz9HEzpI4d6xpWsc8rOa
📰 Read more: https://medium.com/towards-data-science/propensity-score-5c29c480130c?sk=45f0ec6803eba962c0d2d0162185741d
💻 Example Code: https://github.com/ShawhinT/YouTube-Blog/tree/main/causality/propensity_score

Resources:
- An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies by Peter C. Austin
- Data from UCI MLR: https://archive.ics.uci.edu/ml/datasets/census+income

Introduction - 0:00
Observational vs Interventional Studies - 0:32
Propensity Score - 3:25
3 Propensity Score-based Methods - 4:56
1) Matching - 5:18
2) Stratification - 9:07
3) Inverse Probability of Treatment Weighting - 10:37
Example: ATE of Grad on Income - 12:29
Word of Caution - 15:46


Playlist

Uploads from Shaw Talebi · Shaw Talebi · 22 of 60

biometricDashboard2 DEMO
biometricDahboard3 DEMO
Time Series, Signals, & the Fourier Transform | Introduction
The Fast Fourier Transform | How does it (actually) work?
The Wavelet Transform | Introduction & Example Code
Principal Component Analysis (PCA) | Introduction & Example (Python) Code
Independent Component Analysis (ICA) | EEG Analysis Example Code
Kmeans-based Blink Detecter DEMO
Shit Happens, Stay Solution Oriented
Why Conflict Is Good & How You Can Use It
Causality: An Introduction | How (naive) statistics can fail us
Causal Inference | Answering causal questions
Causal Discovery | Inferring causality from observational data
How to Be Antifragile | 7 Practical Tips
Multi-kills: How to Do More With Less (no, not by multi-tasking)
Topological Data Analysis (TDA) | An introduction
The Mapper Algorithm | Overview & Python Example Code
Persistent Homology | Introduction & Python Example Code
What Is Data Science & How To Start? | A Beginner's Guide
How to do MORE with LESS - multikills
Causal Effects | An introduction
Causal Effects via Propensity Scores | Introduction & Python Code
Causal Effects via the Do-operator | Overview & Example
Causal Effects via DAGs | How to Handle Unobserved Confounders
Smoothing Crypto Time Series with Wavelets | Real-world Data Project
Causal Effects via Regression w/ Python Code
5 Reasons Why Every Data Scientist Should Consider Freelancing
An Introduction to Decision Trees | Gini Impurity & Python Code
10 Decision Trees are Better Than 1 | Random Forest & AdaBoost
Dimensionality Reduction & Segmentation with Decision Trees | Python Code
How to Make a Data Science Portfolio With GitHub Pages (2025)
My $100,000+ Data Science Resume (what got me hired)
How to Create a Custom Email Signature in Gmail (2025)
I Spent $675.92 Talking to Top Data Scientists on Upwork—Here’s what I learned
Lessons from Spending $675.92 to Talk to Top Data Scientists on Upwork #freelance #datascience
A Practical Introduction to Large Language Models (LLMs)
The OpenAI (Python) API | Introduction & Example Code
The Hugging Face Transformers Library | Example Code + Chatbot UI with Gradio
Why I Quit My $150,000 Data Science Job
Prompt Engineering: How to Trick AI into Solving Your Problems
The REALITY of entrepreneurship. #entrepreneurship #startup #smallbusiness
Fine-tuning Large Language Models (LLMs) | w/ Example Code
How to Build an LLM from Scratch | An Overview
I Have 90 Days to Make $10k/mo—Here's my plan
I Spent $716.46 Talking to Data Scientists on Upwork—Here’s what I learned.
Pareto, Power Laws, and Fat Tails
Do NOT become an entrepreneur #entrepreneurship
Detecting Power Laws in Real-world Data | w/ Python Code
How I’d learn data analytics (if I had to start over in 2024) #dataanalytics
4 Ways to Measure Fat Tails with Python (+ Example Code)
Fine-tuning EXPLAINED in 40 sec #generativeai
How Much YouTube Paid Me in My First 6 Months of Monetization (as a Data Science Creator)
5 Questions Every Data Scientist Should Hardcode into Their Brain
AI for Business: A (non-technical) introduction
LLMs EXPLAINED in 60 seconds #ai
3 Ways to Make a Custom AI Assistant | RAG, Tools, & Fine-tuning
What is #ai? — Simply Explained
QLoRA—How to Fine-tune an LLM on a Single GPU (w/ Python Code)
How to Improve LLMs with RAG (Overview + Python Code)
Text Embeddings, Classification, and Semantic Search (w/ Python Code)

This video teaches the concept of Propensity Scores and their application in estimating causal effects from observational data, with a focus on Python implementation. The video covers three methods for computing causal effects via Propensity Scores and provides a cautionary note on the use of observational data. By watching this video, viewers can learn how to estimate causal effects using Propensity Scores and implement logistic regression in Python.
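
As a rough sketch of the logistic-regression step, the following plain-Python example fits a one-covariate model and turns it into propensity scores. It is not the code from the video, which relies on DoWhy to do this internally. The ages and treatment flags are hypothetical toy values, and `fit_logistic`, the learning rate, and the step count are illustrative choices made for this sketch.

```python
import math

# Hypothetical toy data: one covariate (age) and a binary treatment flag.
ages = [22, 25, 29, 33, 38, 41, 47, 52, 58, 63]
treated = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, t, lr=0.1, steps=5000):
    """Fit t ~ sigmoid(b0 + b1 * x) by gradient descent on the mean log loss."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        preds = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradients of the mean log loss with respect to b0 and b1.
        g0 = sum(p - ti for p, ti in zip(preds, t)) / n
        g1 = sum((p - ti) * xi for p, ti, xi in zip(preds, t, x)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1


# Standardize age so that plain gradient descent converges comfortably.
mu = sum(ages) / len(ages)
sd = math.sqrt(sum((a - mu) ** 2 for a in ages) / len(ages))
z = [(a - mu) / sd for a in ages]

# Step 1: train the model to predict treatment status from the covariate.
b0, b1 = fit_logistic(z, treated)

# Step 2: the predicted probability of treatment is the propensity score.
propensity = [sigmoid(b0 + b1 * zi) for zi in z]

for age, score in zip(ages, propensity):
    print(f"age {age}: propensity {score:.2f}")
```

In practice a library implementation of logistic regression would normally replace the hand-written gradient descent; the loop is spelled out here only to keep the example dependency-free.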

Key Takeaways

Train a logistic regression model to predict treatment status from covariates
Generate propensity scores by passing covariates into logistic regression model
Match treated and untreated pairs with similar propensity scores
Compute average treatment effect using matched samples
Use stratification to split subjects into groups with similar propensity scores
Compute average treatment effect in each group and then average these values
Use inverse probability of treatment weighting (IPTW) to compare average outcomes between treated and untreated populations
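
The estimators behind these takeaways can be written out as follows. This is a reconstruction from the spoken description rather than a copy of the on-screen expressions, so the notation is partly assumed: $e$ for the propensity score, $T$ for treatment status, $q$ for the index over untreated subjects, $M$ for the number of matched pairs, and $G$ for the number of groups are labels chosen here.

```latex
% Matching: average the outcome differences over the M matched pairs
\mathrm{ATE}_{\text{match}} = \frac{1}{M} \sum_{i=1}^{M} \bigl[ Y_i(1) - Y_i(0) \bigr]

% Randomized-controlled-trial form applied to the matched sample
\mathrm{ATE}_{\text{RCT}} = E[\,Y \mid T = 1\,] - E[\,Y \mid T = 0\,]

% Stratification: per-group effect, then the average over the G groups
\mathrm{ATE}_g = E[\,Y_p \mid g\,] - E[\,Y_q \mid g\,],
\qquad
\mathrm{ATE}_{\text{strat}} = \frac{1}{G} \sum_{g=1}^{G} \mathrm{ATE}_g

% IPTW: weights from the propensity score e, with n the total number of subjects
w_p = \frac{1}{e_p}, \qquad w_q = \frac{1}{1 - e_q},
\qquad
\mathrm{ATE}_{\text{IPTW}} = \frac{1}{n} \sum_{p} w_p Y_p - \frac{1}{n} \sum_{q} w_q Y_q
```

For one-to-one matching, the first two expressions give the same number, because the average of the pairwise differences equals the difference of the two group averages.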

💡 Propensity scores can be used to estimate causal effects from observational data, but it is essential to consider the limitations of propensity score-based methods in handling unmeasured confounders.
