Causal Inference | Answering causal questions
Key Takeaways
The video discusses causal inference, which aims to answer questions involving cause and effect, using the Microsoft DoWhy library in Python and concepts such as causal models, the do-operator, and the rules of do-calculus.
Full Transcript
Hey folks, welcome back. This is the second video in a three-part series on causality. In this video I'll be talking about causal inference, which aims at answering questions involving cause and effect. I'll start by giving an introduction to causal inference and sketching some big ideas, and then I'll finish with a concrete example with code using the Microsoft DoWhy library in Python. So with that, let's get into the video.
Okay, so here we're talking about causal inference, which aims at answering questions about cause and effect. Given a causal model (here we have a directed acyclic graph, which I talked about in the previous video), how can we estimate causal effects? For example, how can we estimate the effect of X on Y? Some examples of questions that fall under the umbrella of causal inference are: Did the treatment directly help those who took it? Was it the marketing campaign that led to increased sales this month, or the holiday? How big of an effect would increasing wages have on productivity? These are very practical and significant questions that may not be so readily answered using traditional means, and I'll try to highlight what causal inference is good at through what I call the three gifts of causal inference.
The first gift is the do-operator. The do-operator simply simulates a physical intervention, and we're all familiar with interventions in the real world. This is like when your friend's candy habit gets completely out of control and you just have to sit him or her down and say, "This has got to stop." This is what the do-operator does, but for a causal model. In other words, it is a mathematical representation of an intervention. Suppose we have this model on the left here, where Z causes X, which causes Y. What an intervention in X looks like in this mathematical representation is that we delete all the incoming edges into X and manually set X to some predetermined value, say x₀.
A significant contribution from Judea Pearl and colleagues
are the rules of do-calculus. What these rules provide is a way to translate probabilities that include the do-operator into probabilities that do not include the do-operator. The power of this is that often we can't perform interventions in the real world. This could be because it's physically impossible, or unethical, or for whatever other reason. For example, intervening in someone's height by making them taller to measure the response in basketball ability is not physically feasible, and intervening in smoking by forcing someone to smoke a pack of cigarettes every day to measure the response in the risk of lung disease is unethical. In other words, often in the real world we have no way to collect data about the interventional probability distribution; that is, we don't have access to data about probabilities that include the do-operator. In these situations, the rules of do-calculus may provide a way to re-express, to rewrite, probabilities that we are interested in but can't measure directly.
The second gift of causal inference is clarifying this notion of confounding. Confounding, at least for me, was something that did not have a clear definition until I read Judea Pearl's book, The Book of Why. In his book, Pearl defines confounding as anything that makes the interventional distribution different from the observational distribution; in other words, anything that makes the probability of Y given an intervention in X different from the probability of Y given an observation of X. This is easy to see in the three-variable case. Here we have an example of a causal model which shows the relationship between age, education, and wealth. In this example age is the confounder, and this can be understood as age being a common cause of education and wealth, which is an idea that's been around for a while. As Pearl discusses in his book, many people took this kind of common-cause definition as a definition for confounding, but what Pearl does in defining confounding in this way, in terms of the
interventional versus observational distribution, is make it much easier to generalize this notion to much more than just three variables.
Okay, so what does this mean practically? If we know age is a confounder, this can help inform our analysis of data that we might collect on these three variables. Suppose we have this data here of age, education, and income, and we want to assess the impact of education on income. If we don't take into consideration age being a confounder, the naive thing to do would be to just partition the data into two subgroups, one group with just a high school education and the other with a college education, and compare their difference in income. But since age is a confounder, this wouldn't give you the best result. Knowing what the confounders of your problem are allows you to perform this analysis a different way. In this specific case, since age is a confounder, we shouldn't compare data between age groups; we should compare data within age groups. That's what I'm showing here. You can imagine this single data set being split off into four separate data sets, where we have people in their 20s in blue, people in their 30s in yellow, people in their 40s in red, and people in their 50s in green, and then we repeat the analysis I was talking about before, where we compare the incomes of people with just a high school education versus a college education.
So you may ask, why do we care about this do-operator? Why do we need to talk about interventional probabilities versus observational probabilities and so on? Ultimately, what these tools provide is a way to estimate causal effects. A causal effect is a way to quantify the causal impact that one variable has on another, and this is a core part of causal inference. This is what we were naturally doing in the previous slide when we were trying to assess the impact of education on income: what we were really doing is quantifying the causal effect that education had on people's
incomes. But this is obviously applicable to other situations, when we ask questions like: Would productivity be increased if we increase wages? How would sales change if we increase the marketing budget? With these questions and several more, we're talking about causal effects. What is the causal impact of wages on productivity? What is the causal effect of marketing spend on sales?
Looking at the same example as before, we have a causal model including age, education, and wealth. We know from the previous slide that age is a confounder because it creates a discrepancy between the interventional and observational probability distributions. We can consider education to be a treatment and wealth to be a response to that treatment. Then suppose, with this causal model in hand, we collect some data very similar to what we were talking about in the previous slide. Now we're set up to do causal inference; we're set up to ask and attempt to answer a question involving cause and effect. A question might be: is grad school worth it? That might be something someone watching this video is thinking about, or something someone is reminiscing upon, wishing they had known about causal inference before deciding to go to grad school, when they're already waist deep into it. Either way, one way to frame this question of whether grad school is worth it could be: what is the treatment effect of education on wealth? I'm not saying this is the best way to do it, but this is a way we can do it.
I'll use this opportunity to run through a concrete example with code in Python. The example code is at the GitHub link at the bottom here; I also put the link in the description. Basically, here we're going to estimate the treatment effect of education on income. First we download some libraries and load some data. This is real census data from the UCI (University of California, Irvine) Machine Learning Repository. I don't know the specific name, but the link is here. To do the causal effect
calculation, I use the DoWhy library, which is a Microsoft library for doing causal inference. The next step is that we have to define our causal model. Again, the starting point of all causal inference is a causal model, so we need to start with our DAG, which is the same as we saw in an earlier slide, just that education now has a new name, "has graduate degree", and income has a different name, "greater than 50k". These are both boolean variables, which means they're true-or-false variables: either someone has a graduate degree or they don't, and either they make more than fifty thousand dollars a year or they don't. And then age is just an integer. Next we need an estimand, which is basically a recipe for estimating our causal effect. You can do this in one line using the DoWhy library. And then finally we can estimate the causal effect. Here we're using a T-learner, which is a type of meta-learner. I can link a paper talking about meta-learners in the description. I won't jump into all the details; I'll just jump to the result, which is that the average causal effect is 0.2. One way to interpret this is that having a graduate degree increases your chances of making more than $50,000 a year by 20%.
However, we had a lot of samples in this data set, and we've just reduced all those samples to a single number, the average, which may not always be the most representative number. So it's always good to plot the distribution. When we plot the distribution (here we have the causal effect on the x-axis, and the y-axis is the count, the number of records or people that had that individual causal effect), we see that the distribution is not Gaussian. If the distribution is not Gaussian, that means the average is not a very representative number for that distribution. In other words, even though a lot of people had a 0.2 treatment effect, there were also a significant number of people that had no treatment effect. So it seems we're no closer to answering the question of
is grad school worth it. However, one thing one could do is dive into these different cohorts: look at the samples that had no causal effect from a graduate degree, and then look at the people that had a significant causal effect. Then you can start to answer questions like what kinds of people benefit from a graduate degree and what kinds of people don't benefit from a graduate degree, and maybe that can help you answer this question. So again, the code's on the GitHub. Feel free to take it, run with it, do whatever you want, extend the analysis further, post your own YouTube video about it. I'll be really interested to see if anyone actually takes a look and tries to answer this question of whether grad school is worth it, but I guess it's a little too late for me at this point.
So that was the second video in the three-part series on causality. We talked about causal inference, which aims at answering questions involving causality. However, the starting point of all causal inference is a causal model, which may not be so easy to have in hand. That's where the topic of the next video can be helpful, which is causal discovery, and that aims at obtaining causal structure from data alone. If you enjoyed this video, consider liking, subscribing, sharing, and commenting your thoughts. I'm always happy and interested in reading the comments. Check out the blog if you want to get some more details on causal inference, and check out the GitHub to get the example code talked about in this video. Thanks for watching.
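Editor's sketch: the transcript's advice to compare incomes within age groups rather than between them can be illustrated with a few lines of plain Python. This is not the code from the video (which uses DoWhy and a T-learner on census data). It is a minimal sketch of stratifying on a confounder, the standard adjustment E[Y | do(X=x)] = Σ_z E[Y | X=x, Z=z] · P(Z=z), which is valid under the assumption that age is the only confounder. The records, the function names `naive_difference` and `age_adjusted_difference`, and all numbers are made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy records: (age_group, has_college_degree, income in $1000s).
# These numbers are invented for illustration; they are not the census data.
records = [
    ("20s", False, 30), ("20s", False, 32), ("20s", False, 34), ("20s", True, 40),
    ("50s", False, 60), ("50s", True, 68), ("50s", True, 70), ("50s", True, 72),
]


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Mean income with a degree minus mean income without, ignoring age."""
    treated = [income for _, degree, income in rows if degree]
    control = [income for _, degree, income in rows if not degree]
    return mean(treated) - mean(control)


def age_adjusted_difference(rows):
    """Compare within each age group, then average weighted by group size."""
    by_age = defaultdict(list)
    for row in rows:
        by_age[row[0]].append(row)

    effect = 0.0
    for age_group, group in by_age.items():
        treated = [income for _, degree, income in group if degree]
        control = [income for _, degree, income in group if not degree]
        if not treated or not control:
            # Without both kinds of people in a group there is nothing to compare.
            raise ValueError(f"no overlap in age group {age_group!r}")
        weight = len(group) / len(rows)  # estimate of P(age group)
        effect += weight * (mean(treated) - mean(control))
    return effect


print(naive_difference(records))         # 23.5
print(age_adjusted_difference(records))  # 9.0
```

In this toy data, older people both earn more and are more likely to hold a degree, so the naive comparison (23.5) overstates the within-group difference (9.0). That gap is the kind of discrepancy between observational and interventional quantities that the transcript attributes to confounding.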
Original Description
🤝 Work with me: https://aibuilder.academy/yt/PFBI-ZfV5rs
🚀 Ship AI apps in weeks, not months: https://aibuilder.academy/courses/yt/PFBI-ZfV5rs
The second video in a 3-part series on causality. In this video I discuss key ideas from causal inference, which aims at answering questions about cause and effect. I finish with a concrete example with code of doing causal inference in Python.
Series Playlist: https://www.youtube.com/playlist?list=PLz-ep5RbHosVVTz9HEzpI4d6xpWsc8rOa
📰 Read more: https://medium.com/towards-data-science/causal-inference-962ae97cefda?sk=d68d5191fdb00d3fee47aaa43ed48f3d
💻 Example code: https://github.com/ShawhinT/YouTube-Blog/tree/main/causality/causal_inference
Resources:
- The Book of Why by Judea Pearl: https://www.amazon.com/Book-Why-Science-Cause-Effect/dp/046509760X
- Do-calculus: https://arxiv.org/abs/1210.4852
- Metalearner paper: https://www.pnas.org/content/116/10/4156
Introduction - 0:00
Causal Inference - 0:28
3 Gifts of Causal Inference - 1:13
Gift 1: Do-operator - 1:20
Gift 2: Confounding (deconfounded) - 3:22
Gift 3: Causal Effects - 5:51
Example: Treatment Effect of Grad School on Income - 8:05
Closing remarks - 11:12