Persistent Homology | Introduction & Python Example Code
Key Takeaways
The video introduces persistent homology, a technique for finding core topological features of data that are robust to noise, and demonstrates its application to market data analysis using Python libraries such as yfinance, pandas, NumPy, ripser, and persim.
Full Transcript
Hey folks, welcome back. This is the final video in a three-part series on topological data analysis, or TDA for short. In this video I'll be talking about another specific technique under the umbrella of TDA called persistent homology. The big idea behind persistent homology is finding the core topological features of your data that are, hopefully, robust to noise. I'll start with a brief discussion of key points surrounding persistent homology and then dive into a concrete example, with code, of how to use it. And with that, let's get into the video.

There are many layers to persistent homology, so I'll try to start super simple and build things up in a way that hopefully makes some sense. Like I've mentioned throughout this series, TDA is all about looking at the shape of data, so let's go back to preschool and talk about shapes, or more precisely polygons, like the ones shown here. But not all polygons are equal. There is one that is special, and the reason it is special is because it is the simplest polygon we can construct: the triangle. One neat thing about triangles is that we can use them to make any other polygon. For example, a square is really just two triangles stuck together, a pentagon can be made from four triangles like this, and a star is just the same pentagon but with five triangles coming out of it.

So one thought is, if we want to analyze the shape of our data, maybe we can break it down into a bunch of triangles. Well, as it turns out, this is essentially what we do in persistent homology, but with one technical detail. Since most data sets live in more than just two dimensions, that's to say we have more than just two variables, flat two-dimensional triangles may not capture the full richness of our data's shape. Don't worry: like most things, mathematicians have generalized the notion of a triangle to any number of dimensions, and they call these generalized triangles simplexes. So the triangle that we know and love is called a 2-simplex, since it lives in two dimensions. A
line segment is the simplest shape we can construct in one dimension, and it's called a 1-simplex. Similarly, a tetrahedron is called a 3-simplex and a point is a 0-simplex, and so on for all the other dimensions. So just like a collection of triangles can make any two-dimensional polygon, a collection of simplexes can approximate just about any complicated high-dimensional shape that may underlie our data. And since you'll probably see it elsewhere, the technical name for a collection of simplexes is a simplicial complex, and this is a key concept in persistent homology.

Okay, so this gives us a clue as to how we can take unstructured point clouds, in other words data sets, and translate them into shapes. So now let's talk about how we might compare shapes together, no matter how different or complicated they may seem. One way to do this is by looking at holes. For example, these three objects shown here: we have a torus, a loop, and a coffee mug. While these may appear to be very different shapes, they have something fundamental in common: they all have a hole. This is like the joke that a topologist looks at a coffee mug and a donut and sees the same thing, the reason being that one can continuously transform one into the other. For the aficionados out there, this is called a homeomorphism. But the fundamental thing here is the number of holes.

So one way we can characterize and group shapes together is by counting holes. And just like before, when we generalized triangles into simplexes, we can generalize holes as well. We can think of cavities as holes in 3D, and we can think of singly connected components as holes in 1D. These generalized holes form the basis of what are called homology groups, and these give us a formal way to characterize different shapes. So when we talk about homology, we are essentially just talking about holes.

Okay, so now that we've talked about constructing shapes with generalized triangles and characterizing those shapes via generalized holes,
we can finally talk about persistent homology. The first step in persistent homology is to convert data into a simplicial complex. To see this, consider a data set, i.e. a point cloud, like this. One way we can construct a simplicial complex out of this is by drawing n-dimensional balls around each point. Since our data here is two-dimensional, we just draw circles around each point, which might look something like this. So at the center of each of these gray circles we have a point. We can form 1-simplexes, i.e. line segments, by connecting the data points whose corresponding circles overlap, which might look something like this.

So now we have two shapes: our original point cloud, which is indeed a simplicial complex where each point is a 0-simplex, and the shape we just constructed, made up of both 0- and 1-simplexes. We can then compare these two shapes by looking at their homology, more specifically by counting the number of connected components, which corresponds to the H0 homology group that we talked about in the previous slide. And there we go: we can see that in our first shape, on the left, we have 20 separate connected components, while on the right here we have 13 singly connected components.

But there's nothing special about this radius, epsilon sub one, so let's do this again but with bigger circles. Now we can start to see 2-simplexes appear, in other words triangles, and the number of connected components decreases. But still, there's nothing special about epsilon two, so let's go even bigger, and now we see 3-simplexes appear, i.e. tetrahedrons, and so on and so forth. However, there is a special radius value here, which is when every circle overlaps with every other circle and we are just left with one big connected component. This is a natural limit to this process.

As we can see with each of these simplicial complexes, the shape of our data is evolving, and its evolution is captured and quantified by the number of connected components, in other
words, by the change in its homology. So although only four different choices of radii are shown here, corresponding to the four different shapes on the screen, we can do this for every choice of radius between zero and the limit I mentioned earlier. This gives us a way to suss out which topological features of our data are significant based on how long they persist during this circle-growing process. In other words, the holes that persist over a large increase in radius are more significant than the ones that persist over just a short period.

Okay, so how can we track the persistence of these holes? One good way to do this is by using a persistence diagram. These look something like the plot on the left here, which is showing the persistence diagram of a hollow sphere. Looking at the plot, each of these blue, orange, and green points corresponds to a topological feature, or in other words a hole. In blue we have the H0 homology group, which are the singly connected components; in orange we have the H1 homology group, which are closed loops; and in green we have the H2 homology group, in other words cavities. The x-axis of this plot indicates the radius at which a hole appeared in the evolution of the data's shape, in other words in the circle-growing process that we showed in the previous slide, and on the y-axis we have the radius at which that hole disappeared.

Therefore, a point that sits near this black dashed line, this y = x line, corresponds to a hole that disappeared soon after it appeared. Conversely, points that sit far away from this line represent holes that disappeared long after they appeared. So the two key points of a persistence diagram are: the points close to this y = x line are noise, while the points relatively far from this line are significant. In this example we have two points that are far from this line: the blue one in the top left here and the green one right here. We can ignore this blue one because it corresponds to when every n-dimensional ball overlaps with every other ball. So the significant topological feature of this data is captured by this green point here, which is telling us that the data is characterized by one cavity. This makes sense, since the data for this example are organized on the surface of a sphere.

Okay, so up until this point I've discussed only toy examples, meant to give you an idea of what's going on with persistent homology. Now we'll switch gears to an example with real-world data. In this example we'll walk through how one could use persistent homology to analyze market data. I suppose it's worth mentioning that this example is not meant as financial advice. I'm a physicist, not a trader; I've never taken a finance class in my life. However, I hope this example gives you an idea of what an analysis using persistent homology might look like and inspires ideas for analyses using data that you might be working with.

Okay, so similar to the last video, we start by importing Python libraries. The notable libraries here are yfinance, which gives us an API to grab market data, and the ripser and persim modules, which are part of the same scikit-tda ecosystem from the last video. Next we load in market data over a four-year period using yfinance. Here we are grabbing four major market indexes, namely the S&P 500, Dow Jones, NASDAQ, and Russell 2000. We have daily prices for these indexes organized in a pandas DataFrame, so you can imagine four columns, one for each market index, and many rows corresponding to each day that the markets were open over this four-year period. Then we convert this pandas DataFrame into a NumPy array and compute the log daily returns of each index. This choice of data prep follows the procedure used in the paper by Gidea and Katz, which was the inspiration for this example, and you can find it at the arXiv reference here.

Okay, so now we get into the TDA stuff. In this analysis we want to track changes in the shape of the markets by looking at how the homology of the
market changes over time. To do this, we start by initializing this object that constructs simplicial complexes from data. Next we define a time window size, which will allow us to grab a chunk of data to analyze the homology of; here we're setting this window size to 20 days. Next we define the total number of these chunks we will have, and finally we create a NumPy array to keep track of a number that quantifies changes in homology.

Okay, next we go down to this for loop and we do some persistent homology. First we take the first 20 rows of data to do persistent homology and create a persistence diagram. That is, we grow four-dimensional balls around each point, where each choice of radius creates a simplicial complex, and we track the holes that appear and disappear using a persistence diagram. We do all that with just one line of code. Then we do the same thing for another set of 20 rows, specifically the second row all the way down to the 21st row. So now we have two persistence diagrams corresponding to two overlapping 20-day windows in which the market was open. Next we can quantify the change in the overall homology between these two persistence diagrams using something called the Wasserstein distance, which is essentially a distance measure between two persistence diagrams. So at the end of this whole process we get a single number and store it in the NumPy array we created earlier. Then we repeat this whole process for all the rows in our data set.

Okay, so after this whole process we have a set of values which quantify the changes in homology between consecutive days that the market was open, and we can just plot this as a time series, which is what's happening in this block of code here. The plot will look like this blue line here, where we can see there's a clear peak near the middle of the time series. For some context, we also have scaled S&P 500 close prices plotted in orange just above, and this vertical red line here is indicating when the
crash of 2020 occurred. As it turns out, the peak in this Wasserstein distance time series seems to correspond very closely with when this crash occurred. So did homology changes predict the crash of 2020? Well, I wouldn't go that far, but this is indeed interesting. One idea to investigate this further is to try to use these Wasserstein distances to predict future market index prices. If past distance values predict future index prices, then maybe there's something here. As you may be able to see from this example, there is a lot of room for creativity when using persistent homology in practice, and in some sense this is more art than science.

So that brings us to the end of our three-part series on topological data analysis. TDA is a young field with a lot of untapped potential, so I hope this series was helpful in getting a better idea of what it's all about. If you'd like to learn more, check out the other videos in this series linked in the description below. There's also a corresponding Medium article to this video and the others in this series, which you can find in the description. If you enjoyed this content, please consider liking, subscribing, or sharing this video. Like many of you, I am indeed still learning, so if you have thoughts, questions, or concerns, please feel free to share those in the comment section below. And as always, thanks for watching.
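The circle-growing process described in the transcript can be sketched in plain Python for connected components only (the H0 part). This is a minimal illustration, not the code from the video: the point cloud is made up, the function names `count_components` and `h0_persistence` are hypothetical, and higher-dimensional holes (loops, cavities) are not tracked here.

```python
import math
from itertools import combinations


def find(parent, i):
    """Return the root of i's component, compressing the path as we go."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def count_components(points, radius):
    """Count connected components when a ball of the given radius is drawn
    around each point and points with overlapping balls are joined."""
    parent = list(range(len(points)))
    n_components = len(points)
    for i, j in combinations(range(len(points)), 2):
        # Two balls of equal radius overlap when their centers are
        # closer than twice the radius.
        if math.dist(points[i], points[j]) < 2 * radius:
            ri, rj = find(parent, i), find(parent, j)
            if ri != rj:
                parent[ri] = rj
                n_components -= 1
    return n_components


def h0_persistence(points):
    """Return (birth, death) radii for connected components (H0).

    Every component is born at radius 0, and one dies each time two
    components merge. The last survivor never dies (death = inf).
    """
    edges = sorted(
        (math.dist(points[i], points[j]) / 2, i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    parent = list(range(len(points)))
    diagram = []
    for merge_radius, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            diagram.append((0.0, merge_radius))
    diagram.append((0.0, math.inf))
    return diagram


# Toy point cloud (made up for illustration): two tight clusters far apart.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]

# Growing the radius merges components, as in the slides.
for radius in (0.1, 0.6, 6.0):
    print(radius, count_components(points, radius))

# Each (birth, death) pair is one point of the H0 persistence diagram.
print(h0_persistence(points))
```

In this toy cloud, most components die at radius 0.5, while one survives until radius 4.5, when the two clusters finally merge. That long-lived pair is the kind of point that would sit far from the y = x line in a persistence diagram.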
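The data preparation and windowing steps from the market example can also be sketched without any third-party libraries. This is only an illustration under stated assumptions: the prices are invented (not real market data), only two hypothetical indexes are used instead of four, the window is 3 rows instead of 20, and the helper names `log_returns` and `window_pairs` are not from the video's code. The persistence diagram and Wasserstein distance steps, which the video does with ripser and persim, are left as a comment.

```python
import math


def log_returns(prices):
    """Log daily returns for one price series: ln(p[t] / p[t-1])."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def window_pairs(n_rows, window):
    """Yield ((start1, end1), (start2, end2)) row ranges for each pair of
    overlapping windows, where the second window is shifted by one row."""
    for start in range(n_rows - window):
        yield (start, start + window), (start + 1, start + 1 + window)


# Made-up closing prices for two hypothetical indexes (not real market data).
index_a = [100.0, 101.0, 99.0, 102.0, 103.0, 101.5]
index_b = [50.0, 50.5, 49.0, 51.0, 51.5, 50.0]

# One row per day, one column per index: each row is a point in the cloud.
returns = list(zip(log_returns(index_a), log_returns(index_b)))

window = 3  # the video uses 20 days; 3 keeps this toy example small
for (s1, e1), (s2, e2) in window_pairs(len(returns), window):
    cloud_1 = returns[s1:e1]
    cloud_2 = returns[s2:e2]
    # In the video, each cloud is turned into a persistence diagram and the
    # two diagrams are compared with the Wasserstein distance.
    print((s1, e1), (s2, e2), len(cloud_1), len(cloud_2))
```

Each window is a small point cloud with one point per trading day, so consecutive windows share all but one point. That overlap is why a large distance between their diagrams signals an abrupt change in the data's shape.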
Original Description
🤝 Work with me: https://aibuilder.academy/yt/5ezFcy9CIWE
🚀 Ship AI apps in weeks, not months: https://aibuilder.academy/courses/yt/5ezFcy9CIWE
This is the final video in a 3-part series on topological data analysis (TDA). TDA is an up-and-coming approach to data analysis that studies the shape of data. In this video, I discuss a popular TDA approach called persistent homology.
Series Playlist: https://www.youtube.com/playlist?list=PLz-ep5RbHosVi8Qoyqvz1MEiYrz35Zb7F
📰 Read more: https://medium.datadriveninvestor.com/persistent-homology-f22789d753c4?sk=c0925c51c31f5136abf362829c755146
💻 Example code: https://github.com/ShawhinT/YouTube-Blog/tree/main/TDA/persistent_homology
Resources I found helpful:
- TDA review: https://www.frontiersin.org/articles/10.3389/frai.2021.667963/full
- Intro to persistent homology: https://www.youtube.com/watch?v=2PSqWBIrn90&ab_channel=MatthewWright
Introduction - 0:00
Shapes - 0:33
Triangles - 1:02
Simplexes - 1:54
Holes - 2:58
Persistent Homology - 4:14
Persistence Diagrams - 7:03
Example code: Homology of Market Data - 8:58