Persistent Homology | Introduction & Python Example Code

Shaw Talebi · Beginner ·🛠️ AI Tools & Apps ·4y ago

Key Takeaways

The video introduces persistent homology, a technique for finding core topological features of data that are robust to noise, and demonstrates its application to market data using Python libraries such as NumPy, pandas, yfinance, ripser, and persim.

Full Transcript

Hey folks, welcome back. This is the final video in a three-part series on topological data analysis, or TDA for short. In this video I'll be talking about another specific technique under the umbrella of TDA called persistent homology. The big idea behind persistent homology is finding the core topological features of your data that are, hopefully, robust to noise. I'll start with a brief discussion of key points surrounding persistent homology and then dive into a concrete example, with code, of how to use it. And with that, let's get into the video.

There are many layers to persistent homology, so I'll try to start super simple and build things up in a way that hopefully makes some sense. Like I've mentioned throughout this series, TDA is all about looking at the shape of data. So let's go back to preschool and talk about shapes, or more precisely polygons, like the ones shown here. But not all polygons are equal. There is one that is special, and the reason it is special is because it is the simplest polygon we can construct: the triangle. One neat thing about triangles is that we can use them to make any other polygon. For example, a square is really just two triangles stuck together, a pentagon can be made from four triangles like this, and a star is just the same pentagon but with five triangles coming out of it.

So one thought is, if we want to analyze the shape of our data, maybe we can break it down into a bunch of triangles. Well, as it turns out, this is essentially what we do in persistent homology, but with one technical detail. Since most data sets live in more than just two dimensions, that's to say we have more than just two variables, flat two-dimensional triangles may not capture the full richness of our data's shape. Don't worry: like most things, mathematicians have generalized the notion of a triangle to any number of dimensions, and they call these generalized triangles simplexes. So the triangle that we know and love is called a 2-simplex, since it lives in two dimensions. A
line segment is the simplest shape we can construct in one dimension, and it's called a 1-simplex. Similarly, a tetrahedron is called a 3-simplex, a point is a 0-simplex, and so on for all the other dimensions. So just like a collection of triangles can make any two-dimensional polygon, a collection of simplexes can approximate just about any complicated high-dimensional shape that may underlie our data. And since you'll probably see it elsewhere, the technical name for a collection of simplexes is a simplicial complex, and this is a key concept in persistent homology.

Okay, so this gives us a clue as to how we can take unstructured point clouds, in other words data sets, and translate them into shapes. So now let's talk about how we might compare shapes together, no matter how different or complicated they may seem. One way to do this is by looking at holes. For example, take these three objects shown here: we have a torus, a loop, and a coffee mug. While these may appear to be very different shapes, they have something fundamental in common: they all have a hole. This is like the joke that a topologist looks at a coffee mug and a donut and sees the same thing, the reason being that one can continuously transform one into the other. For the aficionados out there, this is called a homeomorphism.

But the fundamental thing here is the number of holes. So one way we can characterize and group shapes together is by counting holes. And just like before, when we generalized triangles into simplexes, we can generalize holes as well. We can think of cavities as holes in 3D, and we can think of singly connected components as holes in 1D. These generalized holes form the basis of what are called homology groups, and these give us a formal way to characterize different shapes. So when we talk about homology, we are essentially just talking about holes.

Okay, so now that we've talked about constructing shapes with generalized triangles and characterizing those shapes via generalized holes,
we can finally talk about persistent homology. The first step in persistent homology is to convert data into a simplicial complex. To see this, consider a data set, i.e. a point cloud, like this. One way we can construct a simplicial complex out of this is by drawing n-dimensional balls around each point, and since our data here is two-dimensional, we just draw circles around each point, which might look something like this. So at the center of each of these gray circles we have a point. We can form 1-simplexes, i.e. line segments, by connecting the data points whose corresponding circles overlap, which might look something like this. And so now we have two shapes: we have our original point cloud, which is indeed a simplicial complex where each point is a 0-simplex, and the shape we just constructed, made up of both 0- and 1-simplexes. Then we can compare these two shapes by looking at their homology, more specifically by counting the number of connected components, which corresponds to the H0 homology group that we talked about in the previous slide. And there we go: we can see that in our first shape on the left we have 20 separate connected components, while on the right here we have 13 singly connected components.

But there's nothing special about this radius, epsilon sub one, so let's do this again but with bigger circles. Now we can start to see 2-simplexes appear, in other words triangles, and the number of connected components decreases. But still, there's nothing special about epsilon sub two, so let's go even bigger, and now we see 3-simplexes appear, i.e. tetrahedrons, and so on and so forth. However, there is a special radius value here, which is when every circle overlaps with every other circle and we are just left with one big connected component, and this is a natural limit to this process. As we can see with each of these simplicial complexes, the shape of our data is evolving, and its evolution is captured and quantified by the number of connected components, in other
words, by the change in its homology. So although only four different choices of radii are shown here, corresponding to the four different shapes on the screen, we can do this for every choice of radius between zero and the limit I mentioned earlier. This gives us a way to suss out which topological features of our data are significant, based on how long they persist during this circle-growing process. In other words, the holes that persist over a large increase in radius are more significant than the ones that persist over just a short period.

Okay, so how can we track the persistence of these holes? One good way to do this is by using a persistence diagram. These look something like the plot on the left here, which is showing the persistence diagram of a hollow sphere. Looking at the plot, each of these blue, orange, and green points corresponds to a topological feature, or in other words a hole. In blue we have the H0 homology group, which are the singly connected components; in orange we have the H1 homology group, which are closed loops; and in green we have the H2 homology group, in other words cavities. The x-axis of this plot indicates the radius at which a hole appeared in the evolution of the data's shape, in other words in this circle-growing process that we showed in the previous slide, and on the y-axis we have the radius at which that hole disappeared. So a point that sits near this black dashed line, this y = x line, corresponds to a hole that disappeared soon after it appeared. Conversely, points that sit far away from this line represent holes that disappeared long after they appeared. Therefore, two key points of a persistence diagram are: the points close to this y = x line are noise, while the points relatively far from this line are significant. In this example we have two points that are far from this line: the blue one in the top left here, and the green one right here. We can ignore this blue one because it corresponds to when every n-dimensional ball overlaps with every other ball. So the significant topological feature of this data is captured by this green point here, which is telling us that the data is characterized by one cavity. And this makes sense, since the data for this example are organized on the surface of a sphere.

Okay, so up until this point I've discussed only toy examples meant to give you an idea of what's going on with persistent homology. So now we'll switch gears to an example with real-world data. In this example we'll walk through how one could use persistent homology to analyze market data. I suppose it's worth mentioning that this example is not meant as financial advice. I'm a physicist, not a trader, and I've never taken a finance class in my life. However, I hope this example gives you an idea of what an analysis using persistent homology might look like, and inspires ideas for analyses using data that you might be working with.

Okay, so similar to the last video, we start by importing Python libraries. The notable libraries here are yfinance, which gives us an API to grab market data, and the ripser and persim modules, which are part of the same scikit-tda ecosystem from the last video. Next we load in market data over a 4-year period using yfinance. Here we are grabbing four major market indexes, namely the S&P 500, Dow Jones, NASDAQ, and Russell 2000. We have daily prices for these indexes organized in a pandas DataFrame, so you can imagine four columns, one for each market index, and many rows corresponding to each day that the markets were open over this 4-year period. Then we convert this pandas DataFrame into a NumPy array and compute the log daily returns of each index. This choice of data prep follows the procedure used in the paper by Gidea and Katz, which was the inspiration for this example, and you can find it at the arXiv reference here.

Okay, so now we get into the TDA stuff. In this analysis we want to track changes in the shape of the markets by looking at how the homology of the
market changes over time. To do this, we start by initializing this object that constructs simplicial complexes from data. Next we define a time window size, which will allow us to grab a chunk of data to analyze the homology of; here we're setting this window size to 20 days. Next we define the total number of these chunks we will have, and finally we create a NumPy array to keep track of a number that quantifies changes in homology.

Okay, next we go down to this for loop and we do some persistent homology. First we take the first 20 rows of data to do persistent homology and create a persistence diagram. That is, we grow four-dimensional balls around each point, where each choice of radius creates a simplicial complex, and we track the holes that appear and disappear using a persistence diagram. We do all that with just one line of code. Then we do the same thing for another set of 20 rows, specifically the second row all the way down to the 21st row. So now we have two persistence diagrams corresponding to two overlapping 20-day windows in which the market was open. Next we can quantify the change in the overall homology between these two persistence diagrams using something called the Wasserstein distance, which is essentially a distance measure between two persistence diagrams. So at the end of this whole process we get a single number and store it in the NumPy array we created earlier. Then we repeat this whole process for all the rows in our data set.

Okay, so after this whole process we have a set of values which quantify the changes in homology between consecutive days that the market was open, and we can just plot this as a time series, which is what's happening in this block of code here. The plot will look like this blue line here, in which we can see there's this clear peak near the middle of the time series. For some context, we also have scaled S&P 500 close prices plotted in orange just above, and this vertical red line here is indicating when the
crash of 2020 occurred. As it turns out, the peak in this Wasserstein distance time series seems to correspond very closely with when this crash occurred. So did homology changes predict the crash of 2020? Well, I wouldn't go that far, but this is indeed interesting. One idea to investigate this further is to try to use these Wasserstein distances to predict future market index prices. If past distance values predict future index prices, then maybe there's something here. As you may be able to see from this example, there is a lot of room for creativity when using persistent homology in practice, and in some sense this is more art than science.

So that brings us to the end of our three-part series on topological data analysis. TDA is a young field with a lot of untapped potential, so I hope this series was helpful in getting a better idea of what it's all about. If you'd like to learn more, check out the other videos in this series linked in the description below. There's also a corresponding Medium article to this video and the others in this series, which you can find in the description. If you enjoyed this content, please consider liking, subscribing, or sharing this video. Like many of you, I am indeed still learning, so if you have thoughts, questions, or concerns, please feel free to share those in the comment section below. And as always, thanks for watching.

Original Description

🤝 Work with me: https://aibuilder.academy/yt/5ezFcy9CIWE
🚀 Ship AI apps in weeks, not months: https://aibuilder.academy/courses/yt/5ezFcy9CIWE

This is the final video in a 3-part series on topological data analysis (TDA). TDA is an up-and-coming approach to data analysis that studies the shape of data. In this video, I discuss a popular TDA approach called persistent homology.

Series Playlist: https://www.youtube.com/playlist?list=PLz-ep5RbHosVi8Qoyqvz1MEiYrz35Zb7F
📰 Read more: https://medium.datadriveninvestor.com/persistent-homology-f22789d753c4?sk=c0925c51c31f5136abf362829c755146
💻 Example code: https://github.com/ShawhinT/YouTube-Blog/tree/main/TDA/persistent_homology

Resources I found helpful:
- TDA review: https://www.frontiersin.org/articles/10.3389/frai.2021.667963/full
- Intro to persistent homology: https://www.youtube.com/watch?v=2PSqWBIrn90&ab_channel=MatthewWright

Introduction - 0:00
Shapes - 0:33
Triangles - 1:02
Simplexes - 1:54
Holes - 2:58
Persistent Homology - 4:14
Persistence Diagrams - 7:03
Example code: Homology of Market Data - 8:58

This video introduces persistent homology, a technique for finding core topological features of data that are robust to noise, and demonstrates its application to market data using Python libraries such as NumPy, pandas, yfinance, ripser, and persim. The video covers the basics of persistent homology, including the construction of simplicial complexes and the tracking of homology as the complex grows. It also shows how to use persistent homology to analyze market data, and suggests testing whether changes in homology help predict future index prices as a direction for further investigation.
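
As a rough illustration of the first idea in this summary, the sketch below counts connected components (the H0 picture) of a tiny point cloud at a few radii, linking two points whenever the balls drawn around them overlap. It uses only the Python standard library; the function name and the toy points are hypothetical and do not come from the video.

```python
import math
from itertools import combinations


def count_components(points, radius):
    """Count connected components when overlapping balls are linked."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        # Two balls of the same radius overlap when their centres
        # are at most 2 * radius apart.
        if math.dist(points[i], points[j]) <= 2 * radius:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})


# Hypothetical toy point cloud: two small clusters far apart.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]

for radius in (0.1, 0.6, 8.0):
    print(radius, count_components(points, radius))
# prints: 0.1 5, then 0.6 2, then 8.0 1
```

The two-cluster structure survives over a wide range of radii before everything merges into one component, which is the sense in which a feature "persists".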

Key Takeaways
  1. Construct a simplicial complex from a point cloud
  2. Compare shapes by looking at their homology
  3. Track the number of connected components as the shape is transformed
  4. Create a persistence diagram to visualize the persistence of holes in a shape
  5. Import necessary libraries and load market data
  6. Compute log daily returns of each index and initialize the object that builds simplicial complexes
  7. Create a NumPy array to store the Wasserstein distances and plot the time series
  8. Scale S&P 500 close prices and plot them above the time series
💡 Persistent homology can be used to track changes in the shape of market data over time. Whether those changes help predict future prices is left as an open question in the video.
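
The data-prep steps in takeaways 5 to 7 can be sketched with plain Python lists standing in for the pandas DataFrame and NumPy array. The prices below are made up, a single series replaces the four indexes, and both helper names are hypothetical.

```python
import math


def log_returns(prices):
    """Log daily returns: ln(p_t / p_(t-1)) for consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def consecutive_windows(rows, size):
    """Yield pairs of overlapping windows that are shifted by one row."""
    for start in range(len(rows) - size):
        yield rows[start:start + size], rows[start + 1:start + 1 + size]


# Hypothetical closing prices for one index (not real market data).
closes = [100.0, 110.0, 99.0, 99.0, 108.9]

returns = log_returns(closes)                        # one fewer entry than closes
pairs = list(consecutive_windows(returns, size=2))   # the video uses 20-day windows

for first, second in pairs:
    print(first, second)
```

In the full analysis, each pair of windows would be turned into two persistence diagrams, and the distance between them stored as one entry of the time series.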
