Persistent Homology | Introduction & Python Example Code

Shaw Talebi · Beginner ·🛠️ AI Tools & Apps ·4y ago

Skills: ML Maths Basics 80% · Research Methods 70% · Reading ML Papers 60%

Key Takeaways

The video introduces persistent homology, a technique for finding core topological features of data that are robust to noise, and demonstrates its application to market data using Python libraries such as yfinance, NumPy, pandas, ripser, and persim.

Full Transcript

Hey folks, welcome back. This is the final video in a three-part series on topological data analysis, or TDA for short. In this video I'll be talking about another specific technique under the umbrella of TDA called persistent homology. The big idea behind persistent homology is finding the core topological features of your data that are hopefully robust to noise. I'll start with a brief discussion of key points surrounding persistent homology and then dive into a concrete example, with code, of how to use it. And with that, let's get into the video.

There are many layers to persistent homology, so I'll try to start super simple and build things up in a way that hopefully makes some sense. Like I've mentioned throughout this series, TDA is all about looking at the shape of data, so let's go back to preschool and talk about shapes, or more precisely polygons, like the ones shown here. But not all polygons are equal. There is one that is special, and the reason it is special is because it is the simplest polygon we can construct: the triangle. One neat thing about triangles is that we can use them to make any other polygon. For example, a square is really just two triangles stuck together, a pentagon can be made from four triangles like this, and a star is just the same pentagon but with five triangles coming out of it.

So one thought is, if we want to analyze the shape of our data, maybe we can break it down into a bunch of triangles. Well, as it turns out, this is essentially what we do in persistent homology, but with one technical detail. Since most data sets live in more than just two dimensions, that's to say we have more than just two variables, flat two-dimensional triangles may not capture the full richness of our data's shape. Don't worry: like most things, mathematicians have generalized the notion of a triangle to any number of dimensions, and they call these generalized triangles simplexes. So the triangle that we know and love is called a 2-simplex since it lives in two dimensions. A
line segment is the simplest shape we can construct in one dimension, and it's called a 1-simplex. Similarly, a tetrahedron is called a 3-simplex, a point is a 0-simplex, and so on for all the other dimensions. So just like a collection of triangles can make any two-dimensional polygon, a collection of simplexes can approximate just about any complicated high-dimensional shape that may underlie our data. And since you'll probably see it elsewhere, the technical name for a collection of simplexes is a simplicial complex, and this is a key concept in persistent homology.

Okay, so this gives us a clue as to how we can take unstructured point clouds, in other words data sets, and translate them into shapes. So now let's talk about how we might compare shapes together, no matter how different or complicated they may seem. One way to do this is by looking at holes. For example, these three objects shown here: we have a torus, a loop, and a coffee mug. While these may appear to be very different shapes, they have something fundamental in common: they all have a hole. This is like the joke that a topologist looks at a coffee mug and a donut and sees the same thing, the reason being that one can continuously transform one into the other. For the fadas out there, this is called a homeomorphism. But the fundamental thing here is the number of holes.

So one way we can characterize and group shapes together is by counting holes. And just like before, when we generalized triangles into simplexes, we can generalize holes as well. We can think of cavities as holes in 3D, and we can think of singly connected components as holes in 1D. These generalized holes form the basis of what are called homology groups, and these give us a formal way to characterize different shapes. So when we talk about homology, we are essentially just talking about holes.

Okay, so now that we've talked about constructing shapes with generalized triangles and characterizing those shapes via generalized holes,
we can finally talk about persistent homology. The first step in persistent homology is to convert data into a simplicial complex. To see this, consider a data set, i.e., a point cloud like this. One way we can construct a simplicial complex out of this is by drawing n-dimensional balls around each point, and since our data here is two-dimensional, we just draw circles around each point, which might look something like this. So at the center of each of these gray circles we have a point. We can form 1-simplexes, i.e., line segments, by connecting the data points whose corresponding circles overlap, which might look something like this.

So now we have two shapes: our original point cloud, which is indeed a simplicial complex where each point is a 0-simplex, and the shape we just constructed, made up of both 0- and 1-simplexes. We can compare these two shapes by looking at their homology, more specifically by counting the number of connected components, which corresponds to the H0 homology group that we talked about in the previous slide. And there we go: in our first shape on the left we have 20 separate connected components, while on the right here we have 13 singly connected components.

But there's nothing special about this radius, epsilon sub one, so let's do this again but with bigger circles. Now we can start to see 2-simplexes appear, in other words triangles, and the number of connected components decreases. But still there's nothing special about epsilon sub two, so let's go even bigger, and now we see 3-simplexes appear, i.e., tetrahedrons, and so on and so forth. However, there is a special radius value here, which is when every circle overlaps with every other circle and we are just left with one big connected component. This is a natural limit to this process. As we can see with each of these simplicial complexes, the shape of our data is evolving, and its evolution is captured and quantified by the number of connected components, in other
words, by the change in its homology. So although only four different choices of radii are shown here, corresponding to the four different shapes on the screen, we can do this for every choice of radius between zero and the limit I mentioned earlier. This gives us a way to suss out which topological features of our data are significant based on how long they persist during this circle-growing process. In other words, the holes that persist over a large increase in radius are more significant than the ones that persist over just a short period.

Okay, so how can we track the persistence of these holes? One good way to do this is by using a persistence diagram. These look something like the plot on the left here, which is showing the persistence diagram of a hollow sphere. Looking at the plot, each of these blue, orange, and green points corresponds to a topological feature, or in other words a hole. In blue we have the H0 homology group, which are the singly connected components; in orange we have the H1 homology group, which are closed loops; and in green we have the H2 homology group, in other words cavities. The x-axis of this plot indicates the radius at which a hole appeared in the evolution of the data's shape, in other words in the circle-growing process that we showed in the previous slide, and on the y-axis we have the radius at which that hole disappeared.

Therefore, a point that sits near this black dashed line, the y = x line, corresponds to a hole that disappeared soon after it appeared. Conversely, points that sit far away from this line represent holes that disappeared long after they appeared. So two key points of a persistence diagram are: the points close to the y = x line are noise, while the points relatively far from this line are significant. In this example we have two points that are far from this line, the blue one in the top left here and the green one right here. We can ignore this blue one because it corresponds to when every n-dimensional ball overlaps with every other ball. So the significant topological feature of this data is captured by this green point here, which is telling us that the data is characterized by one cavity, and this makes sense since the data for this example are organized on the surface of a sphere.

Okay, so up until this point I've discussed only toy examples meant to give you an idea of what's going on with persistent homology, so now we'll switch gears to an example with real-world data. In this example we'll walk through how one could use persistent homology to analyze market data. I suppose it's worth mentioning that this example is not meant as financial advice. I'm a physicist, not a trader, and I've never taken a finance class in my life. However, I hope this example gives you an idea of what an analysis using persistent homology might look like and inspires ideas for analyses using data that you might be working with.

Okay, so similar to the last video, we start by importing Python libraries. The notable libraries here are yfinance, which gives us an API to grab market data, and the ripser and persim modules, which are part of the same scikit-tda ecosystem from the last video. Next we load in market data over a 4-year period using yfinance. Here we are grabbing four major market indexes, namely the S&P 500, Dow Jones, Nasdaq, and Russell 2000. We have daily prices for these indexes organized in a pandas data frame, so you can imagine four columns, one for each market index, and many rows corresponding to each day that the markets were open over this 4-year period. Then we convert this pandas data frame into a NumPy array and compute the log daily returns of each index. This choice of data prep follows the procedure used in the paper by Gidea and Katz, which was the inspiration for this example, and you can find it at the arXiv reference here.

Okay, so now we get into the TDA stuff. In this analysis we want to track changes in the shape of the markets by looking at how the homology of the
market changes over time. To do this, we start by initializing this object that constructs simplicial complexes from data. Next we define a time window size, which will allow us to grab a chunk of data to analyze the homology of; here we're setting this window size to 20 days. Next we define the total number of these chunks we will have, and finally we create a NumPy array to keep track of a number that quantifies changes in homology.

Okay, next we go down to this for loop and we do some persistent homology. First we take the first 20 rows of data to do persistent homology and create a persistence diagram. That is, we grow four-dimensional balls around each point, where each choice of radius creates a simplicial complex, and we track the holes that appear and disappear using a persistence diagram. We do all that with just one line of code. Then we do the same thing but now for another set of 20 rows, specifically the second row all the way down to the 21st row. So now we have two persistence diagrams corresponding to two overlapping 20-day windows in which the market was open.

Next we can quantify the change in the overall homology between these two persistence diagrams using something called the Wasserstein distance, which is essentially a distance measure between two persistence diagrams. So at the end of this whole process we get a single number and store it in the NumPy array we created earlier. Then we repeat this whole process for all the rows in our data set.

Okay, so after this whole process we have a set of values which quantify the changes in homology between consecutive days that the market was open, and we can just plot this as a time series, which is what's happening in this block of code here. The plot will look like this blue line here, and we can see there's this clear peak near the middle of the time series. For some context, we also have scaled S&P 500 close prices plotted in orange just above, and this vertical red line here is indicating when the
crash of 2020 occurred. As it turns out, the peak in this Wasserstein distance time series seems to correspond very closely with when this crash occurred. So did homology changes predict the crash of 2020? Well, I wouldn't go that far, but this is indeed interesting. One idea to investigate this further is to try to use these Wasserstein distances to predict future market index prices. If past distance values predict future index prices, then maybe there's something here. As you may be able to see from this example, there is a lot of room for creativity when using persistent homology in practice, and in some sense this is more art than science.

So that brings us to the end of our three-part series on topological data analysis. TDA is a young field with a lot of untapped potential, so I hope this series was helpful in getting a better idea of what it's all about. If you'd like to learn more, check out the other videos in this series, linked in the description below. There's also a corresponding Medium article to this video and the others in this series, which you can find in the description. If you enjoyed this content, please consider liking, subscribing, or sharing this video. Like many of you, I am indeed still learning, so if you have thoughts, questions, or concerns, please feel free to share those in the comment section below. And as always, thanks for watching.
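
The circle-growing process described in the transcript can be sketched for connected components only (H0). This is a minimal illustration in plain Python, not the code from the video, which relies on the ripser library. The function names `h0_persistence` and `count_components` and the five-point cloud are made up for the example. One assumption to note: the scale used here is the distance at which two points become connected, which corresponds to twice the ball radius in the picture above.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for connected components (H0).

    Every point is born at scale 0. Pairwise distances are processed in
    increasing order; each edge that joins two different components
    ends the life of one of them at that distance. One component never
    dies, so its death is recorded as infinity.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative of i's component.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    pairs = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i       # merge the two components
            pairs.append((0.0, dist))     # one component dies here
    pairs.append((0.0, math.inf))         # the last surviving component
    return pairs


def count_components(pairs, scale):
    """Number of components alive at the given distance scale."""
    return sum(1 for birth, death in pairs if birth <= scale < death)


# Hypothetical toy data: two tight clusters that are far apart.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
diagram = h0_persistence(cloud)

for scale in (0.5, 5.0, 9.0):
    print(scale, count_components(diagram, scale))
```

In this toy cloud, the component that survives from scale 1 up to scale 9 sits far from the y = x line of a persistence diagram, which is the sense in which the two-cluster structure is significant, while the merges at scale 1 are short-lived.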

Original Description

🤝 Work with me: https://aibuilder.academy/yt/5ezFcy9CIWE
🚀 Ship AI apps in weeks, not months: https://aibuilder.academy/courses/yt/5ezFcy9CIWE

This is the final video in a 3-part series on topological data analysis (TDA). TDA is an up-and-coming approach to data analysis that studies the shape of data. In this video, I discuss a popular TDA approach called persistent homology.

Series Playlist: https://www.youtube.com/playlist?list=PLz-ep5RbHosVi8Qoyqvz1MEiYrz35Zb7F
📰 Read more: https://medium.datadriveninvestor.com/persistent-homology-f22789d753c4?sk=c0925c51c31f5136abf362829c755146
💻 Example code: https://github.com/ShawhinT/YouTube-Blog/tree/main/TDA/persistent_homology

Resources I found helpful:
- TDA review: https://www.frontiersin.org/articles/10.3389/frai.2021.667963/full
- Intro to persistent homology: https://www.youtube.com/watch?v=2PSqWBIrn90&ab_channel=MatthewWright

Introduction - 0:00
Shapes - 0:33
Triangles - 1:02
Simplexes - 1:54
Holes - 2:58
Persistent Homology - 4:14
Persistence Diagrams - 7:03
Example code: Homology of Market Data - 8:58

The video covers the basics of persistent homology, including the construction of simplicial complexes and the use of homology groups to characterize shapes. It also shows how to apply persistent homology to market data by computing Wasserstein distances between the persistence diagrams of consecutive 20-day windows. The speaker stops short of claiming that these distances predict prices and suggests that as an idea for further investigation.

Key Takeaways

Construct a simplicial complex from a point cloud
Compare shapes by looking at their homology
Track the number of connected components as the shape is transformed
Create a persistence diagram to visualize the persistence of holes in a shape
Import necessary libraries and load market data
Compute log daily returns of each index and initialize the object that constructs simplicial complexes
Create a NumPy array to store the Wasserstein distances and plot the time series
Scale S&P 500 close prices and plot them above the time series
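
The data-preparation steps listed above can be sketched as follows. This is a minimal illustration in plain Python rather than the NumPy and pandas code used in the video. The function names `log_returns` and `consecutive_windows`, the two-column price table, and the window size of 2 are all assumptions chosen to keep the example small (the video uses four indexes and a 20-day window). The persistent homology and Wasserstein distance computations that would be applied to each pair of windows are left out.

```python
import math


def log_returns(price_rows):
    """Log daily returns per column: ln(p_t / p_{t-1})."""
    return [
        tuple(math.log(curr / prev) for prev, curr in zip(prev_row, curr_row))
        for prev_row, curr_row in zip(price_rows, price_rows[1:])
    ]


def consecutive_windows(rows, size):
    """Yield pairs of overlapping windows shifted by one row.

    The first pair is (rows 0..size-1, rows 1..size), matching the
    "first 20 rows" and "second row to the 21st row" description.
    """
    for start in range(len(rows) - size):
        yield rows[start:start + size], rows[start + 1:start + size + 1]


# Hypothetical closing prices: one row per trading day, one column per index.
prices = [
    (100.0, 50.0),
    (110.0, 55.0),
    (121.0, 55.0),
    (121.0, 60.5),
]

returns = log_returns(prices)          # one fewer row than prices
window_pairs = list(consecutive_windows(returns, size=2))

for first, second in window_pairs:
    # Each window is a small point cloud: one point per day,
    # one coordinate per index.
    print(len(first), len(second))
```

Each window pair would then be turned into two persistence diagrams whose distance gives one value of the time series.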

💡 Persistent homology can be used to analyze market data by tracking changes in the shape of the data over time. Whether those changes help predict future prices is left in the video as an open idea to investigate.
