Principal Component Analysis (PCA) | Introduction & Example (Python) Code
Key Takeaways
Introduces Principal Component Analysis (PCA) and provides example Python code
Full Transcript
Okay, this is good, I got the angle quite on the set. Hey guys, welcome back. I'm back with another series. If you missed the first one, it's available on my channel; it was on time series, signals, the Fourier transform, and the wavelet transform. In this new series I'll be talking about two things: one, principal component analysis, and two, independent component analysis. Principal component analysis, or PCA, is the topic of this video. I'll give you a little intuition, share some math, and then finish with a concrete example of how you can use PCA to analyze the stock market. So let's get right into it.
The analogy I like to think of for PCA is a massive rock band with, like, 20 members in the ensemble. You have two drummers, several guitarists, several keyboardists or pianists, a string section, a horn section, vocalist, percussionist, the whole works. So you have this 20-person band, and that's not a big deal; that's the kind of band made for huge arenas and stadiums. But if a band like this is just getting started, they're going to have a hard time fitting in smaller venues like coffee shops and restaurants. A natural solution to this problem is to reduce the number of players at specific performances. Instead of a keyboardist, a pianist, and whatnot, you could just have one person on the keyboard; instead of multiple guitars, you could have one person on an acoustic guitar; instead of two drummers and a percussionist, you could have someone banging on the bongos, and so on.
In a lot of ways, this is basically what PCA does. The big band on the left is before PCA, and then you can boil it down to its core elements for the same band to play at the coffee shop. But instead of a band, you can think of PCA applying to a data set; instead of musicians or players in the band, you can think of the variables in your data set; and instead of a song or the music,
you can think of what your data set is representing.
A bit more concretely, principal component analysis (PCA) reduces input dimensionality and redundancy. We can think of two variables, x and y. This could be something like hot dogs sold and hot dog buns sold, which are directly correlated and in a lot of ways contain redundant information. So it may be practical to represent this underlying information through just one variable instead of two, and that's an application of PCA. We can transform our axes from this x and y to a new set of axes, which we'll call PC1 and PC2, and if you want to take it a step further, you can remove PC2 and operate with just one variable. Essentially, we've reduced the dimensionality from two variables, x and y, to just one, PC1, if we choose to drop PC2.
Okay, so how does it work? The basic idea: the goal of PCA is to reduce input variable redundancy by creating a new set of variables where the variance along each subsequent variable is maximized. In the previous example we saw pictorially that we changed from a set of two variables, hot dogs sold and hot dog buns sold, to a new pair of variables, PC1 and PC2, and essentially PC1 contained all the relevant information we needed. The way we got PC1 is we basically rotated the axes to lie along the linear slope of points defined by the hot dog bun and hot dog sales.
What does that translate to mathematically? We can think of this situation: we have X, which is a matrix of data where the rows are data records and the columns are variables; we have w, which is a vector of weights; and we have t = Xw, which is a score vector and what I'm going to be calling a principal component. So t is what we're interested in. We have our data X, and we're trying to find a w that is going to create this principal component for us. Okay, so here's the magic of PCA, here's the trick to it all. The goal is to maximize the variance of
t subject to the constraint that the norm squared of w, so w transpose times w, is equal to one. Variance is defined in the usual way: you take every element, subtract the mean of the variable, square it, divide by the number of elements minus one, and then add this up over every element in the set of numbers.
One really important thing when doing PCA is that you want to autoscale your data. What does that mean? For each number in each column of your matrix, you want to subtract that column's average and divide by that column's standard deviation. If we do that, the mean of the principal component turns out to be zero, which allows us to drop the mean term in the variance. It turns out the variance is then just equal to the norm squared of t divided by the number of elements minus one. So what does that mean? It means we can rewrite this optimization problem: instead of maximizing the variance, we can just maximize the norm squared of t, because the vector w that maximizes the norm squared of t is also the vector w that maximizes the variance of t.
So we can rewrite the optimization problem using our above expression for t, and it turns out this is actually a pretty straightforward optimization problem to solve. Don't be intimidated by the matrices and vectors. We can use a very well-known and common technique in calculus, the method of Lagrange multipliers, which basically allows us to rewrite an optimization problem with constraints, a constrained optimization problem, as an optimization problem without constraints, an unconstrained optimization problem. If none of that makes sense, that's fine; we just need these relevant expressions here. We can write out the Lagrangian, which is this L of x term here, for our PCA optimization problem, and then we have the associated equations. And this is the exciting part: this first equation, if we rearrange it,
is just an eigenvalue problem, which is a standard problem in linear algebra, and the second equation is just a restatement of our original constraint. Writing it explicitly here, we can solve for the eigenvalue lambda and the vector of weights w using standard eigenvalue approaches. If you're doing this in some programming language, languages like R, Python, and MATLAB are going to have built-in functions that allow you to solve this problem. Once we have this vector of weights, we have everything we need: we can multiply X by it and get our principal component.
This naturally extends to multiple components. We started out looking for just a single component, but if you solve the eigenvalue problem and you have n columns in your matrix X, so that X transpose X is an n-by-n square matrix, you're going to end up with n eigenvalues and n corresponding eigenvectors. If you sort these eigenvalues and eigenvectors from the largest eigenvalue all the way down to the smallest, each corresponding eigenvector w is a set of weights that defines a principal component, and the principal components associated with the larger eigenvalues contain more information than components associated with smaller eigenvalues. So you can define some threshold, like in the first slide where we could have just dropped PC2 because it wasn't giving us much additional information; you can do the same thing and truncate your variables after a certain amount of information is captured by your principal components.
Okay, so just as a recap: principal component analysis reduces input dimensionality and redundancy. Some key points: new variables are created as linear combinations of the input variables. That's what we saw in the previous slide, where you had a matrix multiplied by a vector of weights; that's equivalent to a linear combination of your input variables. And then each subsequent new variable
contains less information. We saw that once you sort your eigenvalues from largest to smallest, the principal components associated with the larger eigenvalues contain more information and those corresponding to smaller eigenvalues contain less. And then there are a lot of applications for PCA. One is relating variables together: if two variables get clumped together, like hot dog buns sold and hot dogs sold, there's some underlying correlation there. You can use it for clustering, where you transform from your original input space to a new PCA space and then run a clustering algorithm like k-means. And you can also do some outlier identification: you can plot all your points in your principal component space and visually inspect whether there are any outliers.
All right, so here's a fun example. At the outset I'm going to say I'm not a financial advisor, I've never taken a finance class, so in no way is this a recommendation of how you should invest your money. This is just a fun example of what PCA can do. Here we're going to use PCA to create an S&P 500 index fund. An index fund is basically a set of investments that are meant to follow, or track with, a specific market. The example code's on the GitHub, so I'll probably just fly through this. I used the Yahoo Finance module to get real, actual stock data, so this is all real data, this isn't made up, and then I use pandas and NumPy for all the number crunching. I write some code to input the ticker names from Wikipedia; Graham Guthrie had a nice Medium post on how you can grab all these S&P 500 names, so I just stole some code from that post and made some edits. Then I pull S&P 500 data for 2020, drop NaNs, get a pandas DataFrame of just close prices, as opposed to all the other information that's available, and get a list of the ticker names of all the companies in the data frame. So we have 253 rows and 499 columns.
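With a days-by-tickers matrix like this in hand, the math from earlier in the video applies directly: autoscale each column, solve the eigenvalue problem for X transpose X, sort by eigenvalue, and multiply X by the weights. The following is a minimal NumPy sketch of those steps, not the code from the video's repository. The names `prices`, `n_days`, and `n_tickers` are hypothetical, and the prices are synthetic random walks standing in for the real download.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the close-price table: rows are trading days,
# columns are tickers. Each column is a random walk that shares a common
# "market" factor, so the columns are correlated (redundant).
n_days, n_tickers = 253, 30
market = rng.normal(0.0, 1.0, size=(n_days, 1)).cumsum(axis=0)
noise = rng.normal(0.0, 1.0, size=(n_days, n_tickers)).cumsum(axis=0)
prices = 100.0 + 2.0 * market + 0.5 * noise

# Autoscale: subtract each column's mean, divide by its standard deviation.
X = (prices - prices.mean(axis=0)) / prices.std(axis=0, ddof=1)

# Eigenvalue problem from the Lagrangian: (X^T X) w = lambda * w, w^T w = 1.
# eigh is used because X^T X is symmetric.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)

# eigh returns eigenvalues in ascending order; sort largest to smallest.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Principal components (scores): one column t = X w per weight vector w.
T = X @ eigvecs

# The variance of each component is ||t||^2 / (n - 1) = lambda / (n - 1),
# so the share of variance per component is lambda / sum(lambda).
explained = eigvals / eigvals.sum()
print("share of variance, first three components:", explained[:3])
```

Each column of `eigvecs` has unit norm, which is the constraint from the optimization problem, and the columns of `T` have zero mean because `X` was autoscaled.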
Here, I guess the comments aren't updated, so I apologize for that, but here we're initializing PCA with 10 components, and then we'll apply PCA to our data set and print the explained variance. You can see that with the first three components you're already at more than 90% of the explained variance, if you sum up the first three elements of that array there. Okay, and then we can create an index fund. There are countless ways you can do this. I just arbitrarily took the weights defining the first three principal components, summed them together, and then only included the top 61 weights. We can represent the overall portfolio of this index fund with a bar plot; it's a natural way to do it. The y-axis is the relative weight, which you can also think of as the relative number of dollars you're going to invest in each specific company, and the x-axis is just the individual ticker names.
Okay, and then we can see how our index fund compares to the actual S&P 500 over 2020, and visually, approximately, it doesn't do such a bad job. There are some discrepancies along the way, but everyone cares about percent return. If you had just bought one share of every single stock in the S&P 500 at the beginning of 2020 and then sold those same shares at the beginning of 2021, you would have made a 20% return. If you had instead followed the investing strategy of this particular index fund derived from PCA, you would have made 25%.
So that was the video on principal component analysis. I hope that cleared things up. If you want to learn more about principal component analysis, I have provided a link to my blog post on Medium on the topic. Stay tuned for the next video, where I'll be talking about a similar but different technique, independent component analysis. If you enjoyed this video, be sure to like, comment, subscribe, hit the bell, and share with your friends and family so they too can learn about principal component analysis.
Thanks for watching.
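The index-fund construction described in the example (explained variance for 10 components, summing the weights of the first three components, keeping only the largest weights, then comparing percent returns) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the video's repository: the prices are synthetic, `top_k`, `percent_return`, and the ticker labels are hypothetical, and the sign convention and the normalization of the weights are choices made for this sketch only. The numbers it prints say nothing about real markets.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical close prices: rows are trading days, columns are tickers.
n_days, n_tickers = 253, 40
tickers = [f"TICK{i:02d}" for i in range(n_tickers)]
market = rng.normal(0.05, 1.0, size=(n_days, 1)).cumsum(axis=0)
noise = rng.normal(0.0, 1.0, size=(n_days, n_tickers)).cumsum(axis=0)
prices = 200.0 + 1.5 * market + 0.5 * noise

# Autoscale, then PCA via the SVD: the rows of Vt are the weight vectors
# and the squared singular values are the eigenvalues of X^T X.
X = (prices - prices.mean(axis=0)) / prices.std(axis=0, ddof=1)
_, s, Vt = np.linalg.svd(X, full_matrices=False)

n_components = 10
explained_ratio = (s**2 / np.sum(s**2))[:n_components]
print("explained variance ratio:", np.round(explained_ratio, 3))

# The sign of each weight vector is arbitrary. Assumption of this sketch:
# flip each of the first three so that its entries sum to a positive value.
signs = np.sign(Vt[:3].sum(axis=1, keepdims=True))
signs[signs == 0] = 1.0
summed = (signs * Vt[:3]).sum(axis=0)

# Keep only the top_k largest summed weights and normalize them to sum to 1.
top_k = 10
keep = np.argsort(summed)[::-1][:top_k]
weights = summed[keep] / summed[keep].sum()
for i, w in zip(keep, weights):
    print(tickers[i], round(float(w), 3))


def percent_return(values):
    """Percent change from the first value to the last value."""
    return 100.0 * (values[-1] - values[0]) / values[0]


# Fund value: weighted sum of the kept tickers' prices.
# Benchmark: one share of every ticker.
fund = prices[:, keep] @ weights
benchmark = prices.sum(axis=1)
print("fund return (%):", percent_return(fund))
print("benchmark return (%):", percent_return(benchmark))
```

The bar plot mentioned in the video would correspond to plotting `weights` against the kept ticker names.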
Original Description
🤝 Work with me: https://aibuilder.academy/yt/WDjzgnqyz4s
🚀 Ship AI apps in weeks, not months: https://aibuilder.academy/courses/yt/WDjzgnqyz4s
The first video in a 2-part series on Principal Component Analysis (PCA) and Independent Component Analysis (ICA). This video gives some intuition, math, and an example of using PCA to create an S&P 500 index fund.
More in this series:
- Blog: https://medium.com/towards-data-science/principal-component-analysis-pca-79d228eb9d24?sk=4c5b8fd7fd28a09c10ed483e51dd975a
- ICA: https://youtu.be/GgLaP4Des1Q
- Example code: https://github.com/ShawhinT/YouTube/tree/main/pca
Resources I found helpful:
- R. Bro, A. K. Smilde, Anal. Methods, 2014, 6, 2812-2831
- Golden, R. (2020). Statistical machine learning: A unified framework. Boca Raton: CRC Press C.
Introduction - 0:00
An analogy - 0:43
PCA - 2:20
Some math - 3:22
Recap - 9:32
Example: S&P 500 index fund - 10:52
Closing remarks - 14:10