UMAP explained | The best dimensionality reduction?
Key Takeaways
The video explains the Uniform Manifold Approximation and Projection (UMAP) algorithm for dimensionality reduction, comparing it to PCA and discussing its strengths and applications.
Full Transcript
Hey, we are back with the dimensionality reduction series. In our last video of the series, we talked about one way to escape the curse of dimensionality through an older algorithm called PCA. Today we will talk about a newer and very popular dimensionality reduction algorithm called UMAP.

PCA and UMAP are very different. PCA factorizes a matrix characterizing the data, which puts it in the company of algorithms like NMF or SVD. But UMAP, like t-SNE if you know it, builds a neighbor graph in the original space of the data and tries to find a similar graph in lower dimensions. But how does it do it?

UMAP stands for Uniform Manifold Approximation and Projection. This sounds intimidating, and the paper behind UMAP can be even more intimidating, but do not worry, because we break it down for you. The two steps of UMAP are the construction of a high-dimensional graph and its mapping to a lower-dimensional graph. The construction of this high-dimensional graph is what makes UMAP so special compared to its competitors, since it is hard to do it right and fast. And the cool part about UMAP is that its steps are mathematically proven to work.

So first, there is the data in the high dimensions, and we want to approximate its shape, or topology. Each data point is a so-called 0-simplex, and a certain theorem ensures that the shape of the data can be approximated when we connect these 0-simplices, which are our data points, with their neighboring data points, forming 1-, 2-, or higher-dimensional simplices. With this we can approximate the topology, so all we need to do is make these connections. For this, the UMAP algorithm extends a radius around each point and makes a connection between each point and its neighbors with intersecting radii. So far the radii are equal. But remember, we want to approximate the shape of the data, so we want a connected graph containing all our data points. This wish of ours brings in two problems. Firstly, it often happens that in the data there are larger
gaps where there is no next point to connect to in the graph. This usually happens in low-density regions. Secondly, there are often high-density regions where there are a lot of neighbors within the given radius and everything is way too connected. This second problem gets even worse with the curse of dimensionality, where in high-dimensional spaces the distances between points become more and more similar.

Okay then, if we have these two problems with a fixed radius, let's use a variable radius instead. This choice is also mathematically supported by the definition of a Riemannian metric on the manifold, but do not worry about that. Just keep in mind that there is math proving that the choice of a variable radius does not cause any trouble. So now the radius is greater in low-density regions and smaller in high-density regions. But UMAP does not estimate density directly as a number; it uses a proxy. The density is estimated to be higher when the k-th nearest neighbor is close and lower when the k-th nearest neighbor is far away.

Notice that this k in k-nearest neighbor is a hyperparameter that we need to choose, because with its help UMAP makes a density estimation to find the right local radius. If k is big, then more global structure is preserved. If k is small, then the radius decreases and the local structure is better preserved. So the right k could give the perfect balance between local and global structure preservation, but there are rarely any recipes for finding the optimum automatically. Some trial and error is required, since k depends on each data set individually.

But not all k nearest neighbors are equal, since each has a different distance from the point we are looking at. So the connections between each point and its neighbors get a weight, a connection probability, where points that are far away are weighted less and get a lower connection probability.

Now that this high-dimensional graph is constructed, it is ready to be projected to lower dimensions. This graph projection
algorithm is too much for Ms. Coffee Bean to explain in detail in this video, but you can imagine this projection as taking the high-dimensional graph with its edges being springs, where each spring is stronger as the edge probability increases. This means that points connected by high-weight edges are more likely to stay together in the lower-dimensional space, because the spring holds these points together. And perhaps interesting to notice is that these spring forces are rotationally symmetric, which leads to clusters sometimes landing on one side after one UMAP run and on the other side after another projection.

So UMAP has two main strengths over the famous graph-based dimensionality reduction technique called t-SNE: it is faster, due to its optimizations and strong mathematical foundations, and it also has a better balance between locality and globality in clustering. Take for example this visualization from the awesome blog from Google PAIR, linked below. We have this mammoth in 3D on the left, and we can see side by side how UMAP and t-SNE map this 3D mammoth into two dimensions. We can play around with the number of neighbors taken into account when constructing the high-dimensional graph, and we can clearly see how low numbers focus on the local structure, while higher numbers focus more on the global structure. The minimum distance parameter allows us to specify how tightly the algorithm will map points into the target low-dimensional space. A high minimum distance will spread the points more. But it is important to notice that a stepwise change of these two parameters continuously changes the UMAP result. t-SNE, on the other hand, is not that great in this aspect, because when changing the parameter of t-SNE, t-SNE's result completely changes. We really recommend you play around yourself with all the examples in this blog post.

So far we have seen examples where UMAP maps from 3D to 2D, but the visualizations we have seen so far are toy examples. They're just for us to
get an intuition about the inner workings of the UMAP dimensionality reduction algorithm. What UMAP excels at is reducing from a lot of dimensions. Here is a real-world example of 784-dimensional MNIST data containing handwritten digits. It would be nice if we could reduce their dimensions to two or three so that we can visualize the pixel space the digits are living in.

For this we can write a little Python code to load the MNIST data, the UMAP package for dimensionality reduction, and a visualization package of your liking. We like babyplots, and you will see why. We read in the data and see that we have 60,000 training instances of 28 times 28 pixels, which together are the 784 dimensions we plan to reduce from. For reducing, we fit and apply the UMAP algorithm, once for two dimensions and again for three dimensions. We reduce to both 2D and 3D to show you the cool thing babyplots can do: it takes both the 3D and the 2D embedding and can animate a transition between the two. How cool is that? Here we can see that UMAP could already cluster almost all handwritten digits together, meaning that UMAP here worked as an unsupervised clustering algorithm. We can also see how useful a 3D visualization can be over just 2D, since more complicated structures and relations can be visualized.

If you want to visualize these things in 3D yourself in either R, JavaScript, or Python, and load your interactive 3D plots into a PowerPoint presentation to show to everybody, check out the babyplots website.

This was it from Ms. Coffee Bean. Read the paper if you're interested in the mathematical theory and proofs behind UMAP; find it linked in the description below. Or watch the first author of the UMAP paper presenting his UMAP invention, also linked below. Now go and reduce your dimensions with UMAP.
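The graph construction step described in the transcript (a local radius taken from the k-th nearest neighbor, then distance-dependent edge weights) can be sketched in a few lines of plain Python. This is a simplified illustration, not the procedure used by the UMAP paper or the umap-learn package: the function name `knn_graph_weights`, the sample points, and the exact weighting `exp(-d / radius)` are assumptions made for this sketch, and the real algorithm calibrates its per-point scale differently.

```python
import math


def knn_graph_weights(points, k):
    """Toy fuzzy neighbor graph: returns (local radii, symmetric edge weights)."""
    directed = {}
    radii = []
    for i, p in enumerate(points):
        # Distances from point i to all other points, nearest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors = dists[:k]
        # Density proxy: the distance to the k-th nearest neighbor is the local radius.
        # Small radius = dense region, large radius = sparse region.
        radius = max(neighbors[-1][0], 1e-12)  # guard against duplicate points
        radii.append(radius)
        for d, j in neighbors:
            # Assumed weighting: exponential decay with distance, scaled by the radius.
            directed[(i, j)] = math.exp(-d / radius)

    # Merge the two directions of each edge with a probabilistic "or".
    weights = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)
        weights[(min(i, j), max(i, j))] = a + b - a * b
    return radii, weights


# Hypothetical data: a tight cluster (first three points) and a sparse one (last three).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (8.0, 5.0), (5.0, 9.0)]
radii, weights = knn_graph_weights(points, k=2)

for edge, w in sorted(weights.items()):
    print(edge, round(w, 3))
```

With these sample points, the radii of the tight cluster come out much smaller than those of the sparse cluster, and within a neighborhood the closer neighbor receives the larger weight. Raising `k` would start to connect the two clusters, which loosely mirrors the local versus global trade-off mentioned for the number of neighbors.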
Original Description
UMAP explained! The great dimensionality reduction algorithm in one video with a lot of visualizations and a little code.
Uniform Manifold Approximation and Projection for all!
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/
📺 PCA video: https://youtu.be/3AUfWllnO7c
📺 Curse of dimensionality video: https://youtu.be/4v7ngaiFdp4
💻 Babyplots interactive 3D visualization in R, Python, Javascript with PowerPoint Add-in! Check it out at https://bp.bleb.li/
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to boost our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
Outline:
* 00:00 UMAP intro
* 01:31 Graph construction
* 04:49 Graph projection
* 05:48 UMAP vs. t-SNE visualized
* 07:31 Code
* 08:12 Babyplots
📚 Coenen, Pearce | Google Pair blog: https://pair-code.github.io/understanding-umap/
📄 UMAP paper: McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. https://arxiv.org/abs/1802.03426
📺 Leland McInnes talk @enthought : https://youtu.be/nq6iPZVUxZU
🎵 Music (intro and outro): Dakar Flow - Carmen María and Edu Espinal
-------------------------------
🔗 Links:
YouTube: https://www.youtube.com/AICoffeeBreak
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
#AICoffeeBreak #MsCoffeeBean #UMAP #MachineLearning #research #AI