UMAP explained | The best dimensionality reduction?

AI Coffee Break with Letitia · Beginner ·📄 Research Papers Explained ·5y ago

Key Takeaways

The video explains the Uniform Manifold Approximation and Projection (UMAP) algorithm for dimensionality reduction, comparing it to PCA and discussing its strengths and applications.

Full Transcript

Hey, we are back with the dimensionality reduction series. In our last video of the series, we talked about one way to escape the curse of dimensionality through an older algorithm called PCA. Today we will talk about a newer and very popular dimensionality reduction algorithm called UMAP.

PCA and UMAP are very different. PCA factorizes a matrix characterizing the data, which puts it in the company of algorithms like NMF or SVD. But UMAP, like t-SNE if you know it, builds a neighbor graph in the original space of the data and tries to find a similar graph in lower dimensions. But how does it do it?

UMAP stands for Uniform Manifold Approximation and Projection. This sounds intimidating, and the paper behind UMAP can be even more intimidating, but do not worry, because we break it down for you. The two steps of UMAP are the construction of a high-dimensional graph and its mapping to a lower-dimensional graph. The construction of this high-dimensional graph is what makes UMAP so special compared to its competitors, since it's hard to do it right and fast. And the cool part about UMAP is that its steps are mathematically proven to work.

So first, there was the data in the high dimensions, and we want to approximate its shape, or topology. Each data point is a so-called 0-simplex, and a certain theorem ensures that the shape of the data can be approximated when we connect these 0-simplices, which are our data points, with their neighboring data points, forming 1-, 2-, or higher-dimensional simplices. With this we can approximate the topology, so all we need to do is to make these connections. For this, the UMAP algorithm extends a radius around each point and makes a connection between each point and its neighbors with intersecting radii.

So far the radii are equal. But remember, we want to approximate the shape of the data, so we want a connected graph containing all our data points. But this wish of ours brings in two problems. Firstly, it often happens that in the data there are larger gaps where there is no next point to connect to in the graph. This happens usually in low-density regions. Secondly, there are often high-density regions where there are a lot of neighbors in the given radius and everything is way too connected. This second problem gets even worse with the curse of dimensionality, where in high-dimensional spaces the distances between points become more and more similar.

Okay then, so if we have these two problems with a fixed radius, then let's use a variable radius instead. This choice is also mathematically supported by the definition of a Riemannian metric on the manifold, but do not worry about that. Just keep in mind that there is math proving that the choice of a variable radius does not cause any trouble. So now the radius is greater in low-density regions and smaller in high-density regions. But UMAP does not estimate density directly as a number; it uses a proxy. The density is estimated to be higher when the k-th nearest neighbor is close and lower when the k-th nearest neighbor is far away.

Notice that this k in k-nearest neighbor is a hyperparameter that we need to choose, because with its help UMAP makes a density estimation to find the right local radius. If k is big, then more global structure is preserved. If k is small, then the radius decreases and the local structure is more preserved. So the right k could give the perfect balance between local and global structure preservation, but there are rarely any recipes for finding the optimum automatically. Some trial and error is required, since k depends on each data set individually.

But not all k nearest neighbors are equal, since each has a different distance from the point we are looking at. So the connections between each point and its neighbors get a weight, a connection probability, where points which are far away are weighted less and get a lower connection probability.

Now that this high-dimensional graph is constructed, it is ready to be projected to lower dimensions. This graph projection algorithm is too much for Ms. Coffee Bean to explain in detail in this video, but you can imagine this projection as taking the high-dimensional graph with its edges as being springs, where each spring is stronger as the edge probability increases. This means that points connected by highly weighted edges are more likely to stay together in the lower-dimensional space, because the spring holds these points together. And perhaps interesting to notice is that these spring forces are rotationally symmetric, which leads to clusters sometimes landing on one side after one UMAP run and on the other side after another projection.

So UMAP has two main strengths over the famous graph-based dimensionality reduction technique called t-SNE: it is faster due to its optimizations and strong mathematical foundations, and it also has a better balance between locality and globality in clustering. Take for example this visualization from the awesome blog from Google PAIR, linked below. We have this mammoth in 3D on the left, and we can see side by side how UMAP and t-SNE map this 3D mammoth into two dimensions. We can play around with the number of neighbors taken into account when constructing the high-dimensional graph, and we can clearly see how low numbers focus on the local structure while higher numbers focus more on the global structure. The minimum distance parameter allows us to specify how tightly the algorithm will map points into the target low-dimensional space. A high minimum distance will spread the points more. But it is important to notice that a stepwise change of these two parameters continuously changes the UMAP result. t-SNE, on the other hand, is not that great in this aspect, because when changing the parameter of t-SNE, t-SNE's result completely changes. We really recommend you play around yourself with all the examples in this blog post.

So far we have seen examples where UMAP maps from 3D to 2D, but the visualizations we have seen so far are toy examples. They're just for us to get an intuition about the inner workings of the UMAP dimensionality reduction algorithm. What UMAP excels at is reducing from a lot of dimensions. Here is a real-world example of 784-dimensional MNIST data containing handwritten digits. It would be nice if we could reduce them to two or three dimensions so we can visualize this pixel space the digits are living in.

For this we can write a little Python code to load the MNIST data, the UMAP package for dimensionality reduction, and a visualization package of your liking. We like Babyplots, and you will see why. We read in the data and we see we have 60,000 training instances of 28 times 28 pixels, which are together the 784 dimensions we plan to reduce from. For reducing, we fit and apply the UMAP algorithm, and we do it once for two dimensions and again for three dimensions. We reduced to 2D and 3D to show you the cool thing Babyplots can do: it takes both the 3D and 2D embedding and can animate a transition between the two. How cool is that?

Hereby we can see that UMAP could already cluster almost all handwritten digits together, meaning that UMAP here worked as an unsupervised clustering algorithm. Also, we can see how useful a 3D visualization can be over just 2D, where more complicated structures and relations can be visualized. If you want to visualize these things in 3D yourself in either R, JavaScript, or Python, and load your interactive 3D plots into a PowerPoint presentation to show to everybody, check out the Babyplots website.

This was it from Ms. Coffee Bean. Read the paper if you're interested in the mathematical theory and proofs behind UMAP (find it linked in the description below), or watch the first author of the UMAP paper presenting his UMAP invention, linked below. Now go and reduce your dimensions with UMAP.
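
To make the graph-construction step from the transcript more concrete, here is a minimal NumPy sketch of a weighted neighbor graph with a variable radius. It is a simplification under stated assumptions, not the implementation in the UMAP paper or the umap-learn package: the function name `knn_fuzzy_graph`, the use of the k-th nearest-neighbor distance directly as the local radius, and the `exp(-distance / radius)` weighting are all choices made for illustration. The toy data (one tight cluster, one spread-out cluster) is made up for the example.

```python
import numpy as np

def knn_fuzzy_graph(X, k=3):
    """Toy weighted k-NN graph with a per-point (variable) radius.

    Assumption of this sketch: the local radius is simply the distance to
    the k-th nearest neighbor, and edge weights decay as
    exp(-distance / radius). The real algorithm is more careful.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

    # Indices and distances of the k nearest neighbors, nearest first.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)

    # Density proxy: a far-away k-th neighbor means low density, so a
    # larger radius. The floor avoids dividing by zero for duplicates.
    radius = np.maximum(nn_dist[:, -1], 1e-12)

    # Directed weights: closer neighbors get a higher connection weight.
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, nn_idx.ravel()] = np.exp(-nn_dist / radius[:, None]).ravel()

    # Symmetrize with a probabilistic "or": keep an edge if either
    # endpoint proposes it.
    W_sym = W + W.T - W * W.T
    return W_sym, radius

# Toy data: a dense cluster followed by a sparse one.
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(20, 2))
sparse = rng.normal(5.0, 2.0, size=(20, 2))
X = np.vstack([dense, sparse])

W, radius = knn_fuzzy_graph(X, k=5)
print("mean radius, dense cluster: ", radius[:20].mean())
print("mean radius, sparse cluster:", radius[20:].mean())
```

The printed radii show the idea from the transcript: points in the dense cluster get a small radius and points in the sparse cluster get a large one, so both regions end up connected to about the same number of neighbors.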

Original Description

UMAP explained! The great dimensionality reduction algorithm in one video with a lot of visualizations and a little code. Uniform Manifold Approximation and Projection for all!

➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/

📺 PCA video: https://youtu.be/3AUfWllnO7c
📺 Curse of dimensionality video: https://youtu.be/4v7ngaiFdp4
💻 Babyplots interactive 3D visualization in R, Python, Javascript with PowerPoint Add-in! Check it out at https://bp.bleb.li/

🔥 Optionally, pay us a coffee to boost our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak

Outline:
* 00:00 UMAP intro
* 01:31 Graph construction
* 04:49 Graph projection
* 05:48 UMAP vs. t-SNE visualized
* 07:31 Code
* 08:12 Babyplots

📚 Coenen, Pearce | Google Pair blog: https://pair-code.github.io/understanding-umap/
📄 UMAP paper: McInnes, L., Healy, J., & Melville, J. (2018). Umap: Uniform manifold approximation and projection for dimension reduction. https://arxiv.org/abs/1802.03426
📺 Leland McInnes talk @enthought: https://youtu.be/nq6iPZVUxZU
🎵 Music (intro and outro): Dakar Flow - Carmen María and Edu Espinal

🔗 Links:
YouTube: https://www.youtube.com/AICoffeeBreak
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/

#AICoffeeBreak #MsCoffeeBean #UMAP #MachineLearning #research #AI

The video explains UMAP, a dimensionality reduction algorithm that constructs a graph in high-dimensional space and projects it to a lower-dimensional space, allowing for visualization and clustering of high-dimensional data.

Key Takeaways
  1. Load a high-dimensional dataset
  2. Apply UMAP to reduce dimensions
  3. Visualize the resulting lower-dimensional data using Babyplots or other tools
  4. Tune hyperparameters such as the number of neighbors and minimum distance to optimize results
💡 UMAP's ability to balance local and global structure preservation makes it a powerful tool for dimensionality reduction and clustering.
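
The four steps above can be sketched in Python with the umap-learn package. This is a hedged outline, not the code shown in the video: the helper names `flatten_images`, `embed`, and `sweep` are made up for this example, the parameter grids are arbitrary, and data loading and plotting are left to whichever packages you prefer. The only library calls assumed are `umap.UMAP(...)` with its `n_components`, `n_neighbors`, `min_dist`, and `random_state` arguments, and `fit_transform`.

```python
import numpy as np

def flatten_images(images):
    """Turn (n, height, width) images into (n, height * width) feature rows."""
    images = np.asarray(images, dtype=np.float32)
    return images.reshape(images.shape[0], -1)

def embed(features, n_components=2, n_neighbors=15, min_dist=0.1, seed=42):
    """Fit UMAP on the features and return the low-dimensional coordinates."""
    # Imported lazily: umap-learn is an optional, fairly heavy dependency.
    import umap

    reducer = umap.UMAP(
        n_components=n_components,  # 2 or 3 for visualization
        n_neighbors=n_neighbors,    # small: local structure, large: global
        min_dist=min_dist,          # larger values spread the points more
        random_state=seed,          # fixes the otherwise random layout
    )
    return reducer.fit_transform(features)

def sweep(features, neighbor_values=(5, 15, 50), min_dist_values=(0.0, 0.1, 0.5)):
    """Embed once per (n_neighbors, min_dist) pair for side-by-side comparison."""
    return {
        (k, d): embed(features, n_neighbors=k, min_dist=d)
        for k in neighbor_values
        for d in min_dist_values
    }
```

For 28 by 28 pixel digit images, `flatten_images` yields the 784 features per image mentioned in the transcript. Calling `embed(features, n_components=2)` and then `embed(features, n_components=3)` mirrors the two reductions described there, and `sweep` is one way to do the trial and error over the two hyperparameters, since there is no automatic recipe for picking them.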
