Equivariant Neural Networks | Part 3/3 - Transformers and GNNs
Skills:
ML Maths Basics70%
Key Takeaways
The video discusses Equivariant Neural Networks, specifically focusing on Transformers and Graph Neural Networks (GNNs), and explores papers such as SchNet, SE(3) Transformer, and Tensor Field Networks.
Full Transcript
welcome back to the final part of this active variant deep learning Series in this video we will have a look at how equivariants can be baked into the architecture of Transformers and gnns this means in the following minutes we will focus on sets graphs and point clouds in fact one of the main motivations for me was to explain how the se3 Transformer works I didn't think it would be such a long way from group Theory oversteerability and other mathematical Concepts to eventually understand this paper but here we are and I hope that some of the following is interesting for you guys out there so let's get into it Point clouds and graphs fall under the umbrella of sets because there is simply a collection of data points that belong together you can also see text as a set with each token being a data point and if we give this set a specific structure we end up with a graph or a point Cloud if we also add coordinates so all of these data structures are somehow connected Transformers and gnns are a reasonable Choice when working with such data modalities Transformer models don't really have a strong inductive bias like cnns do for example they are quite flexible regarding the data they are able to model which is maybe one of the reasons why they are so successful of course given the constraint of having enough data on sparse the data sets it can make sense to introduce additional biases like done in the se3 Transformer gnns on the other hand require craft shape data because otherwise there will be no message passing between different data instances on graph data you can also add coordinates think of three-dimensional molecular graphs for example both of these model types already carry especially equivariance namely permutation Equity variants Transformers and gnns are not sensitive to the order of the inputs and they're also able to handle varying sized inputs they are however not rotation equivalent especially in the context of Point clouds considering these equivalences can benefit many applications such as physical particle simulations molecular predictions or also geometry data sets therefore in this video we will dive into some architectures that bake in 3D equivariants for this we will first have a look at some foundational papers like schnet or tensor field Networks for the following we focus on the 3D setting which means besides other information each of our data points also has a three-dimensional coordinate Vector as features this could for example represent a protein graph the basic idea is that we want the model to ignore the absolute values and instead use the relative information between the different data points such that if we rotate the outputs rotate as well without paying attention to the absolute coordinate vectors in other words the outputs should behave predictably under geometric transformations the advantage is that the model doesn't need to learn how Global rotations affect the features which simplifies the learning process an interesting thing I read is that rotations commute in 2D but not in 3D what this means is that the order of operations is actually relevant in 3D which makes constructing rotation Equity varying neural networks even more difficult I constructed a simple example to visualize this let's say we have this little box with a blue arrow that points into our Direction on the left we see the coordinate system with three dimensions now we perform a 45 degree rotation on the y-axis which makes the blue arrow point to the left after that we perform a 180 degree rotation on the set axis the blue arrow is now on the bottom of the box and points to the right I slightly shaded the arrow to indicate that it's not on the top anymore now let's do the same but with a different order of operations we start with the 180 degree flip and after that we perform a 45 degree rotation as we can see the blue arrow is also on the bottom but instead points into the opposite direction that's why 3D rotations are not commutative this has some implications or rather restrictions on the way these models need to be designed as we will see later in the literature there have been different ideas to introduce 3D active variants to neural networks one of the earlier ones is called schnet it introduces continuous filters that utilize pairwise distances between points to allow for rotation equivariants continuous here simply means that no discrete grid is used and instead the filter is defined on All Points which is a bit like the approach of Point Nets and craft neural networks this type of convolution is also called point convolution r i and J represent the 3D coordinates and the difference or distance between them is what makes the model invariant to rotations the distance stays constant independent of the orientation this actually makes this model rotation invariance but not equivalent a downside of this is that you lose directional information of the vectors which for example makes it impossible to distinguish mirrored versions of objects or to predict directional forces tensor field networks are now a combination of these continuous filters from Schnitz and the idea of basis functions from harmonic networks presented in the last video unlike Schnitz tfns are rotation equivariants recall that harmonic networks used circular harmonics in 2D and in tensor field networks the equivalent for 3D is used namely spherical harmonics the idea is therefore based on steerability as the filters in t events are composed of spherical harmonics we will see in a second what exactly is meant by this the name tensor field networks comes from the fact that the inputs and outputs are n-dimensional tensor fields let's say we have a 3D Vector plus a one hot encoded variable as input features in the following there will be a distinction regarding the type of features type 1 refers to 3D coordinates and type 0 to rotation in variant scalars such as the node type these types later determine how the different parts of such multi-dimensional arrays behave under rotation for example type zero vectors are rotation invariance later we will also see the term rotation order which corresponds to these types the output of tfn layers will be another concatenates its multi-dimensional tensor of different types with clearly defined rotation behavior let's dig a little bit deeper into the math to see what is meant by this in order to understand tensor field networks there are three mathematical terms we need to familiarize with first of all spherical harmonics they are a set of functions which are especially popular in physics you can for example use them to simulate sound waves or the bounds of a ball in general they are geometric functions which means they map from a geometric point to a value the beautiful property here is that they are equivariants to rotations that means if we rotate the coordinate system the output signal stays the same so that might come in handy if we want to design a covariant neural networks mathematically they are defined by the following formula and I've linked a video in the description which derives this expression in case you are interested spherical harmonics form an orthonormal basis and as a result any function on a sphere can be defined as the sum of these space functions it's just like with Fourier transform which decomposes periodic signals additional functions using spherical harmonics is a property that tensor field networks will make use of in the chart on the left you can see a visualization of these harmonics where the rows represent the degree and the columns the order m the two colors represent if the function values are positive or negative this kind of visualization can be a bit confusing at first because it's not really clear what these orbits mean but simply remember that these things are just functions there's also a second way to visualize this this example is a spherical harmonic of degree 2 corresponding to three dimensions the input to the function are x y and C coordinates of the data points and the harmonic tells us which function value will be assigned the next term we need to discuss are we ignor D matrices I mentioned before that we distinguish different types of features when dealing with rotations the weaker D matrices tell us exactly how these types transform under rotation because they are the e-wraps so irreducible representations of the group SO3 they can be decomposed into a block diagonal form and the interesting part is that the orthonormal subspaces of spherical harmonics so the basis functions correspond to these wigner D matrices in practice we can now construct tensors based on a combination of circle harmonics and additionally know how they transform under rotation using these wigner D matrices this allows us to follow the rules of equivariance finally collapse Gordon coefficients these coefficients become interesting once you aim to calculate the product of these composed tensors let's say you use this combination of spherical harmonics from above to build a vector this is what we called a fiber in the last video in parts of the network it might be necessary to multiply fibers the collapsed Gordon coefficients tell us eventually which parts we need to multiply with each other so basically they Define the multiplication rules for different tensor types so these three concepts are used within tensor field networks and hopefully this high level overview was sufficient to gain some intuition about the terms probably it was not mathematically precise and therefore please also take a look into a few other resources I've added in the video description now let's take a look at the final layer definition of tensor field Networks in order to design 3D rotation equivariant filters the idea in tfns is to use a composition of the rotation equivalent spherical harmonics this makes the filters symmetric so the filter definition looks like this where why are the spherical harmonics and are a set of learnable parameters also called a radial function this radial function is implemented as a neural network essentially this defines how the composition of basis functions looks like there's a bunch of other symbols which I've added here for completeness but this is not too important for now the full layer definition of tensor field networks is described by this formula here we can find the filter which I've just talked about and the input for this filter is the difference between Vector A and B A is here the central points and B are all other points within the point clouds for each of these points we have a feature Vector denoted with we finally the collapse Gordon coefficients tell us how to combine the different fibers in a meaningful way so to summarize this there are three things happening first a continuous point convolution that takes all other points into account second the filters are constrained to be a learnable radial function combined with spherical harmonics and finally tensor algebra is used to combine different vectors here is also whether non-communitivity of 3D is considered because this approach is slightly different from what is done in harmonic Networks interested in some Hands-On action using tfns I've linked a Jupiter notebook in the video description it mixes some coding with visual explanations and I think it's a great resource to learn more about these models now let's finally move on to the model I originally was interested in the se3 Transformer as you by now know se3 stands for spherical euclidean group and represents translations and rotations in 3D it turns out that this model is heavily based on tensor field networks as we will see in a second you can also see maxwelling on the list of authors who heavily influenced the field of graph neural networks this paper on one hand presented in equivariant attention mechanism and on the other hand combines it with graph neural networks there's a great visual summary of that which I took from the paper step one is to introduce local neighborhoods and treat them as a graph this means for each Center Point All Points within a certain radius based on their 3D distance are selected the motivation behind this is to make the attention mechanism more scalable because otherwise it has quadratic complexity as each point needs to attend to all other points the next step is where the tensor field networks comes into play instead of using regular weight matrices as in a plane Transformer the space of learnable functions is limited to rotation equivariant kernels based on tensor field Networks this means that all of the components we've learned about before are used here spherical harmonics a learnable radial Network and collapse Gordon coefficients using these new kernels we are able to override the Transformer architecture for Keys queries and values we have a separate weight Matrix that transforms the features in an equivariant manner Additionally the nearest neighbor graph is used to select points that are used for the attention mechanism finally the attention scores are calculated as usual namely as a DOT product of queries keys and normalized using the softmax function so the overall trick here is really to replace all weight matrices with equivariant kernels that were presented in tensor field Networks as a result the whole attention mechanism is 3D rotation and translation Equity variants if you are interested in playing around with this there's a public repository with an implementation from the authors you can dive into the model implementation and get a deeper understanding about how things are put into code for example how fibers are implemented or how the forward function of the model looks like they also point to an updated implementation by Nvidia which speeds up calculations significantly that's all for this video and also the whole equivarian deep learning series of course there are many other interesting models I couldn't talk about here but I would argue that's the aggregated knowledge of this video series is sufficient to understand most of the models out there I hope you gained some useful insights from this and would be happy to see you again in a future video foreign
Original Description
▬▬ Papers / Resources ▬▬▬
SchNet: https://arxiv.org/abs/1706.08566
SE(3) Transformer: https://arxiv.org/abs/2006.10503
Tensor Field Network: https://arxiv.org/abs/1802.08219
Spherical Harmonics Youtube Video: https://www.youtube.com/watch?v=EcKgJhFdtEY&ab_channel=BJBodner
Spherical Harmonics Formula: https://www.youtube.com/watch?v=5PMqf3Hj-Aw&ab_channel=ProfessorMdoesScience
Tensor Field Network Jupyter Notebook: https://github.com/UPEIChemistry/tensor-field-networks/blob/master/tutorials/tutorial.ipynb
SE(3) Repo: https://github.com/FabianFuchsML/se3-transformer-public/
NVIDIA Updated Version: https://developer.nvidia.com/blog/accelerating-se3-transformers-training-using-an-nvidia-open-source-model-implementation/
▬▬ Used Music ▬▬▬▬▬▬▬▬▬▬▬
Music from #Uppbeat (free for Creators!):
https://uppbeat.io/t/yokonap/birds
License code: WXVHOOZRRWDUCKIU
▬▬ Used Icons ▬▬▬▬▬▬▬▬▬▬
All Icons are from flaticon: https://www.flaticon.com/authors/freepik
▬▬ Timestamps ▬▬▬▬▬▬▬▬▬▬▬
00:00 Introduction
00:43 Points, Graphs and Sets
01:11 Inductive Biases & Equivariance
03:15 3D is not commutative
04:38 SchNet
05:48 Tensor Field Networks
07:17 Math Terminology
12:47 Hands on TFNs
13:08 SE(3) Transformer
15:24 Hands on SE(3) Transf
▬▬ Support me if you like 🌟
►Link to this channel: https://bit.ly/3zEqL1W
►Support me on Patreon: https://bit.ly/2Wed242
►Buy me a coffee on Ko-Fi: https://bit.ly/3kJYEdl
►E-Mail: deepfindr@gmail.com
▬▬ My equipment 💻
- Microphone: https://amzn.to/3DVqB8H
- Microphone mount: https://amzn.to/3BWUcOJ
- Monitors: https://amzn.to/3G2Jjgr
- Monitor mount: https://amzn.to/3AWGIAY
- Height-adjustable table: https://amzn.to/3aUysXC
- Ergonomic chair: https://amzn.to/3phQg7r
- PC case: https://amzn.to/3jdlI2Y
- GPU: https://amzn.to/3AWyzwy
- Keyboard: https://amzn.to/2XskWHP
- Bluelight filter glasses: https://amzn.to/3pj0fK2
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from DeepFindr · DeepFindr · 48 of 56
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
▶
49
50
51
52
53
54
55
56
Understanding Graph Neural Networks | Part 1/3 - Introduction
DeepFindr
Understanding Graph Neural Networks | Part 2/3 - GNNs and it's Variants
DeepFindr
Understanding Graph Neural Networks | Part 3/3 - Pytorch Geometric and Molecule Data using RDKit
DeepFindr
Node Classification on Knowledge Graphs using PyTorch Geometric
DeepFindr
Understanding Convolutional Neural Networks | Part 1 / 3 - The Basics
DeepFindr
Understanding Convolutional Neural Networks | Part 2 / 3 - Wonders of the world CNN with PyTorch
DeepFindr
Understanding Convolutional Neural Networks | Part 3 / 3 - Transfer Learning and Explainable AI
DeepFindr
How to use edge features in Graph Neural Networks (and PyTorch Geometric)
DeepFindr
Explainable AI explained! | #1 Introduction
DeepFindr
Explainable AI explained! | #2 By-design interpretable models with Microsofts InterpretML
DeepFindr
Explainable AI explained! | #3 LIME
DeepFindr
Explainable AI explained! | #4 SHAP
DeepFindr
Explainable AI explained! | #5 Counterfactual explanations and adversarial attacks
DeepFindr
Explainable AI explained! | #6 Layerwise Relevance Propagation with MRI data
DeepFindr
Understanding Graph Attention Networks
DeepFindr
GNN Project #1 - Introduction to HIV dataset
DeepFindr
GNN Project #2 - Creating a Custom Dataset in Pytorch Geometric
DeepFindr
GNN Project #3.2 - Graph Transformer
DeepFindr
GNN Project #4.1 - Graph Variational Autoencoders
DeepFindr
GNN Project #4.2 - GVAE Training and Adjacency reconstruction
DeepFindr
GNN Project #4.3 - One-shot molecule generation - Part 1
DeepFindr
GNN Project #4.3 - Code explanation
DeepFindr
Machine Learning Model Deployment with Python (Streamlit + MLflow) | Part 1/2
DeepFindr
Machine Learning Model Deployment with Python (Streamlit + MLflow) | Part 2/2
DeepFindr
How to explain Graph Neural Networks (with XAI)
DeepFindr
Explaining Twitch Predictions with GNNExplainer
DeepFindr
Python Graph Neural Network Libraries (an Overview)
DeepFindr
Friendly Introduction to Temporal Graph Neural Networks (and some Traffic Forecasting)
DeepFindr
Traffic Forecasting with Pytorch Geometric Temporal
DeepFindr
Fraud Detection with Graph Neural Networks
DeepFindr
Fake News Detection using Graphs with Pytorch Geometric
DeepFindr
Recommender Systems using Graph Neural Networks
DeepFindr
How to handle Uncertainty in Deep Learning #1.1
DeepFindr
How to handle Uncertainty in Deep Learning #1.2
DeepFindr
How to handle Uncertainty in Deep Learning #2.1
DeepFindr
How to handle Uncertainty in Deep Learning #2.2
DeepFindr
Converting a Tabular Dataset to a Graph Dataset for GNNs
DeepFindr
Converting a Tabular Dataset to a Temporal Graph Dataset for GNNs
DeepFindr
How to get started with Data Science (Career tracks and advice)
DeepFindr
Causality and (Graph) Neural Networks
DeepFindr
Diffusion models from scratch in PyTorch
DeepFindr
Self-/Unsupervised GNN Training
DeepFindr
Contrastive Learning in PyTorch - Part 1: Introduction
DeepFindr
Contrastive Learning in PyTorch - Part 2: CL on Point Clouds
DeepFindr
State of AI 2022 - My Highlights
DeepFindr
Equivariant Neural Networks | Part 1/3 - Introduction
DeepFindr
Equivariant Neural Networks | Part 2/3 - Generalized CNNs
DeepFindr
Equivariant Neural Networks | Part 3/3 - Transformers and GNNs
DeepFindr
Personalized Image Generation (using Dreambooth) explained!
DeepFindr
Vision Transformer Quick Guide - Theory and Code in (almost) 15 min
DeepFindr
LoRA explained (and a bit about precision and quantization)
DeepFindr
Dimensionality Reduction Techniques | Introduction and Manifold Learning (1/5)
DeepFindr
Principal Component Analysis (PCA) | Dimensionality Reduction Techniques (2/5)
DeepFindr
Multidimensional Scaling (MDS) | Dimensionality Reduction Techniques (3/5)
DeepFindr
t-distributed Stochastic Neighbor Embedding (t-SNE) | Dimensionality Reduction Techniques (4/5)
DeepFindr
Uniform Manifold Approximation and Projection (UMAP) | Dimensionality Reduction Techniques (5/5)
DeepFindr
More on: ML Maths Basics
View skill →Related AI Lessons
Chapters (10)
Introduction
0:43
Points, Graphs and Sets
1:11
Inductive Biases & Equivariance
3:15
3D is not commutative
4:38
SchNet
5:48
Tensor Field Networks
7:17
Math Terminology
12:47
Hands on TFNs
13:08
SE(3) Transformer
15:24
Hands on SE(3) Transf
🎓
Tutor Explanation
DeepCamp AI