Equivariant Neural Networks | Part 3/3 - Transformers and GNNs

DeepFindr · Beginner ·📰 AI News & Updates ·3y ago


Key Takeaways

The video discusses Equivariant Neural Networks, specifically focusing on Transformers and Graph Neural Networks (GNNs), and explores papers such as SchNet, SE(3) Transformer, and Tensor Field Networks.

Full Transcript

Welcome back to the final part of this equivariant deep learning series. In this video we will have a look at how equivariance can be baked into the architecture of Transformers and GNNs, which means that in the following minutes we will focus on sets, graphs and point clouds. In fact, one of my main motivations for this series was to explain how the SE(3) Transformer works. I didn't think it would be such a long way from group theory over steerability and other mathematical concepts to eventually understanding this paper, but here we are, and I hope that some of the following is interesting for you guys out there. So let's get into it.

Point clouds and graphs fall under the umbrella of sets, because they are simply collections of data points that belong together. You can also see text as a set, with each token being a data point. If we give such a set a specific structure we end up with a graph, or with a point cloud if we also add coordinates, so all of these data structures are connected. Transformers and GNNs are a reasonable choice when working with such data modalities.

Transformer models don't have a strong inductive bias like, for example, CNNs do. They are quite flexible regarding the data they are able to model, which is maybe one of the reasons why they are so successful, of course given the constraint of having enough data. On sparse datasets it can make sense to introduce additional biases, as is done in the SE(3) Transformer. GNNs, on the other hand, require graph-shaped data, because otherwise there is no message passing between different data instances. On graph data you can also add coordinates; think of three-dimensional molecular graphs, for example. Both of these model types already carry a special kind of equivariance, namely permutation equivariance: reordering the inputs simply reorders the outputs (for Transformers, as long as no positional encodings are added), and both can handle inputs of varying size. They are, however, not rotation equivariant. Especially in the context of point clouds, considering these equivariances
can benefit many applications, such as physical particle simulations, molecular predictions or geometric datasets. Therefore, in this video we will dive into some architectures that bake in 3D equivariance. For this, we will first have a look at some foundational papers like SchNet and Tensor Field Networks.

In the following we focus on the 3D setting. This means that, besides other information, each of our data points also has a three-dimensional coordinate vector as a feature; this could, for example, represent a protein graph. The basic idea is that we want the model to ignore the absolute coordinates and instead use the relative information between the data points, such that if we rotate the inputs, the outputs rotate as well. In other words, the outputs should behave predictably under geometric transformations. The advantage is that the model doesn't need to learn how global rotations affect the features, which simplifies the learning process.

An interesting thing I read is that rotations commute in 2D but not in 3D. This means that the order of operations matters in 3D, which makes constructing rotation equivariant neural networks even more difficult. I constructed a simple example to visualize this. Let's say we have this little box with a blue arrow that points in our direction, and on the left we see the three-dimensional coordinate system. Now we perform a 45 degree rotation around the y-axis, which makes the blue arrow point to the left. After that we perform a 180 degree rotation around the z-axis. The blue arrow is now on the bottom of the box and points to the right; I slightly shaded the arrow to indicate that it's not on the top anymore. Now let's do the same, but with a different order of operations: we start with the 180 degree flip and after that perform the 45 degree rotation. As we can see, the blue arrow is also on the bottom, but points in the opposite direction. That's why 3D rotations are not
commutative. This has some implications, or rather restrictions, on the way these models need to be designed, as we will see later.

In the literature there have been different ideas to introduce 3D equivariance to neural networks. One of the earlier ones is called SchNet. It introduces continuous filters that use pairwise distances between points to deal with rotations. Continuous here simply means that no discrete grid is used; instead, the filter is defined on all points, which is a bit like the approach of PointNets and graph neural networks. This type of convolution is also called point convolution. r_i and r_j represent the 3D coordinates, and the distance between them is what makes the model invariant to rotations: the distance stays constant independent of the orientation. This makes the model rotation invariant, but not equivariant. A downside of this is that you lose the directional information of the vectors. This, for example, makes it impossible to distinguish mirrored versions of objects or to directly predict directional quantities such as forces.

Tensor Field Networks (TFNs) are a combination of these continuous filters from SchNet and the idea of basis functions from Harmonic Networks, presented in the last video. Unlike SchNet, TFNs are rotation equivariant. Recall that Harmonic Networks used circular harmonics in 2D; Tensor Field Networks use the 3D equivalent, namely spherical harmonics. The idea is therefore based on steerability, as the filters in TFNs are composed of spherical harmonics; we will see in a second what exactly is meant by this. The name Tensor Field Networks comes from the fact that the inputs and outputs are n-dimensional tensor fields. Let's say we have a 3D vector plus a one-hot encoded variable as input features. In the following there will be a distinction regarding the type of features: type 1 refers to 3D vectors such as coordinates, and type 0 to rotation-invariant scalars such as the node type. These types later determine how the
different parts of such multi-dimensional arrays behave under rotation; for example, type-0 features are rotation invariant. Later we will also see the term rotation order, which corresponds to these types. The output of a TFN layer is again a concatenated multi-dimensional tensor of different types, each with clearly defined rotation behavior. Let's dig a little bit deeper into the math to see what is meant by this.

In order to understand Tensor Field Networks, there are three mathematical terms we need to become familiar with. The first is spherical harmonics. They are a set of functions that are especially popular in physics; you can, for example, use them to simulate sound waves or the bounds of a ball. In general they are geometric functions, which means they map a point on the sphere (a direction) to a value. The beautiful property here is that they are equivariant to rotations. If we rotate the input, the spherical harmonics of a given degree change in a predictable way: they are only mixed among each other by a matrix that depends on the rotation. That comes in handy if we want to design equivariant neural networks. Mathematically they are defined by the following formula, and I've linked a video in the description which derives this expression in case you are interested. Spherical harmonics form an orthonormal basis. As a result, any (square-integrable) function on the sphere can be expressed as a weighted sum of these basis functions, just like the Fourier transform decomposes periodic signals. Decomposing functions using spherical harmonics is a property that Tensor Field Networks will make use of. In the chart on the left you can see a visualization of these harmonics, where the rows represent the degree l and the columns the order m. The two colors represent whether the function values are positive or negative. This kind of visualization can be a bit confusing at first, because it's not really clear what these orbitals mean, but simply remember that these things are just functions. There's also a second way to visualize this: this example is a spherical harmonic of degree 2 in three
dimensions. The inputs to the function are the x, y and z coordinates of the data points, and the harmonic tells us which function value will be assigned.

The next term we need to discuss is Wigner D-matrices. I mentioned before that we distinguish different types of features when dealing with rotations. The Wigner D-matrices tell us exactly how these types transform under rotation, because they are the irreps, i.e. the irreducible representations, of the group SO(3). This means that any representation of SO(3) can be decomposed into a block-diagonal form whose blocks are Wigner D-matrices. The interesting part is that the spherical harmonics of each degree, i.e. the basis functions, transform exactly according to the Wigner D-matrix of that degree. In practice we can therefore construct tensors based on a combination of spherical harmonics and know precisely how they transform under rotation using the Wigner D-matrices. This allows us to follow the rules of equivariance.

Finally, Clebsch-Gordan coefficients. These coefficients become interesting once you aim to calculate the product of these composed tensors. Let's say you use the combination of spherical harmonics from above to build a vector; this is what we called a fiber in the last video. In parts of the network it might be necessary to multiply fibers. The Clebsch-Gordan coefficients tell us which parts we need to multiply with each other and which type each resulting part has, so basically they define the multiplication rules for different tensor types. These three concepts are used within Tensor Field Networks, and hopefully this high-level overview was sufficient to gain some intuition about the terms. It was probably not mathematically precise, so please also take a look at the other resources I've added in the video description.

Now let's take a look at the final layer definition of Tensor Field Networks. In order to design 3D rotation equivariant filters, the idea in TFNs is to compose the filters from the rotation equivariant spherical harmonics, which builds the symmetry directly into them. The filter definition looks like this, where Y are the
spherical harmonics and R is a learnable radial function of the distance. This radial function is implemented as a neural network, and it essentially defines what the combination of basis functions looks like. There's a bunch of other symbols, which I've added here for completeness, but they are not too important for now. The full layer definition of Tensor Field Networks is described by this formula. Here we can find the filter I've just talked about. Its input is the difference between the position vectors of points a and b, where a is the central point and b runs over all other points within the point cloud. For each of these points we have a feature vector, denoted with V. Finally, the Clebsch-Gordan coefficients tell us how to combine the different fibers in a meaningful way.

To summarize, three things are happening:
first, a continuous point convolution that takes all other points into account;
second, the filters are constrained to be a learnable radial function combined with spherical harmonics;
and finally, tensor algebra is used to combine different vectors.
This is also where the non-commutativity of 3D rotations is taken into account, which is why this approach is slightly different from what is done in Harmonic Networks. If you're interested in some hands-on action using TFNs, I've linked a Jupyter notebook in the video description. It mixes some coding with visual explanations, and I think it's a great resource to learn more about these models.

Now let's finally move on to the model I was originally interested in: the SE(3) Transformer. As you know by now, SE(3) stands for the special Euclidean group in 3D, which represents translations and rotations in 3D. It turns out that this model is heavily based on Tensor Field Networks, as we will see in a second. You can also see Max Welling, who heavily influenced the field of graph neural networks, on the list of authors. On one hand, this paper presents an equivariant attention mechanism, and on the other hand, it combines it with graph neural networks. There's a great
visual summary of that, which I took from the paper. Step one is to introduce local neighborhoods and treat them as a graph. This means that for each center point, all points within a certain radius, based on their 3D distance, are selected. The motivation behind this is to make the attention mechanism more scalable; otherwise it has quadratic complexity, as each point needs to attend to all other points. The next step is where Tensor Field Networks come into play. Instead of using regular weight matrices as in a plain Transformer, the space of learnable functions is limited to rotation equivariant kernels based on Tensor Field Networks. This means that all of the components we've learned about before are used here: spherical harmonics, a learnable radial network and Clebsch-Gordan coefficients. Using these new kernels we are able to rewrite the Transformer architecture. For keys, queries and values we have a separate weight matrix that transforms the features in an equivariant manner. Additionally, the neighborhood graph from step one is used to select the points that take part in the attention mechanism. Finally, the attention scores are calculated as usual, namely as a dot product of queries and keys, normalized using the softmax function. So the overall trick here is really to replace all weight matrices with the equivariant kernels presented in Tensor Field Networks. As a result, the whole attention mechanism is equivariant to 3D rotations and translations.

If you are interested in playing around with this, there's a public repository with an implementation from the authors. You can dive into the model implementation and get a deeper understanding of how things are put into code, for example how fibers are implemented or what the forward function of the model looks like. They also point to an updated implementation by NVIDIA, which speeds up calculations significantly.

That's all for this video and also for the whole equivariant deep learning series. Of course, there are many other interesting
models I couldn't talk about here, but I would argue that the aggregated knowledge of this video series is sufficient to understand most of the models out there. I hope you gained some useful insights from this, and I would be happy to see you again in a future video.
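The claim that Transformers are permutation equivariant but not rotation equivariant can be checked numerically. This is a minimal NumPy sketch, assuming a single-head dot-product self-attention layer without positional encodings and with random weights. All names are illustrative and not taken from any library or paper. Permuting the input rows permutes the output rows, and the same weights handle a set of a different size. Rotating 3D coordinate inputs, however, does not simply rotate the output.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along `axis`
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head dot-product self-attention without positional encodings."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    return softmax(scores, axis=1) @ V

def rot_z(theta):
    # rotation about the z-axis by angle theta (radians)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

rng = np.random.default_rng(0)
n, d = 5, 3                                   # treat the 3 features as xyz coordinates
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# Permutation equivariance: reordering the inputs reorders the outputs
perm = rng.permutation(n)
perm_equivariant = np.allclose(self_attention(X, Wq, Wk, Wv)[perm],
                               self_attention(X[perm], Wq, Wk, Wv))

# Sets of a different size work with the same weights
bigger = self_attention(rng.normal(size=(8, d)), Wq, Wk, Wv)

# Rotating the coordinates does NOT simply rotate the output
R = rot_z(0.7)
rot_equivariant = np.allclose(self_attention(X @ R.T, Wq, Wk, Wv),
                              self_attention(X, Wq, Wk, Wv) @ R.T)
print(perm_equivariant, bigger.shape, rot_equivariant)
```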
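The box example for non-commuting 3D rotations can also be expressed with rotation matrices. This sketch assumes standard right-handed rotation matrices about the y- and z-axes and a hypothetical arrow along +x. It does not attempt to reproduce the exact orientation of the box shown in the video.

```python
import numpy as np

def rot2d(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def rot_y(theta):
    # right-handed rotation about the y-axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(theta):
    # right-handed rotation about the z-axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

a, b = np.deg2rad(45.0), np.deg2rad(180.0)

# In 2D the order of two rotations never matters
commute_2d = np.allclose(rot2d(a) @ rot2d(b), rot2d(b) @ rot2d(a))

# In 3D it does: "45 degrees about y, then 180 degrees about z" vs. the reverse order
y_then_z = rot_z(b) @ rot_y(a)      # the rightmost matrix is applied first
z_then_y = rot_y(a) @ rot_z(b)
commute_3d = np.allclose(y_then_z, z_then_y)

arrow = np.array([1.0, 0.0, 0.0])   # hypothetical arrow direction
print(commute_2d, commute_3d)       # True False
print(np.round(y_then_z @ arrow, 3), np.round(z_then_y @ arrow, 3))
```

Both orders end with the arrow's z-component flipped relative to each other. This is the same effect as the arrow pointing in opposite directions on the bottom of the box.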
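To make the SchNet idea concrete, here is a simplified continuous-filter convolution in NumPy. It assumes the ingredients we recall from the SchNet paper: a Gaussian radial-basis expansion of pairwise distances, a shifted softplus and dense layers. These are collapsed into a small random filter network, so treat this as an illustration of the invariance argument rather than the reference implementation. Because only distances enter the filter, the output is unchanged by rotations, translations and mirroring.

```python
import numpy as np

def random_rotation(rng):
    # orthogonal matrix from a QR decomposition, flipped if needed so that det = +1
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def rbf_expand(dist, centers, gamma=10.0):
    # Gaussian radial basis expansion of distances
    return np.exp(-gamma * (dist[..., None] - centers) ** 2)

def shifted_softplus(x):
    # ssp(x) = ln(0.5 * e^x + 0.5)
    return np.logaddexp(x, 0.0) - np.log(2.0)

def cfconv(x, pos, W1, W2, centers):
    """Continuous-filter convolution: x_i' = sum_j x_j * W(||r_i - r_j||)."""
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)   # (n, n)
    filters = shifted_softplus(rbf_expand(dist, centers) @ W1) @ W2     # (n, n, f)
    return np.einsum("ijf,jf->if", filters, x)

rng = np.random.default_rng(1)
n, f, k = 6, 4, 16
pos = rng.normal(size=(n, 3))
x = rng.normal(size=(n, f))                  # per-point features, e.g. atom-type embeddings
W1, W2 = rng.normal(size=(k, f)), rng.normal(size=(f, f))
centers = np.linspace(0.0, 5.0, k)

out = cfconv(x, pos, W1, W2, centers)
R, t = random_rotation(rng), rng.normal(size=3)
out_moved = cfconv(x, pos @ R.T + t, W1, W2, centers)                  # rotated + translated
out_mirrored = cfconv(x, pos * np.array([-1.0, 1.0, 1.0]), W1, W2, centers)
print(np.allclose(out, out_moved), np.allclose(out, out_mirrored))     # True True
```

The last comparison shows the downside mentioned in the video. A mirrored point cloud has identical pairwise distances, so the model cannot tell it apart from the original.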
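The rotation behavior of spherical harmonics, and the role of Wigner D-matrices, can also be verified numerically. This sketch assumes the common real spherical harmonics of degree 1 and 2 with their usual normalization constants. It recovers the real-basis Wigner D-matrix of a rotation by least squares from Y(Rp) = D(R) Y(p), rather than with a closed-form formula, which dedicated libraries provide. The `fiber` helper is hypothetical. It stacks one type-0, one type-1 and one type-2 part to show the block-diagonal structure.

```python
import numpy as np

def random_rotation(rng):
    # orthogonal matrix from a QR decomposition, flipped if needed so that det = +1
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def real_sh_l1(p):
    """Real spherical harmonics of degree 1 (m = -1, 0, 1) for unit vectors p of shape (N, 3)."""
    x, y, z = p.T
    return np.sqrt(3 / (4 * np.pi)) * np.stack([y, z, x], axis=1)

def real_sh_l2(p):
    """Real spherical harmonics of degree 2 (m = -2..2) for unit vectors p of shape (N, 3)."""
    x, y, z = p.T
    c1, c2, c3 = 0.5 * np.sqrt(15 / np.pi), 0.25 * np.sqrt(5 / np.pi), 0.25 * np.sqrt(15 / np.pi)
    return np.stack([c1 * x * y, c1 * y * z, c2 * (3 * z**2 - 1), c1 * x * z, c3 * (x**2 - y**2)], axis=1)

def unit_vectors(rng, n):
    p = rng.normal(size=(n, 3))
    return p / np.linalg.norm(p, axis=1, keepdims=True)

def wigner_d_from_sh(sh, R, rng, n_samples=200):
    """Estimate the real-basis Wigner D-matrix D(R) with Y(R p) = D(R) Y(p) by least squares."""
    p = unit_vectors(rng, n_samples)
    A, B = sh(p), sh(p @ R.T)                      # row k: Y(p_k) and Y(R p_k)
    Dt, *_ = np.linalg.lstsq(A, B, rcond=None)     # solves B = A D^T
    return Dt.T

rng = np.random.default_rng(2)
R1, R2 = random_rotation(rng), random_rotation(rng)

# Degree 1: the D-matrix is the rotation matrix in the (y, z, x) ordering
P = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
D1 = wigner_d_from_sh(real_sh_l1, R1, rng)
print(np.allclose(D1, P @ R1 @ P.T))

# Degree 2: D is orthogonal and respects composition, D(R1 R2) = D(R1) D(R2)
D2_R1 = wigner_d_from_sh(real_sh_l2, R1, rng)
D2_R2 = wigner_d_from_sh(real_sh_l2, R2, rng)
D2_R1R2 = wigner_d_from_sh(real_sh_l2, R1 @ R2, rng)
print(np.allclose(D2_R1 @ D2_R1.T, np.eye(5)), np.allclose(D2_R1R2, D2_R1 @ D2_R2))

# A fiber with one type-0, one type-1 and one type-2 part rotates block-diagonally
def fiber(p):
    return np.concatenate([np.ones((len(p), 1)), real_sh_l1(p), real_sh_l2(p)], axis=1)

D_block = np.zeros((9, 9))
D_block[0, 0], D_block[1:4, 1:4], D_block[4:9, 4:9] = 1.0, D1, D2_R1
p = unit_vectors(rng, 4)
print(np.allclose(fiber(p @ R1.T), fiber(p) @ D_block.T))
```

For degree 1, the D-matrix is just the rotation matrix with rows and columns reordered to (y, z, x). This is why type-1 features behave like ordinary 3D vectors.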
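For two type-1 vectors, the Clebsch-Gordan decomposition splits their 3 × 3 tensor product into three parts. These are a type-0 part (1 number), a type-1 part (3 numbers) and a type-2 part (5 numbers), i.e. 1 ⊗ 1 = 0 ⊕ 1 ⊕ 2. TFNs do this in the spherical-harmonic basis with Clebsch-Gordan coefficients. The sketch below instead uses the equivalent Cartesian form (trace, antisymmetric part, symmetric traceless part) as an assumption for readability, and checks that each part follows its own transformation rule.

```python
import numpy as np

def random_rotation(rng):
    # orthogonal matrix from a QR decomposition, flipped if needed so that det = +1
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def decompose_outer(u, v):
    """Split the tensor product u ⊗ v of two type-1 vectors into type-0, type-1 and type-2 parts."""
    T = np.outer(u, v)
    type0 = np.trace(T) / 3.0                          # (u . v) / 3: 1 number
    type1 = np.array([T[1, 2] - T[2, 1],               # u x v: 3 numbers
                      T[2, 0] - T[0, 2],
                      T[0, 1] - T[1, 0]])
    type2 = 0.5 * (T + T.T) - type0 * np.eye(3)        # symmetric traceless: 5 free numbers
    return type0, type1, type2

rng = np.random.default_rng(3)
u, v = rng.normal(size=3), rng.normal(size=3)
R = random_rotation(rng)

s0, s1, s2 = decompose_outer(u, v)
r0, r1, r2 = decompose_outer(R @ u, R @ v)

print(np.isclose(r0, s0))                 # type 0: invariant
print(np.allclose(r1, R @ s1))            # type 1: rotates like a vector
print(np.allclose(r2, R @ s2 @ R.T))      # type 2: rotates like a rank-2 tensor
print(np.allclose(s1, np.cross(u, v)))    # the type-1 part is the cross product
```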
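For reference, the TFN filter and layer described in the transcript can be written out as below. This is the point convolution as we recall it from the Tensor Field Networks paper, so the notation may differ slightly from the slide. In it, a is the center point, b runs over all points S, and c indexes channels. The symbols l_I, l_F and l_O are the rotation orders of input, filter and output, R is the learnable radial function, Y are the spherical harmonics, V are the point features and C are the Clebsch-Gordan coefficients.

```latex
F^{(l_F, l_I)}_{c\, m_F}(\vec r) = R^{(l_F, l_I)}_{c}\big(\lVert \vec r \rVert\big)\, Y^{(l_F)}_{m_F}(\hat r)

\mathcal{L}^{(l_O)}_{a\, c\, m_O} = \sum_{m_F,\, m_I} C^{(l_O, m_O)}_{(l_F, m_F)(l_I, m_I)} \sum_{b \in S} F^{(l_F, l_I)}_{c\, m_F}(\vec r_{ab})\, V^{(l_I)}_{b\, c\, m_I}, \qquad \vec r_{ab} = \vec r_a - \vec r_b
```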
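The same ingredients can be put together in a small TFN-style layer. This NumPy sketch is simplified under several assumptions:

- Only rotation orders 0 and 1 are used.
- The degree-1 spherical harmonic is replaced by the unit vector r̂_ab, which is the same function up to a constant factor and a reordering of components.
- The radial function is a hypothetical Gaussian-basis linear layer.
- The Clebsch-Gordan step for 1 ⊗ 1 → 0 is written as a dot product.

It checks that scalar outputs are invariant and that vector outputs rotate with the input.

```python
import numpy as np

def random_rotation(rng):
    # orthogonal matrix from a QR decomposition, flipped if needed so that det = +1
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def radial(dist, W, centers=np.linspace(0.0, 4.0, 10)):
    """Hypothetical learnable radial function R(|r|): Gaussian RBFs followed by a linear layer."""
    return np.exp(-2.0 * (dist[..., None] - centers) ** 2) @ W      # (n, n, channels)

def tfn_layer(pos, s, v, W_01, W_10):
    """Simplified TFN point convolution with a 0 -> 1 and a 1 -> 0 path.

    pos: (n, 3) coordinates, s: (n, c) type-0 features, v: (n, c, 3) type-1 features.
    """
    r_ab = pos[:, None, :] - pos[None, :, :]              # r_a - r_b, shape (n, n, 3)
    dist = np.linalg.norm(r_ab, axis=-1)
    not_self = ~np.eye(len(pos), dtype=bool)
    safe = np.where(not_self, dist, 1.0)                   # avoid 0 / 0 on the diagonal
    y1 = np.where(not_self[..., None], r_ab / safe[..., None], 0.0)   # degree-1 harmonic up to scale/order
    # 0 -> 1: filter R(|r_ab|) * Y1(r_ab) times a scalar input gives a vector output
    v_out = np.einsum("abc,abx,bc->acx", radial(dist, W_01), y1, s)
    # 1 -> 0: the Clebsch-Gordan coupling 1 x 1 -> 0 is a dot product of filter and input vector
    s_out = np.einsum("abc,abx,bcx->ac", radial(dist, W_10), y1, v)
    return s_out, v_out

rng = np.random.default_rng(4)
n, c = 7, 3
pos = rng.normal(size=(n, 3))
s, v = rng.normal(size=(n, c)), rng.normal(size=(n, c, 3))
W_01, W_10 = rng.normal(size=(10, c)), rng.normal(size=(10, c))

R, t = random_rotation(rng), rng.normal(size=3)
s_out, v_out = tfn_layer(pos, s, v, W_01, W_10)
s_rot, v_rot = tfn_layer(pos @ R.T + t, s, v @ R.T, W_01, W_10)
print(np.allclose(s_rot, s_out), np.allclose(v_rot, v_out @ R.T))   # True True
```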
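Finally, the two SE(3) Transformer steps from the transcript can be sketched in the same style. Step one builds a radius-based neighborhood graph. Step two computes attention whose queries, keys and values come from equivariant maps. This is a heavily simplified single-head illustration under stated assumptions, not the authors' implementation:

- Features are limited to type 0 and type 1.
- Keys and values use a hypothetical radial function times r̂_ij as a TFN-like kernel.
- Queries only mix channels.

Because the attention logits are dot products of type-1 fibers, they are invariant. The output therefore rotates with the input and ignores translations.

```python
import numpy as np

def random_rotation(rng):
    # orthogonal matrix from a QR decomposition, flipped if needed so that det = +1
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

def radius_graph(pos, radius):
    """Step 1: for every center point i, connect all points j != i within `radius`."""
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    i, j = np.nonzero((dist < radius) & ~np.eye(len(pos), dtype=bool))
    return i, j

def radial(d, W, centers=np.linspace(0.0, 3.0, 8)):
    """Hypothetical learnable radial function: Gaussian RBFs followed by a linear map."""
    return np.exp(-4.0 * (d[:, None] - centers) ** 2) @ W          # (edges, channels)

def se3_attention(pos, s, v, Wq, Wk, Wv, radius=1.5):
    """Single-head attention over a radius graph.

    s: (n, c) type-0 features, v: (n, c, 3) type-1 features; returns new type-1 features.
    """
    i, j = radius_graph(pos, radius)
    rel = pos[j] - pos[i]                                   # relative positions: translation invariant
    d = np.linalg.norm(rel, axis=-1)
    rhat = rel / d[:, None]                                 # degree-1 harmonic up to scale/order
    q = np.einsum("cd,ndx->ncx", Wq, v)                     # query: channel mixing only
    k = radial(d, Wk)[:, :, None] * s[j][:, :, None] * rhat[:, None, :]            # TFN-like 0 -> 1 kernel
    val = radial(d, Wv)[:, :, None] * s[j][:, :, None] * rhat[:, None, :] + v[j]    # value message
    logits = np.einsum("ecx,ecx->e", q[i], k)               # dot product of type-1 fibers: invariant
    out = np.zeros_like(v)
    for center in np.unique(i):
        m = i == center
        a = np.exp(logits[m] - logits[m].max())
        a /= a.sum()                                        # softmax over this center's neighbors
        out[center] = np.einsum("e,ecx->cx", a, val[m])
    return out

rng = np.random.default_rng(5)
n, c = 10, 4
pos = rng.uniform(0.0, 2.0, size=(n, 3))
s, v = rng.normal(size=(n, c)), rng.normal(size=(n, c, 3))
Wq, Wk, Wv = rng.normal(size=(c, c)), rng.normal(size=(8, c)), rng.normal(size=(8, c))

R, t = random_rotation(rng), rng.normal(size=3)
out = se3_attention(pos, s, v, Wq, Wk, Wv)
out_moved = se3_attention(pos @ R.T + t, s, v @ R.T, Wq, Wk, Wv)
print(len(radius_graph(pos, 1.5)[0]), np.allclose(out_moved, out @ R.T))
```

Restricting attention to the radius graph is what keeps the cost below the quadratic all-pairs case. Only edges inside the radius produce logits.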

Original Description

▬▬ Papers / Resources ▬▬▬
SchNet: https://arxiv.org/abs/1706.08566
SE(3) Transformer: https://arxiv.org/abs/2006.10503
Tensor Field Network: https://arxiv.org/abs/1802.08219
Spherical Harmonics Youtube Video: https://www.youtube.com/watch?v=EcKgJhFdtEY&ab_channel=BJBodner
Spherical Harmonics Formula: https://www.youtube.com/watch?v=5PMqf3Hj-Aw&ab_channel=ProfessorMdoesScience
Tensor Field Network Jupyter Notebook: https://github.com/UPEIChemistry/tensor-field-networks/blob/master/tutorials/tutorial.ipynb
SE(3) Repo: https://github.com/FabianFuchsML/se3-transformer-public/
NVIDIA Updated Version: https://developer.nvidia.com/blog/accelerating-se3-transformers-training-using-an-nvidia-open-source-model-implementation/

▬▬ Used Music ▬▬▬▬▬▬▬▬▬▬▬
Music from #Uppbeat (free for Creators!): https://uppbeat.io/t/yokonap/birds
License code: WXVHOOZRRWDUCKIU

▬▬ Used Icons ▬▬▬▬▬▬▬▬▬▬
All Icons are from flaticon: https://www.flaticon.com/authors/freepik

▬▬ Timestamps ▬▬▬▬▬▬▬▬▬▬▬
00:00 Introduction
00:43 Points, Graphs and Sets
01:11 Inductive Biases & Equivariance
03:15 3D is not commutative
04:38 SchNet
05:48 Tensor Field Networks
07:17 Math Terminology
12:47 Hands on TFNs
13:08 SE(3) Transformer
15:24 Hands on SE(3) Transf

▬▬ Support me if you like 🌟
►Link to this channel: https://bit.ly/3zEqL1W
►Support me on Patreon: https://bit.ly/2Wed242
►Buy me a coffee on Ko-Fi: https://bit.ly/3kJYEdl
►E-Mail: deepfindr@gmail.com

▬▬ My equipment 💻
- Microphone: https://amzn.to/3DVqB8H
- Microphone mount: https://amzn.to/3BWUcOJ
- Monitors: https://amzn.to/3G2Jjgr
- Monitor mount: https://amzn.to/3AWGIAY
- Height-adjustable table: https://amzn.to/3aUysXC
- Ergonomic chair: https://amzn.to/3phQg7r
- PC case: https://amzn.to/3jdlI2Y
- GPU: https://amzn.to/3AWyzwy
- Keyboard: https://amzn.to/2XskWHP
- Bluelight filter glasses: https://amzn.to/3pj0fK2



This video explores how equivariance, in particular 3D rotation equivariance, can be built into Transformers and Graph Neural Networks, and discusses the relevant papers and mathematical tools. It gives an overview of the topic and points to hands-on resources (a Jupyter notebook and public implementations) for Tensor Field Networks and the SE(3) Transformer.

Key Takeaways

Understand the concept of equivariance in neural networks
Learn how permutation and 3D rotation equivariance apply to Transformers and Graph Neural Networks
Explore the papers SchNet, Tensor Field Networks, and SE(3) Transformer
Understand the roles of spherical harmonics, Wigner D-matrices and Clebsch-Gordan coefficients in equivariant layers
Find hands-on implementations of Tensor Field Networks and the SE(3) Transformer

💡 Equivariant Neural Networks can be applied to various domains, including computer vision and graph neural networks, by incorporating techniques such as spherical harmonics and tensor field networks.


Chapters (10)

0:00 Introduction

0:43 Points, Graphs and Sets

1:11 Inductive Biases & Equivariance

3:15 3D is not commutative

4:38 SchNet

5:48 Tensor Field Networks

7:17 Math Terminology

12:47 Hands on TFNs

13:08 SE(3) Transformer

15:24 Hands on SE(3) Transf
