How to use edge features in Graph Neural Networks (and PyTorch Geometric)
Key Takeaways
The video discusses how to use edge features in Graph Neural Networks (GNNs) and PyTorch Geometric, covering topics such as edge weights, edge types, and edge features, with references to research papers and implementations in PyTorch Geometric.
Full Transcript
Hello everyone. Today we want to have a closer look at how we can include edge features in graph neural networks. I assume in the following that you are familiar with GNNs or have watched the corresponding series I recently uploaded. At the end, I'll also quickly show how edge features can be used in PyTorch Geometric.
Let's start with the question of why edge features are even important. Isn't the information in a node sufficient to create meaningful embeddings? A typical graph can be a social network like the one shown here. Node features in our graph are, for instance, the age of the people, their weight, or whether they smoke. Additionally, we know for each of our nodes whether they like the movie The Hobbit. These are the labels in this example. Now let's assume we have a new person joining our graph, and of course we are immediately interested in whether that person is also a Hobbit fan. To answer this question, we can build a graph neural network that combines the node features and the connections of the nodes in order to classify this new member of the network. Doing so, all we use is whether two people are connected or not, but there is so much more information we can get out of this relationship. If we had edge features that describe the type of connection, such as since when the people have been friends or whether they live together, we would have a valuable additional source of information. And this is the case for many applications of graph neural networks: by adding further properties to the edges, so not just the binary information, we can empower the GNN to get much better.
In the following, I want to show a couple of ways edge features are typically utilized in the literature, to help you get started. Using edge features in graph neural networks is still a hot research topic, and there are different ways we can do this. As you might know, edge features are, just like node features, nothing else but a vector of values. Let's start with the most basic form of this vector: a single binary value. This simply means either we
have a connection or not. If we have a look at our simple graph, we can easily represent the connections in a matrix. Numerically, this can be converted to either one or zero, and voilà, that's the adjacency matrix of our graph. It's symmetric along the diagonal, as we have bidirectional edges in our social network.
To make sure that we are on the same page, let's quickly have a look at how this basic edge information is utilized in a regular GNN. I try to generalize the overall process in graph neural networks to make sure we have the same thing in mind. Let me tell you that it's not straightforward to generalize all of the different GNN variants into one summary, so please forgive me if there's an approach that doesn't fit perfectly into that pattern.
Say we want to generate a node embedding for Alice. What we always do is collect the neighbor nodes, in our case those two gentlemen, with the node feature vectors in blue. Next, we prepare the messages for the message passing step. Most GNNs therefore apply some sort of differentiable transformation to these node features in order to get a higher-level representation. This can simply be a multi-layer perceptron, but also things like ReLU. These transformed representations are then aggregated in some way. The important thing here is that this aggregation is permutation invariant, which means the order of our nodes is not relevant. These aggregations are often also normalized according to the degree of the node, that is, how many neighbors a node has. What we retrieve is a summarized representation of Alice's neighborhood in the graph. Finally, we combine the original node features with the aggregated neighbor embedding, and this can again be any differentiable function, such as another MLP, a gated recurrent unit, or just a sum. We obtain a new embedding for Alice that contains information about her and her neighbors.
This embedding can be used to perform a prediction for our Hobbit classification by using another fully connected layer. Then
we can calculate the loss, so how far we are away from the correct prediction, and then we adjust all the learnable matrices in our layers, such as transform and update. That's essentially the reason why they need to be differentiable: we want to be able to calculate gradients. So that's how we perform representation learning in a nutshell. We can summarize this procedure in the following formula. Again, there exist many different variants, so this might deviate from approach to approach. For instance, we can add self-loops and simplify the formula like this, as Alice herself is now part of her neighborhood.
Okay, so now back to the original question: where is the edge information used in this process? The basic binary edge information is used directly when we select the neighbor nodes. For this selection we of course don't loop over all nodes; instead, in a GNN layer, matrix multiplications are performed. When we multiply the adjacency matrix with the feature matrix, this neighborhood aggregation is implicitly performed. All non-adjacent nodes are basically zeroed out, and we only share information between the nodes that are directly connected. So in our formula, this part stands for the multiplication with the adjacency matrix.
So far so good. Now, the first trivial option to utilize edge features is by using edge weights. That simply means that instead of ones and zeros we have weights in the adjacency matrix. For instance, we could encode how happy the people are with the other person. Okay, this is a stupid example, but for instance Alice likes spending time with her boyfriend a lot, but not vice versa. That's why we put a 0.9 and a 0.4 here. Let's have a look at this propagation formula in matrix form from the GCN paper. The first part is the normalized adjacency matrix, X is the current node feature matrix, and the last part is the multiplication with the learnable weight matrix. X prime is the new embedding. It's straightforward to now replace the adjacency matrix with the weighted adjacency
matrix, and as a result, people Alice is close to are emphasized more in the propagation. This usage of edge weights can easily be added in most graph neural network implementations.
Now imagine we not only have a weight for the connection but also use different types of connections. In our social network we would, for instance, differentiate between different relationship types such as friends, couple, or colleagues. If we have such a setup, our edge features are simply one-dimensional vectors with integer values. This typically occurs, for instance, when working with molecule data, as you have single, double, or triple bonds. There exist several papers on how we can include such discrete edge types in a GNN.
The first approach we want to have a look at is called the relational graph convolutional network, from the paper referenced below. Let's quickly investigate this propagation formula. We calculate Alice's new embedding by summing over the neighbor nodes, so this is our aggregation, and applying an MLP transformation to each of these node feature vectors. Finally, there's a non-linear function such as ReLU applied to generate a new embedding. The green section is just a normalization, and the last part of this formula is another transformation applied to Alice's original node features, which doesn't really fit into my structure here. The new part here is that we have the sum over r, and this sum simply represents the different relations we have, so the edge types. You see that the weight matrix is indexed with this r as well. That simply means that, depending on the type of edge, we apply different transformations to the nodes. This is sometimes also called an edge-conditioned GNN.
If we visualize this, we quickly see that, depending on the type of Alice's neighbor (friend, couple, or colleague), we pass the node vector through the corresponding weight matrix. Doing so, we can include the edge information, as we have different transformations applied based on the type of connection. As a consequence, we will of course also
have different adjacency matrices: one that holds the information for friends, one for couple connections, and finally another one for colleagues. Also note how the embedding of Alice's partner is yellow and the embedding of her friend is blue, as they went through different transformations. So as you see, this first approach is pretty intuitive.
Let's have a look at the next paper, which is called Graph Neural Networks with Feature-wise Linear Modulation. The propagation formula looks slightly different, but regarding edges we have exactly the same concept here. We again sum over all neighbors but differentiate between the type of connection l, and this l is also the index of our transformation matrix W. So the transformation we apply to each neighbor node vector depends on the relationship with Alice. There are a couple of other things happening in this formula, but we can ignore them, as we just want to look at edge-related things here. For including different edge types, other similar papers exist, but I think you get the point of how this can be handled. I found this overview in the GNN-FiLM paper, which provides a nice summary. It shows how different node features a, b, c, d are multiplied with separate weight matrices. The little arrow that appears in the index of some weight matrices stands for self-loops.
Now let's have a look at the most interesting and also most general case: what if we have multi-dimensional vectors for each of our edges? This is basically what we had in the introductory example, when we added since when people have been friends or whether they live together. One way to handle these edge features is to directly integrate them into the transformation of the neighborhood states. Let's have a look at the general propagation formula presented in the message passing neural network paper. Here we see that we can include the edge features e between node w and node v in the transformation step when we calculate the embedding. You see that we have two indices here, so that's the edge
information from node w to node v. Another way to think of these edge features is as an adjacency matrix that has vectors instead of ones and zeros, so the zero here stands for an edge feature vector filled with zeros. The shape of our adjacency matrix is then number of nodes times number of nodes times the dimension of the edge features. This is just a side note and happens mostly internally when multiplying the different matrices.
Again, let's have a look at a couple of papers to understand how we can include these multi-dimensional edge features. In the paper Neural Message Passing for Quantum Chemistry, the authors simply input both the node features and the edge features into the message function. This transformation is typically a multi-layer perceptron, so we can visualize it like this: in the case of Alice, we always take Alice's embedding, her direct neighbor, and in between the edge features for that connection. This way we include the edge features in our transformed representation. The other things in this propagation formula are already familiar to us: we perform some sort of aggregation and combine the representation with Alice's original node features through an update function. Pretty intuitive, right?
A similar idea can be found in the paper Principal Neighbourhood Aggregation for Graph Nets. Here we also simply include the edge features in the transformation step, as shown here. The paper about crystal graph convolutional nets uses the same approach, and we can easily see how the edge features, denoted with u here, are concatenated with the node features v in order to obtain a vector for each node-edge-node triple. This combined vector is then again transformed by multiplying it with a learnable weight matrix. Again, there exist other papers that share similar approaches for multi-dimensional edge features, and I'll link a couple of them here for completeness. Most of them are also implemented in PyTorch Geometric.
So now we've already seen a lot of ways how
we can include edge features in graph neural networks. Finally, another way to use them is to create edge embeddings. That's like creating node embeddings, but using the edge features instead. This is the last approach we will quickly investigate in this video.
One recent paper, displayed on the right, uses a so-called hierarchical dual-level attention mechanism. That simply means they have alternating layers: one that updates node embeddings, then one that creates edge embeddings, and so on. The propagation formulas look like this. We can see that the edge features are used in both layers to generate new embeddings. The left layer generates node embeddings and the right layer edge embeddings. Additionally, they use the attention mechanism and thus learn how important specific nodes or edges are for the new embedding. The importance coefficients are alpha and beta here. So, to summarize, this approach iteratively updates node and edge embeddings in order to merge both kinds of information.
Similar to the previous paper, this approach now also incorporates the edge features when calculating the attention coefficients. Here only one layer is required, as both the node and edge embeddings are updated simultaneously. The edge embeddings are simply set to the calculated attention coefficients alpha. So instead of using the adjacency matrix and calculating the node embeddings as on the left here, we now use both the edge and node features to update the embeddings.
Again, there exist a couple of other papers that go in a similar direction, and I display some of them here on this page. CensNet, for instance, also alternates node and edge embedding layers, but without using the attention mechanism. The co-embedding of nodes and edges is basically the same paper, as it comes from the same group of researchers.
So now we've seen many different ways we can use edge features in GNNs. Finally, let's quickly talk about how we can use these approaches in the popular GNN library PyTorch Geometric. All
you have to do is navigate to the documentation and scan the different layers for the following arguments. If you find edge_weight as an argument for a layer, that simply means you can pass values other than 0 or 1 to the adjacency matrix. edge_type means that the implementation can work with different edge types, as we've seen before. Finally, if you find edge_attr, that means the layer can handle multi-dimensional edge features. For more recent papers with edge embeddings, there's currently not so much available, but I can imagine that the implementations will follow soon. Otherwise, you can always create a pull request with your own implementation of a paper and help the deep learning community with this contribution.
Let's quickly have a look at two examples in PyTorch Geometric. Okay, so here we are on the documentation page, and you can see we have this RGCNConv layer. On GitHub, the PyTorch Geometric repository also has an examples folder where you find different examples, and here you can see there's one example for this R-GCN paper. Here we import this layer and can directly use it in our model definition, and we can now specify the number of relations, so the number of edge types we have. Down here in the usage, you can see we pass the edge types of our dataset to our model.
The second example is for the NNConv layer. Again, if I click on this, you can see the propagation formula here, and down here you find edge_attr, and as I said, that stands for multi-dimensional edge features. If we go to GitHub again and look at the examples folder, we find another example for this NNConv layer. It's simply imported here, and you can see in this function the edge attributes are calculated in some way. The NNConv layer is defined as conv1 here and another as conv2 here, and in the forward function we now pass edge_attr, so our multi-dimensional edge features, to this layer and simply include it as described in the paper. So now that's it for
this video. We've seen different possibilities to use edge features in graph neural networks. I hope this helped you as a starting point, and I'm pretty sure we will see many new approaches in the next years.
But wait, there's one more thing: who are these people? Well, they were of course created with a generative adversarial network, and I really wondered whether I need to cite them or not. That's actually also an interesting thought in my opinion: who do you cite if an AI creates things like text or images? Leave a comment with what you think, and I'll see you soon in the next video.
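As a rough illustration of the edge-weight and edge-type ideas from the video, here is a minimal NumPy sketch. It is not PyTorch Geometric's implementation. The function names `gcn_propagate` and `rgcn_propagate` are made up for this example, the convention that `adj[i, j]` weights the message from node j to node i is an assumption, and the row-sum normalization is just one possible choice.

```python
import numpy as np

def gcn_propagate(adj, x, weight):
    """GCN-style step: D^-1/2 (A + I) D^-1/2 X W, where A may hold edge weights."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1)                   # weighted degree (row sums, an assumption)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt @ x @ weight

def rgcn_propagate(adj_per_relation, x, weight_per_relation, weight_self):
    """R-GCN-style step: one adjacency matrix and one weight matrix per edge type."""
    out = x @ weight_self                     # transformation of each node's own features
    for adj_r, w_r in zip(adj_per_relation, weight_per_relation):
        deg_r = adj_r.sum(axis=1, keepdims=True)
        # 1 / (number of neighbors under relation r); zero where a node has none
        norm = np.divide(1.0, deg_r, out=np.zeros_like(deg_r), where=deg_r > 0)
        out = out + norm * (adj_r @ x @ w_r)
    return np.maximum(out, 0.0)               # ReLU

# Toy graph: node 0 = Alice, node 1 = her boyfriend, node 2 = a friend.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))                   # 3 nodes, 4 node features
w = rng.normal(size=(4, 2))                   # stand-in for a learnable weight matrix

weighted_adj = np.array([[0.0, 0.9, 1.0],     # Alice weights her boyfriend with 0.9
                         [0.4, 0.0, 0.0],     # but he weights her with only 0.4
                         [1.0, 0.0, 0.0]])
new_x = gcn_propagate(weighted_adj, x, w)     # shape (3, 2)

couple = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
friends = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
w_rel = [rng.normal(size=(4, 2)) for _ in range(2)]
new_x_typed = rgcn_propagate([couple, friends], x, w_rel, w)
```

In a real model the weight matrices would be trainable parameters; random values only stand in for them here.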
Original Description
In this video I talk about edge weights, edge types and edge features and how to include them in Graph Neural Networks. :)
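For the multi-dimensional edge feature case, the following NumPy sketch shows one way a message function can take both node and edge features, in the spirit of the message passing formulation discussed in the video. It is only an illustration under assumptions: the name `mpnn_layer`, the single-layer "MLP" with ReLU, the concatenation order, and the sum aggregation are choices made for this example and are not taken from any specific paper or library.

```python
import numpy as np

def mpnn_layer(x, edge_index, edge_attr, w_msg, w_upd):
    """One message passing step that uses multi-dimensional edge features.

    x:          [N, F] node features
    edge_index: [2, E] integer array, row 0 = source nodes, row 1 = target nodes
    edge_attr:  [E, D] edge features
    w_msg:      [2F + D, H] message weights, w_upd: [F + H, H] update weights
    """
    src, dst = edge_index
    # Build one vector per node-edge-node triple: target, source, edge features.
    msg_in = np.concatenate([x[dst], x[src], edge_attr], axis=1)
    messages = np.maximum(msg_in @ w_msg, 0.0)        # single linear layer + ReLU
    # Sum the messages arriving at each target node (permutation invariant).
    agg = np.zeros((x.shape[0], w_msg.shape[1]))
    np.add.at(agg, dst, messages)
    # Combine each node's own features with its aggregated neighborhood.
    return np.maximum(np.concatenate([x, agg], axis=1) @ w_upd, 0.0)

# Toy usage: 3 people, edges described by two features
# (years of friendship, living together yes/no). All values are illustrative.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
edge_index = np.array([[1, 2, 0, 0],                  # sources
                       [0, 0, 1, 2]])                 # targets
edge_attr = np.array([[5.0, 1.0], [2.0, 0.0], [5.0, 1.0], [2.0, 0.0]])
w_msg = rng.normal(size=(2 * 4 + 2, 8))
w_upd = rng.normal(size=(4 + 8, 8))
h = mpnn_layer(x, edge_index, edge_attr, w_msg, w_upd)  # shape (3, 8)
```

Because the messages are summed per target node, reordering the edges (together with their feature rows) leaves the result unchanged.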
▬▬ Papers ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Edge types:
- Modeling Relational Data with Graph Convolutional Networks (https://arxiv.org/pdf/1703.06103.pdf)
- GNN-FiLM: Graph Neural Networks with Feature-wise Linear Modulation (https://arxiv.org/pdf/1906.12192.pdf)
Multidim. edge features:
- Neural Message Passing for Quantum Chemistry (https://arxiv.org/pdf/1704.01212.pdf)
- Principal Neighbourhood Aggregation for Graph Nets (https://arxiv.org/pdf/2004.05718.pdf)
- Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties (https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.120.145301)
Edge feature embeddings:
- NENN: Incorporate Node and Edge Features in Graph Neural Networks (http://proceedings.mlr.press/v129/yang20a.html)
- Exploiting Edge Features in Graph Neural Networks (https://arxiv.org/pdf/1809.02709.pdf)
▬▬ Timestamps ▬▬▬▬▬▬▬▬▬▬▬
00:00 Introduction
05:10 Edge weights
06:10 Edge types / relations
09:21 Multidim. edge features
12:04 Edge feature embeddings
13:52 PyTorch Geometric
▬▬ Support me if you like 🌟
►Link to this channel: https://bit.ly/3zEqL1W
►Support me on Patreon: https://bit.ly/2Wed242
►Buy me a coffee on Ko-Fi: https://bit.ly/3kJYEdl