Authors:
- Nurudin Alvarez-Gonzalez
- Andreas Kaltenbrunner
- Vicenç Gómez
tl;dr
Creates features of a node by encoding the degree and distance of the nodes in the neighborhood. Neighborhood size is a hyperparameter, \alpha. The size of the resulting vector is \alpha \cdot d_\text{max}. The values of this (sparse) vector are the count of the number of nodes that satisfy that degree/distance criteria.
This local encoding is then simply passed through a linear layer to densify and embed.
Experimental Setting
Datasets
- Les Miserables
- Sister Cities
- ArXiv AstroPhysics
- PPI
Baselines
- Unattributed graph embeddings (all transductive):
- DeepWalk
- node2vec
- LINE (Line: Large-scale information network embedding)
- VERSE (Verse: Versatile graph embeddings from similarity measures)
- Inductive methods (all require node attributes):
- GCN
- GraphSAGE
- LGCL (large-scale learnable graph convolutional networks)
- PinSAGE
Evaluation metrics
Unsupervised
Samples random walks from each node. Then use a skip-gram negative sampling objective.
Run this to generate embeddings and then cluster on the embeddings. In Les Miserables (which have two copies of the graph with a single edge between a random pairing across graphs), they find that by setting the receptive field of the structure encoding to \alpha=1, clusters mostly capture densely connected nodes. However, when \alpha=2, cluster assignment is more based on structural role and less so on node proximity.
Also look at ROC-AUC on Link Prediction tasks using Facebook and ArXiv datasets.
Supervised
Exactly what you’d expect–use these as input embeddings to a downstream model.
Multi-label node classification task where the objective is to predict 121 different binary labels capturing protein functions in a series of Protein-to-Protein Interaction (PPI) graphs.