- Xiangnan He
- Kuan Deng
- Xiang Wang
- Yan Li
- Yongdong Zhang
- Meng Wang
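The authors argue that for recommendation tasks in which there are only user/item interactions (no rating information), two fundamental operations of Graph Convolutional Networks do more harm than good: feature transformation and non-linear activation. They therefore propose a simplified framework, LightGCN, in which each layer just aggregates the embeddings of a node’s neighbors \mathcal N(i) using a weighted sum:
\mathbf e_i^{k+1} = \sum_{j \in \mathcal N(i)} \frac{1}{\sqrt{|\mathcal N(i)|}\sqrt{|\mathcal N(j)|}} \mathbf e_j^k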
Here, the weight on each edge is the inverse square root of the product of the degrees of the source and destination nodes, the same symmetric normalization used in standard Graph Convolutional Networks.
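A single propagation step can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function name `lightgcn_layer`, the edge-list representation, and the toy graph are all assumptions made for the example, and the graph is assumed to contain no duplicate edges.

```python
import math

def lightgcn_layer(embeddings, edges):
    """One propagation step: every node sums its neighbors' embeddings,
    weighting edge (i, j) by 1 / sqrt(deg(i) * deg(j)).
    No weight matrix and no non-linearity are applied."""
    n = len(embeddings)
    dim = len(embeddings[0])

    # Degree of every node in the undirected interaction graph.
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1

    out = [[0.0] * dim for _ in range(n)]
    for i, j in edges:
        w = 1.0 / math.sqrt(degree[i] * degree[j])
        for d in range(dim):
            out[i][d] += w * embeddings[j][d]  # message j -> i
            out[j][d] += w * embeddings[i][d]  # message i -> j
    return out

# Hypothetical toy graph: nodes 0 and 1 are users, nodes 2 and 3 are items.
edges = [(0, 2), (0, 3), (1, 2)]
e0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
e1 = lightgcn_layer(e0, edges)
```

Note that a node's own embedding does not appear in its update: only neighbor embeddings are aggregated, so in a bipartite user/item graph a user's next-layer embedding is built purely from item embeddings, and vice versa.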
The initial node embeddings, \mathbf e_i^0, are the only learned parameters. The final representation of a node is then given as a weighted sum across the LightGCN layers: \mathbf e_i = \sum_{k=0}^{K} \alpha_k \mathbf e_i^k, where the \alpha_k coefficients can be either hyper-parameters or learned parameters. In the experiments, the authors set them uniformly with the heuristic \alpha_k = 1/(K+1), where K is the number of layers.
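The layer combination is just a per-node weighted average of the stacked layer outputs. The sketch below takes the coefficients as an explicit argument so that any choice (a fixed heuristic or learned values) can be plugged in; the function name `combine_layers`, the nested-list layout, and the uniform default are assumptions made for illustration.

```python
def combine_layers(layer_embeddings, alphas=None):
    """Weighted sum of per-layer embeddings e^0 ... e^K.
    layer_embeddings[k][i] is node i's vector at layer k."""
    num_layers = len(layer_embeddings)  # K + 1, counting layer 0
    if alphas is None:
        # Assumed default for this sketch: equal weight on every layer.
        alphas = [1.0 / num_layers] * num_layers
    if len(alphas) != num_layers:
        raise ValueError("need one alpha per layer, including layer 0")

    n = len(layer_embeddings[0])
    dim = len(layer_embeddings[0][0])
    final = [[0.0] * dim for _ in range(n)]
    for alpha, layer in zip(alphas, layer_embeddings):
        for i in range(n):
            for d in range(dim):
                final[i][d] += alpha * layer[i][d]
    return final

# Hypothetical outputs of layers 0, 1 and 2 for two nodes.
layers = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[3.0, 0.0], [0.0, 4.0]],
    [[5.0, 3.0], [3.0, 0.0]],
]
final = combine_layers(layers)                          # uniform weights
only_layer0 = combine_layers(layers, [1.0, 0.0, 0.0])   # keep layer 0 only
```

Passing a one-hot coefficient vector recovers a single layer's output, which makes it easy to compare a final-layer-only readout with the combined one.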
LightGCN outperforms methods that use a more traditional GCN backbone on the benchmark datasets. Ablation studies show that when only the final layer’s output \mathbf e_i^K is used (as opposed to a weighted sum across the layer outputs \mathbf e_i^0, \dots, \mathbf e_i^K), performance improves over the first few message-passing steps but then degrades, presumably as a consequence of over-smoothing. Allowing the model to combine the outputs of all layers mitigates this problem: performance plateaus, rather than degrades, for large K.
They also show that the symmetric square-root normalization is important: other obvious types of normalization (or none at all) perform worse. Finally, they define a measure of embedding smoothness and speculate that the tendency of their method to create smoother embeddings is responsible for the performance gains.
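To make the normalization comparison concrete, the sketch below computes the weight a single neighbor receives under three schemes. The scheme names (`sym_sqrt`, `dst_only`, `none`) and the function `edge_weight` are hypothetical labels chosen for this example and are not necessarily the exact variants evaluated in the paper.

```python
import math

def edge_weight(deg_dst, deg_src, scheme="sym_sqrt"):
    """Weight applied to a source node's embedding when it is
    aggregated into a destination node."""
    if scheme == "sym_sqrt":
        # Symmetric square-root normalization over both endpoints.
        return 1.0 / math.sqrt(deg_dst * deg_src)
    if scheme == "dst_only":
        # Plain mean over the destination's neighbors.
        return 1.0 / deg_dst
    if scheme == "none":
        # Unnormalized sum.
        return 1.0
    raise ValueError(f"unknown scheme: {scheme}")

# A user with 4 interactions aggregating a popular item (100 interactions)
# and a niche item (1 interaction).
popular = edge_weight(4, 100)               # 1 / sqrt(400)
niche = edge_weight(4, 1)                   # 1 / sqrt(4)
mean_weight = edge_weight(4, 100, "dst_only")
raw_weight = edge_weight(4, 100, "none")
```

Under the symmetric scheme the popular item contributes ten times less than the niche one to this user's embedding, whereas the destination-only and unnormalized schemes treat both items identically.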