LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation

Authors:

  • Xiangnan He
  • Kuan Deng
  • Xiang Wang
  • Yan Li
  • Yongdong Zhang
  • Meng Wang

tl;dr

It’s argued that for recommendation tasks in which there are only user/item interactions (no rating information), two fundamental operations of Graph Convolutional Networks cause more harm than good: feature transformation and non-linear activation. They therefore propose a simplified framework that aggregates the neighboring embeddings using only a weighted sum:

\mathbf e_i^{(k+1)} = \sum_{u \in \mathcal N_i} \frac{1}{\sqrt {| \mathcal N_i |} \sqrt {| \mathcal N_u |}} \mathbf e_u^{(k)}

Here, the weights are given by the inverse square roots of the degrees of the source and destination nodes, as is done in Graph Convolutional Networks.
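As a rough sketch (not the authors' implementation), one such propagation step amounts to multiplying the stacked user/item embeddings by the symmetrically normalized adjacency matrix; the function and argument names below are made up for illustration:

```python
import numpy as np

def lightgcn_layer(emb, adj):
    """One LightGCN propagation step: a symmetrically normalized
    neighborhood sum, with no feature transformation or non-linearity.

    emb: (n_nodes, d) layer-k embeddings (users and items stacked).
    adj: (n_nodes, n_nodes) binary adjacency of the user-item graph.
    """
    deg = adj.sum(axis=1).astype(float)        # node degrees |N_i|
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    # symmetric normalization: D^{-1/2} A D^{-1/2}
    norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return norm_adj @ emb                      # e^{(k+1)}
```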

The initial node embeddings, \mathbf e_i^{(0)}, are learned parameters. The final representation of a node is then given as a weighted sum across these LightGCN layers: \mathbf e_i = \sum_k \alpha_k \mathbf e_i^{(k)}, where the \alpha_k coefficients can be either hyper-parameters or learned parameters. In the experiments, the authors set them uniformly with the heuristic \alpha_k = 1/(K+1), where K is the number of layers.
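Continuing the sketch above, the layer combination could look like the following, reusing the hypothetical lightgcn_layer from the previous block; the default \alpha_k weights follow the uniform heuristic just mentioned:

```python
def lightgcn_embeddings(emb0, adj, num_layers=3, alphas=None):
    """Propagate num_layers times and combine the layer outputs.

    emb0:   learned layer-0 embeddings e^{(0)} (the model's only parameters).
    alphas: layer-combination weights alpha_0..alpha_K; defaults to the
            uniform heuristic 1 / (K + 1).
    """
    if alphas is None:
        alphas = [1.0 / (num_layers + 1)] * (num_layers + 1)
    emb, final = emb0, alphas[0] * emb0
    for k in range(1, num_layers + 1):
        emb = lightgcn_layer(emb, adj)         # e^{(k)} from e^{(k-1)}
        final = final + alphas[k] * emb        # accumulate the weighted sum
    return final
```

In the paper, the ranking score for a user–item pair is then simply the inner product of their combined embeddings.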

Experiments

LightGCN outperforms methods that use a more traditional GCN backbone on the benchmark datasets. Ablation studies show that when only the final layer’s output \mathbf e_i^{(K)} is used (as opposed to a weighted sum across the K layer outputs), performance improves over the first few message-passing steps but then degrades, presumably as a consequence of over-smoothing. Allowing the model to combine outputs from all layers mitigates this problem: performance plateaus, rather than degrading, for large K.

They also show that the symmetric square-root normalization is important: other obvious choices of normalization (or none at all) perform worse. Finally, they define a measure of embedding smoothness and speculate that their method’s tendency to produce smoother embeddings is responsible for the performance gains.