TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations at Twitter

tl;dr

They run TransE to embed the nodes of the Twitter engagement graph, use a similarity search over those embeddings to find "socially similar" tweets, and then add a contrastive loss term (alongside the standard BERT masked-language-modeling loss) that pulls socially similar tweets together and pushes dissimilar ones apart. This indirectly injects graph structure into the language model.
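
A minimal sketch of what that joint objective could look like, assuming an in-batch InfoNCE-style contrastive loss, mean pooling, and a 1:1 loss weighting (the paper's exact pooling, temperature, and weighting are not reproduced here):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM

# Assumption: a multilingual BERT stands in for the actual TwHIN-BERT backbone.
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

def pooled_embedding(input_ids, attention_mask):
    # Mean-pool the final hidden states into one vector per tweet
    # (the pooling choice here is an assumption).
    out = model(input_ids=input_ids, attention_mask=attention_mask,
                output_hidden_states=True)
    hidden = out.hidden_states[-1]                    # (B, T, H)
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1)       # (B, H)

def joint_loss(masked_a, labels_a, mask_a, clean_a, clean_b, mask_b, tau=0.05):
    # Standard BERT masked-language-modeling loss on a masked copy of tweet A.
    mlm = model(input_ids=masked_a, attention_mask=mask_a,
                labels=labels_a).loss

    # In-batch contrastive loss: row i of batch A should match row i of
    # batch B (its socially similar partner found via the TransE similarity
    # search); the other rows in the batch act as negatives.
    za = F.normalize(pooled_embedding(clean_a, mask_a), dim=-1)
    zb = F.normalize(pooled_embedding(clean_b, mask_b), dim=-1)
    logits = za @ zb.T / tau                          # (B, B) similarity matrix
    targets = torch.arange(za.size(0))
    contrastive = F.cross_entropy(logits, targets)

    return mlm + contrastive                          # 1:1 weighting assumed
```

The key design point is that the contrastive term never touches the graph directly: the TransE embeddings are only used offline to mine the positive pairs, so all graph signal reaches the language model through the choice of which tweets count as positives.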