tl;dr
They run TransE to embed the graph nodes, run a similarity search over those embeddings to find “socially similar” tweets, and then add a contrastive loss term (on top of the usual BERT masked language modeling loss) that distinguishes tweets that are socially similar from tweets that are not. This indirectly injects graph information into the language model.
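
To make the three pieces concrete, here is a rough pure-Python sketch, not the authors' code. The TransE distance is the standard `head + relation ≈ tail` form. Everything else is an assumption for illustration: cosine similarity for the neighbor search, an InfoNCE-style form for the contrastive term, the `weight` and `temperature` parameters, all function names, and the toy embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def transe_distance(head, relation, tail):
    """TransE treats a true edge as head + relation ≈ tail; smaller is better."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def nearest_neighbor(query_id, graph_emb):
    """Id of the item whose graph embedding is most similar to the query's."""
    candidates = [k for k in graph_emb if k != query_id]
    return max(candidates, key=lambda k: cosine(graph_emb[query_id], graph_emb[k]))

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed form): -log softmax of the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]

def total_loss(mlm_loss, contrastive, weight=1.0):
    """Masked language modeling loss plus a weighted contrastive term."""
    return mlm_loss + weight * contrastive

# Toy data (made up): graph embeddings drive the neighbor search,
# text embeddings stand in for the language model's tweet representations.
graph_emb = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
text_emb = {"a": [1.0, 0.2], "b": [0.8, 0.3], "c": [-0.5, 1.0]}

positive_id = nearest_neighbor("a", graph_emb)
negative_ids = [k for k in text_emb if k not in ("a", positive_id)]
loss = total_loss(
    mlm_loss=2.0,  # placeholder value; a real run would compute this from BERT
    contrastive=contrastive_loss(
        text_emb["a"], text_emb[positive_id], [text_emb[k] for k in negative_ids]
    ),
    weight=0.1,
)
print(positive_id, loss)
```

The point to notice is that the graph embeddings only decide which pairs count as positives. The loss itself is computed on the text embeddings, which is why the graph signal reaches the language model indirectly.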