Tweet content and social-based user embeddings

How are the tweet-content-based and social-based user embeddings learned? Going by the Embedding chapter, is it the final two-tower approach, in which the training data is a list of similar users and a list of non-similar users? My understanding is that each user is initially represented by a sparse vector built from social or tweet-content features.
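
To make the question concrete, here is a minimal sketch of the setup as I understand it. It is only an illustration under my own assumptions, not the chapter's actual method: the names `make_tower`, `embed`, `cosine`, and `pair_loss` are hypothetical, the tower is a single linear projection, both sides share the same weights, and the loss is a plain binary cross-entropy on the similarity score. Each user starts as a sparse `{feature_id: weight}` vector and is projected to a dense embedding; similar pairs get label 1 and non-similar pairs get label 0.

```python
import math
import random


def make_tower(num_features, dim, seed):
    """Hypothetical tower: one dense row per sparse feature (a linear projection)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_features)]


def embed(tower, sparse_vec):
    """Project a sparse {feature_id: weight} vector to an L2-normalised dense embedding."""
    dim = len(tower[0])
    out = [0.0] * dim
    for feat, weight in sparse_vec.items():
        row = tower[feat]
        for k in range(dim):
            out[k] += weight * row[k]
    norm = math.sqrt(sum(x * x for x in out)) or 1.0
    return [x / norm for x in out]


def cosine(a, b):
    """Dot product of two unit-length vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def pair_loss(score, label):
    """Binary cross-entropy on sigmoid(score); label 1 = similar, 0 = non-similar."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


# Assumption: both towers share weights, since both inputs are users.
tower = make_tower(num_features=1000, dim=8, seed=0)

# Made-up sparse user vectors (e.g. followed accounts or tweet terms).
user_a = {3: 1.0, 17: 0.5, 256: 2.0}
user_b = {3: 1.0, 17: 1.0}   # assumed similar to user_a
user_c = {900: 1.0}          # assumed non-similar to user_a

emb_a = embed(tower, user_a)
score_pos = cosine(emb_a, embed(tower, user_b))
score_neg = cosine(emb_a, embed(tower, user_c))

# Training would minimise the sum of these losses over all labelled pairs.
loss_pos = pair_loss(score_pos, 1)
loss_neg = pair_loss(score_neg, 0)
print(loss_pos, loss_neg)
```

If this matches the chapter's intent, the remaining question is mainly where the similar and non-similar pairs come from.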

