Storing popularity with index

phoebe · October 19, 2020, 5:51pm

Popularity changes frequently, so I don’t think storing it with the index is a good design. Maybe we should first retrieve the batch of TweetIDs and then sort them at runtime by calling the ranking algorithm service.
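To make the idea concrete, here is a minimal sketch of "retrieve first, rank at runtime". Everything in it is an assumption for illustration: `INDEX`, `LIVE_POPULARITY`, and `fetch_scores` are hypothetical stand-ins for the index servers and the ranking service, not anything described in the course.

```python
from typing import Dict, List, Set

# Hypothetical inverted index: word -> tweet IDs. No popularity is stored here.
INDEX: Dict[str, Set[int]] = {
    "search": {101, 102, 103},
}

# Hypothetical stand-in for the data behind a ranking service.
LIVE_POPULARITY: Dict[int, int] = {101: 5, 102: 42, 103: 17}


def fetch_scores(tweet_ids: List[int]) -> Dict[int, int]:
    """Pretend call to the ranking service; returns the current score per tweet."""
    return {tid: LIVE_POPULARITY.get(tid, 0) for tid in tweet_ids}


def search(word: str, limit: int = 10) -> List[int]:
    """Fetch candidate IDs from the index, then sort them by live score."""
    candidates = sorted(INDEX.get(word, set()))
    scores = fetch_scores(candidates)
    ranked = sorted(candidates, key=lambda tid: scores[tid], reverse=True)
    return ranked[:limit]
```

The index never has to be rewritten when a tweet gets a like; the cost moves to read time, where every query pays for one scoring call and one sort.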

himani_agarwal · January 31, 2021, 2:01pm

I agree … I think that’s what Twitter is actually doing…

rahul9 · February 13, 2021, 6:59pm

Yeah, I agree with you. What I think is that we will have a cache on top of the index servers, and everything can be served from the cache. We can keep passing the update stream (likes, retweets, favourites and so on) to the index server and update the popularity index on the index server in the background every 15 minutes; during those 15 minutes the updates can sit in temporary storage. While updating, we can check whether something has become immensely popular and is not in the cache, and if so update the cache. "Immensely popular" means the number of likes, retweets, etc. has increased a lot in the last x minutes. What do you suggest?
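A rough sketch of that buffering idea, under stated assumptions: `PopularityBuffer`, `hot_threshold`, and the in-memory dict and set are hypothetical stand-ins for the temporary storage, the index server's scores, and the cache. Some external scheduler is assumed to call `flush()` every 15 minutes.

```python
from collections import Counter
from typing import Dict, Set


class PopularityBuffer:
    """Buffers engagement events and applies them to the index in batches."""

    def __init__(self, hot_threshold: int = 100) -> None:
        self.pending: Counter = Counter()      # events since the last flush
        self.popularity: Dict[int, int] = {}   # scores held by the index server
        self.cache: Set[int] = set()           # IDs of tweets currently cached
        self.hot_threshold = hot_threshold     # assumed cutoff for "immensely popular"

    def record(self, tweet_id: int, count: int = 1) -> None:
        """Called for every like, retweet, favourite, and so on."""
        self.pending[tweet_id] += count

    def flush(self) -> Set[int]:
        """Apply buffered counts; return the tweets newly promoted to the cache."""
        promoted: Set[int] = set()
        for tweet_id, delta in self.pending.items():
            self.popularity[tweet_id] = self.popularity.get(tweet_id, 0) + delta
            # A large increase within one window counts as "hot".
            if delta >= self.hot_threshold and tweet_id not in self.cache:
                self.cache.add(tweet_id)
                promoted.add(tweet_id)
        self.pending.clear()
        return promoted
```

Here "hot" is judged by the increase within one flush window rather than by the total score, which matches the "increased a lot in the last x minutes" definition above.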

AmSh · February 18, 2021, 3:29am

Do you have a link for that? That would be really helpful. Thanks.

AmSh · February 21, 2021, 4:18pm

@rahul9 I have a few thoughts below:

Solution 1: When the user searches for a word, we get all the tweet_ids for that word, sort that list of tweet_ids in memory, and return the result. This increases latency at read time, but it keeps the complexity of updating a tweet's likes and rank low.
Solution 2:
When a tweet gets a like, a retweet, and so on, we need to find all the words that this tweet_id is indexed under, so that we can increase the weight of that tweet_id. We could use something like a Redis sorted set per word, where we increase the weight of the tweet_id in that word's list.
But how do we find which words to look at in order to update the priority of the tweet_id? We would need a reverse index mapping like tweet_id : list of words. Solution 2 makes reads faster, since the list is already sorted.

rahul9 · February 18, 2021, 4:24pm

hey
For solution 1: I did not understand what you propose. If I understand correctly, we get all the tweet IDs for a word, but there may be millions of tweets, so we cannot sort them all; you have to store the top 100 or so in a cached index.
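For the "top 100 or so" point, a full sort is not needed to pick the best K. A minimal sketch using Python's standard `heapq.nlargest`; the `top_k` helper and the `scores` dict are hypothetical names for illustration:

```python
import heapq
from typing import Dict, Iterable, List


def top_k(tweet_ids: Iterable[int], scores: Dict[int, int], k: int = 100) -> List[int]:
    """Return the k highest-scoring tweet IDs without sorting the whole list."""
    return heapq.nlargest(k, tweet_ids, key=lambda tid: scores.get(tid, 0))
```

This takes roughly O(n log k) time instead of O(n log n), but it still has to look at every tweet ID for the word, so caching the result is what actually keeps the read path cheap.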

Solution 2: we have an index from word to tweet ID, right?

AmSh · February 21, 2021, 4:21pm

Solution 2: the word to tweet_id index helps in searching the tweets for a given word. Now imagine a tweet gets more likes or retweets. You know the tweet_id of that tweet. Now you need to know all the words this tweet contains, so that you can revisit the <word, tweet_id> map and increase the tweet_id's priority for each of those words. In this case, how do you find all the words this tweet has? Either you process it again, i.e. split the tweet into words and look each one up (which is really bad), or you need some kind of reverse mapping so that you can go to each word's map and increase the priority of the tweet_id in it.
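A small in-memory sketch of that reverse mapping, assuming whitespace tokenization and plain dicts. `RankedIndex`, `add_tweet`, and `bump` are hypothetical names. With Redis, each per-word dict would correspond to a sorted set, and the increment in `bump` to a `ZINCRBY` on that set.

```python
from collections import defaultdict
from typing import Dict, List, Set


class RankedIndex:
    """Forward index (word -> tweet scores) plus reverse index (tweet -> words)."""

    def __init__(self) -> None:
        self.word_to_scores: Dict[str, Dict[int, int]] = defaultdict(dict)
        self.tweet_to_words: Dict[int, Set[str]] = {}

    def add_tweet(self, tweet_id: int, text: str) -> None:
        """Tokenize once at write time and remember which words the tweet has."""
        words = set(text.lower().split())
        self.tweet_to_words[tweet_id] = words
        for word in words:
            self.word_to_scores[word][tweet_id] = 0

    def bump(self, tweet_id: int, amount: int = 1) -> None:
        """On a like or retweet, raise the tweet's score under each of its words."""
        for word in self.tweet_to_words.get(tweet_id, ()):
            self.word_to_scores[word][tweet_id] += amount

    def search(self, word: str, limit: int = 10) -> List[int]:
        """Return tweet IDs for a word, highest score first."""
        scores = self.word_to_scores.get(word.lower(), {})
        return sorted(scores, key=scores.get, reverse=True)[:limit]
```

The trade-off is write amplification: one like turns into one update per distinct word in the tweet. Note that this sketch still sorts inside `search`; a real sorted set would keep the order maintained on write.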