Storing tweets based on creation time

Dewey_Munoz · October 9, 2021, 1:40am

Storing tweets based on creation time will give us the advantage of fetching all the top tweets quickly and we only have to query a very small set of servers. The problem here is that the traffic load will not be distributed, e.g., while writing, all new tweets will be going to one server and the remaining servers will be sitting idle. Similarly, while reading, the server holding the latest data will have a very high load as compared to servers holding old data.
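To make the hot-spot concrete, here is a minimal sketch of range partitioning by creation time. It assumes a hypothetical layout in which each shard owns one fixed one-hour window and windows are assigned to shards round-robin; NUM_SHARDS, WINDOW_SECONDS and shard_by_time_range are illustrative names, not part of any real system.

```python
from collections import Counter

# Hypothetical layout: each shard owns one fixed time window (one hour here),
# and windows are assigned to shards round-robin. Timestamps are Unix seconds.
NUM_SHARDS = 4
WINDOW_SECONDS = 3600

def shard_by_time_range(created_at: int) -> int:
    """Map a tweet to the shard that owns its creation-time window."""
    return (created_at // WINDOW_SECONDS) % NUM_SHARDS

def write_load(timestamps):
    """Count how many writes land on each shard."""
    return Counter(shard_by_time_range(ts) for ts in timestamps)

# 1,000 tweets created within the same hour (one every 3 seconds).
now = 1_633_744_800  # an hour-aligned Unix timestamp, chosen for illustration
burst = [now + 3 * i for i in range(1000)]

print(write_load(burst))  # Counter({2: 1000}): every write hits one shard
```

All 1,000 writes land on a single shard while the other three stay idle, which is the imbalance the quoted text describes. Reads for "latest tweets" would concentrate on that same shard.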

Why would only a few servers be loaded? Can’t we hash the timestamp? Then any given tweet would have an equal chance of going to any server, no matter how old the tweet is.
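A minimal sketch of the suggestion, assuming SHA-256 of the timestamp's decimal string as the hash and the same four hypothetical shards; shard_by_hashed_time is an illustrative name. A stable hash from hashlib is used instead of Python's built-in hash(), which for an int is just the value itself.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical shard count

def shard_by_hashed_time(created_at: int) -> int:
    """Hash the creation timestamp and map the digest onto a shard."""
    digest = hashlib.sha256(str(created_at).encode()).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# The same burst as before: 1,000 tweets within one hour.
now = 1_633_744_800
burst = [now + 3 * i for i in range(1000)]
load = Counter(shard_by_hashed_time(ts) for ts in burst)
print(load)  # roughly 250 writes per shard
```

Writes now spread roughly evenly. Two side effects follow from hashing the timestamp rather than a tweet ID: tweets created in the same second still land on the same shard, and the newest tweets are no longer grouped on a few servers.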

Shaheryaar_Kamal · October 11, 2021, 7:52am

Hi @Dewey_Munoz

Here our purpose is to fetch the top tweets quickly. If we store tweets by timestamp and hash the timestamp, each server ends up holding a random mix of timestamps, so it becomes hard to fetch only the top tweets: we have to query every server and merge the results. For example, if one server holds 3 of the top tweets and we take only the top 2 from each server, one of those top tweets will be missed.
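The example in the reply can be sketched as a scatter-gather query. The data below is hypothetical, and "top" is assumed to mean the highest value of a single score per tweet. Each shard returns its local top candidates, and the coordinator merges them with heapq.nlargest.

```python
import heapq

# Hypothetical data: (score, tweet_id) pairs stored on each shard.
# Shard 0 happens to hold 3 of the global top-3 tweets.
shards = {
    0: [(98, "t1"), (97, "t2"), (96, "t3"), (10, "t4")],
    1: [(50, "t5"), (40, "t6")],
    2: [(60, "t7"), (20, "t8")],
}

def global_top_k(shards, k, per_shard):
    """Ask every shard for its local top `per_shard`, then merge into a global top k."""
    candidates = []
    for tweets in shards.values():
        candidates.extend(heapq.nlargest(per_shard, tweets))
    return [tweet_id for _, tweet_id in heapq.nlargest(k, candidates)]

print(global_top_k(shards, k=3, per_shard=2))  # ['t1', 't2', 't7'] (t3 is missed)
print(global_top_k(shards, k=3, per_shard=3))  # ['t1', 't2', 't3']
```

Taking fewer than k tweets per shard can drop a true top tweet. Asking each shard for k candidates is correct, but then every server must be queried, which is the cost the original design avoided by keeping recent tweets on a small set of servers.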