Storing tweets based on creation time will give us the advantage of fetching all the top tweets quickly and we only have to query a very small set of servers. The problem here is that the traffic load will not be distributed, e.g., while writing, all new tweets will be going to one server and the remaining servers will be sitting idle. Similarly, while reading, the server holding the latest data will have a very high load as compared to servers holding old data.
Why will only a few servers be loaded? Can’t we hash the timestamp. Then, any given tweet with have an equal chance of going to any server, no matter what the age of the tweet is.