educative.io

Sharding based on the tweet object

Sharding based on the tweet object: While storing, we will pass the TweetID to our hash function to find the server and index all the words of the tweet on that server. While querying for a particular word, we have to query all the servers, and each server will return a set of TweetIDs. A centralized server will aggregate these results to return them to the user.
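The scheme described above can be sketched in a few lines. This is only an illustration under assumptions not in the quoted text: the shard count, the MD5-based hash, the in-memory dictionaries standing in for index servers, and the names `shard_for_tweet`, `index_tweet`, and `search` are all hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed number of index servers

# One inverted index per shard: word -> set of TweetIDs (stand-in for a server)
shards = [defaultdict(set) for _ in range(NUM_SHARDS)]


def shard_for_tweet(tweet_id: int) -> int:
    """Hash the TweetID to pick the server that indexes this tweet."""
    digest = hashlib.md5(str(tweet_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def index_tweet(tweet_id: int, text: str) -> None:
    """Index every word of the tweet on the single shard chosen by TweetID."""
    shard = shards[shard_for_tweet(tweet_id)]
    for word in text.lower().split():
        shard[word].add(tweet_id)


def search(word: str) -> set:
    """Query every shard for the word, then merge the per-shard results.

    The merge plays the role of the centralized aggregation server.
    """
    word = word.lower()
    results = set()
    for shard in shards:
        results |= shard.get(word, set())
    return results


index_tweet(1, "hello world")
index_tweet(2, "goodbye world")
index_tweet(3, "hello again")
matches = search("hello")
```

Note that `search` touches all shards for every query, which is the cost this scheme pays for spreading each word's postings around.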

It seems that one of the main benefits of sharding based on the tweet object is that storage becomes more balanced. Is this correct? For example, if some word is very popular, its entries would be spread across many index servers. With sharding based on word, the word and all the tweets associated with it would be stored on one server. In the shard-by-tweet-object case, if there is a hot word, we can scale by increasing the replication factor of all servers by the same amount.
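One rough way to check this intuition is to count how many index entries land on each server under both schemes. The sketch below is a toy simulation, not a measurement of any real system: the shard count, the MD5-based hash, the synthetic tweets, and the helper names `stable_hash` and `postings_per_shard` are all assumptions made for illustration.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed number of index servers


def stable_hash(key: str) -> int:
    """Deterministic hash so the shard assignment is repeatable."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def postings_per_shard(tweets: dict, by: str) -> list:
    """Count (word, TweetID) index entries stored on each shard.

    by="word"  -> shard key is the word (sharding based on word)
    by="tweet" -> shard key is the TweetID (sharding based on tweet object)
    """
    counts = Counter()
    for tweet_id, text in tweets.items():
        for word in set(text.lower().split()):
            key = word if by == "word" else str(tweet_id)
            counts[stable_hash(key) % NUM_SHARDS] += 1
    return [counts[s] for s in range(NUM_SHARDS)]


# Synthetic data: every tweet contains the hot word "goal" plus one rare word.
tweets = {i: f"goal unique{i}" for i in range(1000)}

by_word = postings_per_shard(tweets, "word")
by_tweet = postings_per_shard(tweets, "tweet")
print("by word: ", by_word)
print("by tweet:", by_tweet)
```

Under word-based sharding, the server that owns "goal" holds all 1000 of its entries on top of its share of the rare words, while under tweet-based sharding those entries follow their tweets to whichever server each TweetID hashes to.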