educative.io

Sharding question

For 7.1, if we shard by UserId, some machines might become hot, but if a cache sits in front of them, the hot queries should not be a problem.
If we shard on TweetId and build a secondary index on creationTime, the tweets will be ordered based on creationTime. Is this also a solution?

Hi Huimin,

We have looked into the issue and we are working on it. We will get back to you when it is resolved.

Thank you.

Yes, that could be a solution. Keep in mind, though, that secondary indexes slow down writes and take extra storage. This means they are ideal when we have few writes and many reads, which seems to be true in this case. This kind of discussion, where we weigh the pros and cons, is what the interviewer is expecting.
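To make the trade-off concrete, here is a minimal sketch of sharding on TweetId with a per-shard secondary index on creation time. It is only an illustration under stated assumptions: the modulo placement, the in-memory `Shard` class, `NUM_SHARDS`, and the helper names are all hypothetical and not taken from the chapter. It shows that every insert does one extra index write, and that reading the newest tweets means asking every shard and merging the results.

```python
import bisect
import heapq
from itertools import islice

NUM_SHARDS = 4  # hypothetical cluster size


class Shard:
    """One in-memory shard: a primary store plus a secondary index on time."""

    def __init__(self):
        self.tweets = {}       # primary store: tweet_id -> (created_at, text)
        self.by_time = []      # secondary index: sorted (created_at, tweet_id)
        self.index_writes = 0  # counts the extra work done on each write

    def insert(self, tweet_id, created_at, text):
        self.tweets[tweet_id] = (created_at, text)
        # The secondary index costs one more write and extra storage per tweet.
        bisect.insort(self.by_time, (created_at, tweet_id))
        self.index_writes += 1

    def latest(self, n):
        # The index is kept sorted, so the newest entries are at the end.
        return self.by_time[::-1][:n]


shards = [Shard() for _ in range(NUM_SHARDS)]


def insert_tweet(tweet_id, created_at, text):
    # Assumed placement rule: simple modulo on the tweet ID.
    shards[tweet_id % NUM_SHARDS].insert(tweet_id, created_at, text)


def latest_tweets(n):
    # Scatter-gather: ask every shard for its newest n, then merge the
    # already-sorted (newest first) lists and keep the overall newest n.
    per_shard = [s.latest(n) for s in shards]
    return list(islice(heapq.merge(*per_shard, reverse=True), n))


for tweet_id in range(1, 11):
    insert_tweet(tweet_id, 100 + tweet_id, f"tweet {tweet_id}")

newest = latest_tweets(3)
print(newest)  # [(110, 10), (109, 9), (108, 8)]
```

Writes are spread evenly across shards here, so no single machine gets hot on writes, but each read touches all shards. That is the kind of pro/con worth calling out in the interview.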


@Design_Gurus, can you please explain how consistent hashing helps to handle hot users, as mentioned in the chapter? Consistent hashing distributes keys evenly, but for that the data should also be uniformly distributed. If we shard on user ID and a user becomes hot, all tweets of that user will still go to one machine only. Consistent hashing cannot avoid this. Please confirm.

Consistent hashing not only helps distribute/partition the data but also helps replicate it. That is, we keep multiple copies of the data for fault tolerance and availability. The replica servers can be used for 'read' traffic, thus distributing the read load for hot users. Some NoSQL DBs even keep track of fast (low-latency) servers and try to divert read traffic towards them. Lastly, caching solutions are always available on all nodes to help read frequently accessed data.