educative.io


Sharding by tweet id doubts: why do we need it?

  1. @Design_Gurus, can you please explain how consistent hashing helps handle hot users, as mentioned in the chapter? Consistent hashing distributes keys evenly, but only if the data itself is uniformly distributed. If we shard by user id and a user becomes hot, all of that user's tweets still go to one machine. Consistent hashing cannot avoid this. Please confirm.

  2. If you shard by tweet id, how will you find all tweets for a user id? That is the main query the system is built on. You would have to query every server to search by user id, and we never search by tweet id. Why are we emphasizing this so much? Why would you search by tweet id? We first need all tweets of a user, and only then do we look up those tweets; if we already query the tweet table for a user's tweets, there is no need to search by tweet id. Please clarify.


Did you find an answer to question 1?
I have the same question, but I can't find an answer.

No, I did not. Based on what I have learned, this statement is wrong.

Consistent hashing with replication helps with hot data. Each node's data is replicated on a set of other nodes (generally the next nodes in the clockwise direction on the ring). Read traffic is then directed based on each node's load.

For example, Cassandra implements consistent hashing with replication. Cassandra has a component called the 'snitch'; its dynamic snitch tracks how well each replica is performing (read latency) and is used to route reads away from overloaded nodes.

Now, with a replication factor of 3, we will have 3 replicas of the 'hot user'. Based on each node's load, we can redirect read traffic to the replica with the least load.
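
To make the idea concrete, here is a minimal Python sketch of a hash ring with replication and least-loaded reads. It is only an illustration under assumptions: the `Ring` class, the node names, the single token per node, and the in-memory `load` counter are all made up for this post, and it is not how Cassandra is actually implemented.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Map a key to a ring position (MD5 is used only for a stable spread)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hash ring with clockwise replication."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # One token per node keeps the sketch short; real systems use many virtual nodes.
        self.tokens = sorted((stable_hash(n), n) for n in nodes)
        self.positions = [t for t, _ in self.tokens]
        self.load = {n: 0 for n in nodes}  # assumed per-node load counter

    def replicas(self, key):
        """Return the owner of `key` plus the next nodes clockwise."""
        start = bisect.bisect_right(self.positions, stable_hash(key)) % len(self.tokens)
        count = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]

    def read(self, key):
        """Serve a read from the least-loaded replica and record the load."""
        node = min(self.replicas(key), key=lambda n: self.load[n])
        self.load[node] += 1
        return node


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
for _ in range(300):
    ring.read("user:hot")

hot_replicas = ring.replicas("user:hot")
print({n: ring.load[n] for n in hot_replicas})
```

In this sketch the 300 reads for the hot key are split across its three replicas instead of landing on one node. Note that this only spreads reads; writes for the hot user still go to every replica.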

BTW, this has been answered in other questions too.

Does that mean that sharding based on user id is preferred?

Yes, shard by user id. It will suffice for interviews. We should not shard by tweet id. A hot user is unlikely to cause storage problems, since a shard is generally terabytes in size and a single user cannot practically store that much data. Keep it as simple as possible. Even creating an index on creation time is fine.
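
A rough Python sketch of the difference, in case it helps. Everything here is assumed for illustration (the shard count, the `shard_for` helper, the toy tweet data); it just counts how many shards a "all tweets of a user" query has to touch under each scheme.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key: str) -> int:
    """Pick a shard from a stable hash of the sharding key."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % NUM_SHARDS


# Toy data: (tweet_id, user_id) pairs for three users.
tweets = [(f"t{i}", f"u{i % 3}") for i in range(12)]

by_user = {s: [] for s in range(NUM_SHARDS)}   # sharded by user id
by_tweet = {s: [] for s in range(NUM_SHARDS)}  # sharded by tweet id
for tweet_id, user_id in tweets:
    by_user[shard_for(user_id)].append((tweet_id, user_id))
    by_tweet[shard_for(tweet_id)].append((tweet_id, user_id))


def tweets_of_user_sharded_by_user(user_id):
    """All of a user's tweets live on one shard, so one lookup is enough."""
    shard = shard_for(user_id)
    return [t for t, u in by_user[shard] if u == user_id], 1


def tweets_of_user_sharded_by_tweet(user_id):
    """A user's tweets are scattered, so every shard must be queried."""
    found = []
    for shard in range(NUM_SHARDS):
        found.extend(t for t, u in by_tweet[shard] if u == user_id)
    return found, NUM_SHARDS


user_result, user_shards_hit = tweets_of_user_sharded_by_user("u1")
tweet_result, tweet_shards_hit = tweets_of_user_sharded_by_tweet("u1")
print(user_shards_hit, tweet_shards_hit)
```

Both functions return the same tweets, but the tweet-id scheme has to fan out to every shard and merge the results, which is the cost the question above is pointing at.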
