educative.io

How can you handle hot words by consistent hashing?

Hi @Design_Gurus,

I see a couple of problems in the last

  1. You said that we shard by word hash, that this can create a hot-word problem, and that we therefore use consistent hashing.
    The problem is that consistent hashing does not solve the hot-word problem: if a word is hot, it still maps to exactly one shard, even with consistent hashing. You cannot avoid that.

  2. Even if you shard by tweet ID, a word can still be hot, and sharding by ID does not solve that.
    So this is confusing.

Here is what I think. Hashing with the word as the key is a problem for two reasons. First, if we dynamically add or remove servers, the distribution becomes unbalanced, and that is why we use consistent hashing. Second, some words are used far more than others, and if we store everything for such a word on one server, that server gets an unbalanced load. Consistent hashing cannot handle this second problem, and that is why we shard by tweet ID: if a word appears in 100 tweets, each tweet ID is unique, so those tweets will most likely be spread across different shards.
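
To make the difference concrete, here is a minimal sketch that compares the two shard keys. The shard count, the tweet IDs, and the `shard_for` helper are all made up for illustration, and it uses a plain hash-modulo placement rather than a real consistent hash ring, so treat it as a toy model only.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size, chosen only for this example


def shard_for(key: str) -> int:
    """Map a key to a shard with a stable hash (md5 is used only for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


# 1000 hypothetical tweets that all contain the hot word "abc"
tweet_ids = [str(i) for i in range(1000)]

# Keyed by word: every one of these tweets is indexed on the shard that owns "abc".
by_word = Counter(shard_for("abc") for _ in tweet_ids)

# Keyed by tweet ID: each tweet is placed by its own unique ID.
by_tweet_id = Counter(shard_for(tid) for tid in tweet_ids)

print("sharded by word:    ", dict(by_word))
print("sharded by tweet id:", dict(by_tweet_id))
```

Keyed by word, all 1000 entries land on a single shard. Keyed by tweet ID, the same tweets are spread across all the shards.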

With consistent hashing, we can add replicas to handle more read traffic. Cassandra implements consistent hashing this way. In our case, say a word is mapped to a server; we then replicate that server’s data to the next two servers on the consistent hash ring. This way we have three copies of every piece of data.

When reading, the read traffic is directed to the replica with the lowest load. Cassandra nodes keep track of every other node’s load using the Gossip protocol and Snitches.
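
A rough sketch of that idea, assuming a toy ring with one point per server and a simple in-process read counter standing in for the load information that nodes would exchange over gossip. The `Ring` class, the server names, and the `read` method are hypothetical and are not Cassandra's API.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Position of a key or server on the ring (md5 is used only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, servers, replicas=3):
        # Assumes replicas <= number of servers, so each copy is on a distinct server.
        self.replicas = replicas
        self.points = sorted((ring_hash(s), s) for s in servers)
        self.hashes = [h for h, _ in self.points]
        self.load = {s: 0 for s in servers}  # stand-in for gossiped load info

    def replica_set(self, key):
        """The first server clockwise from the key, plus the next servers on the ring."""
        start = bisect.bisect_right(self.hashes, ring_hash(key)) % len(self.points)
        return [
            self.points[(start + i) % len(self.points)][1]
            for i in range(self.replicas)
        ]

    def read(self, key):
        """Serve the read from whichever replica currently has the lowest load."""
        target = min(self.replica_set(key), key=lambda s: self.load[s])
        self.load[target] += 1
        return target


ring = Ring(["s1", "s2", "s3", "s4", "s5"])
for _ in range(300):
    ring.read("abc")  # 300 reads of the same hot word
print(ring.load)
```

In this toy run the 300 reads of the hot word are split evenly, 100 each, across its three replicas, while the other two servers see none of them. The load is spread by replication, not by the hash placement itself.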

What I mean is this: say we have the word abc. The hash of abc is always the same, so it always lands at the same place on the ring and hits the same server in the clockwise direction. So consistent hashing cannot help here. If you have read replicas on different servers, that is a separate thing.
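
Here is a small sketch of this point, assuming a bare-bones ring with one point per server and no replicas. The `owner` helper and the server names are invented for the example.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Position of a key or server on the ring (md5 is used only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def owner(key, servers):
    """Return the first server clockwise from the key's position on the ring."""
    points = sorted((ring_hash(s), s) for s in servers)
    hashes = [h for h, _ in points]
    idx = bisect.bisect_right(hashes, ring_hash(key)) % len(points)
    return points[idx][1]


servers = ["s1", "s2", "s3", "s4"]

# 1000 lookups of the same word always resolve to the same single server.
owners_before = {owner("abc", servers) for _ in range(1000)}

# Adding a server may move the word to a different owner, but it is still exactly one.
owners_after = {owner("abc", servers + ["s5"]) for _ in range(1000)}

print(owners_before, owners_after)
```

Both sets contain exactly one server. Consistent hashing limits how many keys move when the cluster changes, but it does not split the traffic for a single key.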