Sharding based on word/tweet object

Franklin · July 27, 2020, 2:29pm

"index would be like a big distributed hash table, where ‘key’ would be the word and ‘value’ will be a list of TweetIDs of all those tweets which contain that word. "

When sharding based on words, I don’t understand this part: "While building our index, we will iterate through all the words of a tweet and calculate the hash of each word to find the server where it would be indexed. "

@Design_Gurus Do you mean that for each word in a tweet, we take the word as the key and the tweet ID as the value, so that one tweet produces many keys that all map to the same tweet ID, and after going through all the tweets we merge the values for each word together?

Can you give more details about sharding based on words and also sharding based on the tweet object? Other learners found this confusing too.

Rah · October 2, 2020, 5:50am

@Design_Gurus Please reply to this.

himani_agarwal · January 31, 2021, 3:30pm

I think that's not what it meant.

It will only be a map from each word to a list of tweet IDs.

While adding a tweet to the index, we hash on the tweet ID, and the map on the corresponding server is updated with the tweet's words.
Let's say the tweet is “car, cycle and bus” and its tweet ID is 100.

IDs 1 to 100 go to shard1
IDs above 100 go to shard2

Then shard1 (suppose) would be updated with these 3 entries:
bus: [100, 56] (here the key “bus” already existed, so its value was updated to include 100)
cycle: [100, 42, 23]
car: [100]

Now let's say tweet ID 205 comes in and the tweet looks like “car ride”.
Shard2 would then look like:
car: [205, 201]
ride: [205, 201, 200]

Similarly, the same key (for example “car”) can exist on every shard, because we choose the shard based on the tweet ID and not on the word.
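To make the indexing step concrete, here is a minimal Python sketch of it. It is only an illustration under some assumptions: the names `NUM_SHARDS`, `shard_for` and `index_tweet` are made up, a simple modulo stands in for whatever hash or range scheme picks the shard, and each shard is just an in-memory map.

```python
from collections import defaultdict

NUM_SHARDS = 2  # hypothetical shard count for this sketch

# One inverted index (word -> list of tweet IDs) per shard.
shards = [defaultdict(list) for _ in range(NUM_SHARDS)]


def shard_for(tweet_id):
    # Assumption: modulo stands in for the real hash/range scheme.
    return tweet_id % NUM_SHARDS


def index_tweet(tweet_id, text):
    # The shard is chosen from the tweet ID, never from the words.
    index = shards[shard_for(tweet_id)]
    # set() avoids adding the same tweet ID twice for a repeated word.
    for word in set(text.lower().split()):
        index[word].append(tweet_id)


index_tweet(100, "car cycle bus")  # lands on shard 0 in this sketch
index_tweet(205, "car ride")       # lands on shard 1 in this sketch
```

After these two calls the key “car” exists on both shards, which is exactly the situation described above.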

Now, in order to get search results for the search text “car”, you cannot pick one particular shard to go to, because you don't have any word-to-shard mapping …

So to get search results we have to go to all the shards and look up the word “car” on each of them. Since we will end up with multiple partial results, we need to aggregate them and rank them with the ranking algorithm.
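The query side could be sketched roughly like this. Again this is just an illustration: the shard contents are copied from the example above, `search` is a made-up name, and sorting by tweet ID (newest first) is only a placeholder for a real ranking algorithm.

```python
# Hypothetical shard contents, taken from the example above.
shards = [
    {"bus": [100, 56], "cycle": [100, 42, 23], "car": [100]},  # shard1
    {"car": [205, 201], "ride": [205, 201, 200]},              # shard2
]


def search(word):
    hits = []
    # Scatter: we cannot know which shard holds the word, so ask all of them.
    for index in shards:
        hits.extend(index.get(word, []))
    # Gather: merge the partial results and rank them.
    # Placeholder ranking: highest tweet ID first.
    return sorted(hits, reverse=True)


print(search("car"))  # [205, 201, 100]
```

With sharding based on words the lookup would go to a single shard instead, at the cost of hot words overloading that shard.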

Hope it helps