educative.io

Index server vs db in the diagram

In the section on “sharding based on tweet object” it says “we will pass the TweetID to our hash function to find the server and index all the words of the tweet on that server.” Can you explain more?
Are we saying the server containing the shard of the tweet table also holds the index table? What type of DB is being used?
Also, in the diagram the index server is separate from the database. What’s in the database if the index server has the tweet shard? I’m a bit confused.


Hi Sandeep,

Thank you for reaching out to us.

We’ll contact the authors for this course and get back to you soon!

If you have any further concerns/questions/comments, please let us know.

I have the same question. It’s confusing without specifying which server and DB are meant.

Waiting for the answer

I am waiting as well.

Good question, Sandeep.
I think it means that every DB server has an index covering all words of all tweets stored on that server. That gives us multiple indexes across multiple servers, which are then aggregated and combined into one big index that is distributed across the index servers.

@Design_Gurus correct me if I am wrong

I think the index server only holds TweetIDs, while the database stores the original information (e.g., TweetID, content, image, etc.) of each tweet.
In this case, the same word may appear on multiple index servers.


We are discussing sharding of the index here (not the DBs). Sharding based on the TweetID means: find an index server based on the hash of the TweetID and store the index for all words of that tweet on that index server. The actual tweet is stored separately on a DB server. So while searching tweets for a word, we query all index servers, aggregate the results, and find the top TweetIDs. Finally, we fetch these tweets from the DB/cache.
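The flow described above can be sketched in a few lines of Python. This is only a minimal in-memory illustration, not the course's implementation: the shard count, the CRC32-based hash, the names `add_tweet` and `search`, and the "larger TweetID ranks first" rule are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_INDEX_SERVERS = 4  # assumed shard count, for illustration only

# Each "index server" maps word -> set of TweetIDs (no tweet text is stored here).
index_servers = [defaultdict(set) for _ in range(NUM_INDEX_SERVERS)]

# Stand-in for the separate DB/cache tier: TweetID -> tweet text.
tweet_db = {}


def shard_for(tweet_id: int) -> int:
    """Pick an index server from a hash of the TweetID (CRC32 is an assumption)."""
    return zlib.crc32(str(tweet_id).encode()) % NUM_INDEX_SERVERS


def add_tweet(tweet_id: int, text: str) -> None:
    """Store the tweet in the DB and index all of its words on one index server."""
    tweet_db[tweet_id] = text
    shard = index_servers[shard_for(tweet_id)]
    for word in text.lower().split():
        shard[word].add(tweet_id)


def search(word: str, limit: int = 10) -> list:
    """Query every index server, aggregate TweetIDs, then fetch tweets from the DB."""
    word = word.lower()
    tweet_ids = set()
    for shard in index_servers:  # scatter: any shard may hold the word
        tweet_ids |= shard.get(word, set())
    # Gather: "top" is approximated by largest TweetID first (assumed ranking rule).
    top_ids = sorted(tweet_ids, reverse=True)[:limit]
    return [(tid, tweet_db[tid]) for tid in top_ids]
```

Note that `search` has to visit every index server because a word's postings are spread across whichever shards its tweets hashed to; only the final step touches the DB.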


Thanks for your post. But why do we need a separate index server? As per the article, the index servers would be like a big distributed hash table, where the ‘key’ is the word and the ‘value’ is a list of TweetIDs of all the tweets that contain that word. But isn’t the full-text index of MySQL/Postgres the same thing? If we create a full-text index on the tweet text in the database server, it will also contain a mapping of word to TweetID. So can’t we use this already available functionality instead of maintaining our own custom index servers? If we feel that the full-text index of MySQL/Postgres is slow, we can use Elasticsearch.


find an index server based on the hash of TweetID and store the index for all words of that tweet on that index server.

That means when we need to add an index entry for a word like "love", and we have 2 tweets:

Assume that we have the text “A love you” in tweet A, and “B love you” in tweet B.

We are going to have 2 index servers holding the mappings from “love” to A and from “love” to B:

(A mapped to) index-server-1: love: A->other tweet ids

(B mapped to) index-server-2: love: B->other tweet ids

And finally fetch these tweets from DB/cache.

That says we have:

(A mapped to) db-1: {id: A, ....,, text: "A love you", ...}

(B mapped to) db-2: {id: B, ....,, text: "B love you", ...}

Is that the idea?
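The layout in this example can be written out as a small Python sketch to check the reading. Everything here is hypothetical: the placement tables simply hard-code "A goes to server 1, B goes to server 2" from the example instead of computing a hash, and `lookup` is an illustrative helper name.

```python
# Hypothetical placements copied from the example above (no real hash is computed).
INDEX_PLACEMENT = {"A": "index-server-1", "B": "index-server-2"}
DB_PLACEMENT = {"A": "db-1", "B": "db-2"}

tweets = {"A": "A love you", "B": "B love you"}

index_servers = {}  # server name -> {word: [TweetIDs]}
db_servers = {}     # server name -> {TweetID: tweet record}

for tweet_id, text in tweets.items():
    # All words of one tweet are indexed on the index server chosen for that tweet.
    index = index_servers.setdefault(INDEX_PLACEMENT[tweet_id], {})
    for word in text.lower().split():
        index.setdefault(word, []).append(tweet_id)
    # The full tweet record lives on a DB server, independent of the index tier.
    db_servers.setdefault(DB_PLACEMENT[tweet_id], {})[tweet_id] = {
        "id": tweet_id,
        "text": text,
    }


def lookup(word):
    """Collect TweetIDs for a word from every index server, then read the DB."""
    ids = [tid for index in index_servers.values() for tid in index.get(word, [])]
    return [db_servers[DB_PLACEMENT[tid]][tid]["text"] for tid in sorted(ids)]
```

Under these assumptions, "love" appears on both index servers, each pointing only at its own tweet, while the tweet text is read from the DB servers in a separate step.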