Sharding on UserId? How does the other user get their messages?

The recommendation is to shard on user id but doesn’t that mean that the other user will not be able to easily retrieve their messages? Otherwise you need to replicate the messages per user? I’m a little confused by this recommendation.

There are a few mistakes in your problem statement, but according to my understanding, the answer to your question can be, “We can introduce a software load balancer in front of our chat servers; that can map each UserID to a server to redirect the request.”

A load balancer doesn’t help in this case. The data is sharded on the userid. A load balancer would only help you distribute the load across the shards.

I had the same question.

I agree with @Samer_Abraham.

If you partition and store messages per UserId, you will usually have to query multiple shards to build up a single one-to-one chat, because you need the messages sent to “me” as well as those sent to the “other” person. The only way around this is to duplicate each message for both users, which doubles the storage requirements.
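To make the problem concrete, here is a minimal sketch of the multi-shard read. It assumes each message is stored once, under the recipient's user id, and that shards are picked with a simple modulo; `NUM_SHARDS`, `shard_for`, `store_message`, and `load_conversation` are hypothetical names for illustration, not anything from the lesson.

```python
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, for illustration only


def shard_for(user_id: int) -> int:
    """Map a user id to a shard index (assumed scheme: simple modulo)."""
    return user_id % NUM_SHARDS


# In-memory stand-in for the shards:
# shards[i][recipient_id] -> list of (timestamp, sender_id, text)
shards = [defaultdict(list) for _ in range(NUM_SHARDS)]


def store_message(ts: int, sender_id: int, recipient_id: int, text: str) -> None:
    """Store the message once, on the recipient's shard (an assumption)."""
    shards[shard_for(recipient_id)][recipient_id].append((ts, sender_id, text))


def load_conversation(me: int, other: int):
    """Rebuild a one-to-one chat; returns the messages and the shards read."""
    # Messages sent to me by the other user live on my shard.
    inbound = [m for m in shards[shard_for(me)][me] if m[1] == other]
    # Messages I sent to the other user live on their shard.
    outbound = [m for m in shards[shard_for(other)][other] if m[1] == me]
    shards_touched = {shard_for(me), shard_for(other)}
    return sorted(inbound + outbound), shards_touched


store_message(1, 1000, 1001, "hi")
store_message(2, 1001, 1000, "hello")
messages, touched = load_conversation(1000, 1001)
```

With user ids 1000 and 1001, the two halves of the chat sit on different shards, so a single conversation view needs two reads and a merge.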

@fahim Can you please clarify or include someone who can assist?
Thanks in Advance

This is an important question, and a gap in the article.

Duplication is a reasonable answer: store each message under both user ids, the sender’s as well as the receiver’s. This doubles the storage requirement, but it lets each user read their full message history from their own shard. Storage is cheap, and it’s almost always better to optimize for user experience.
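A rough sketch of the duplicated write, under the same kind of assumptions as any toy example: shards are in-memory dictionaries, the shard is chosen by modulo on the user id, and all names (`store_message_duplicated`, `load_history`, and so on) are made up for illustration.

```python
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, for illustration only


def shard_for(user_id: int) -> int:
    """Map a user id to a shard index (assumed scheme: simple modulo)."""
    return user_id % NUM_SHARDS


# shards[i][owner_id] -> list of (timestamp, sender_id, recipient_id, text)
shards = [defaultdict(list) for _ in range(NUM_SHARDS)]


def store_message_duplicated(ts: int, sender_id: int, recipient_id: int, text: str) -> None:
    """Write one copy under the sender and one under the recipient."""
    record = (ts, sender_id, recipient_id, text)
    # A set avoids a double write if someone messages themselves.
    for owner in {sender_id, recipient_id}:
        shards[shard_for(owner)][owner].append(record)


def load_history(me: int, other: int):
    """Read the chat with `other` from my shard only."""
    mine = shards[shard_for(me)][me]
    return sorted(m for m in mine if other in (m[1], m[2]))


store_message_duplicated(1, 1000, 1001, "hi")
store_message_duplicated(2, 1001, 1000, "hello")
store_message_duplicated(3, 1002, 1000, "yo")
history_a = load_history(1000, 1001)
history_b = load_history(1001, 1000)
total_copies = sum(len(v) for shard in shards for v in shard.values())
```

Each read touches exactly one shard, at the cost of two stored copies per message (six copies for the three messages above).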


Course: Grokking the System Design Interview - Learn Interactively
Lesson: Designing Facebook Messenger - Grokking the System Design Interview

I would introduce an entity called a conversation, with a conversation ID of the form “user_id_a-user_id_b”. For example, if user A’s ID is 1000 and user B’s ID is 1001, the conversation ID will be “1001-1000”: the greater ID always comes first, so both users derive the same conversation ID.

We can then shard the data on the conversation ID, so each message is stored only once instead of once per participant.
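Something like the following could derive the conversation ID and pick a shard from it. This is only a sketch: the SHA-256-then-modulo shard choice and the names `conversation_id`, `shard_for_conversation`, and `NUM_SHARDS` are my assumptions, not part of the lesson.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, for illustration only


def conversation_id(user_a: int, user_b: int) -> str:
    """Build an order-independent ID with the greater user id first."""
    high, low = max(user_a, user_b), min(user_a, user_b)
    return f"{high}-{low}"


def shard_for_conversation(conv_id: str) -> int:
    """Pick a shard from a stable hash of the conversation ID.

    hashlib is used instead of the built-in hash(), because Python
    randomizes string hashes per process by default.
    """
    digest = hashlib.sha256(conv_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


conv_ab = conversation_id(1000, 1001)
conv_ba = conversation_id(1001, 1000)
shard = shard_for_conversation(conv_ab)
```

Both participants compute the same ID regardless of who is sender or receiver, so the whole one-to-one chat lives on one shard and is stored once. The trade-off is that listing all of a user's conversations now needs a separate lookup, since a user's chats are spread across shards.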
