educative.io

Database design of Facebook Messenger

The lesson says “Both of our requirements can be easily met with a wide-column database solution like HBase” but I don’t understand how. Can you give a more detailed explanation of how we could design the database and how it meets the requirements of Facebook Messenger?

Hi Jeong,

Thanks for reaching out to us. We have received your query and we’ll be looking into it. We’ll get back to you shortly.

Regards
Team Educative

I’ve got the same question too. Any answer would be much appreciated.

I’m stuck on the same question as well. Could someone answer, please?

If you read the Facebook engineering blog, they initially used HBase but have since switched to MyRocks, which is a MySQL storage engine. Take these design posts with a grain of salt: sometimes they are just taken from old engineering posts with no explanation. Do your own research. You can actually use any type of database to store messages.

It would be great if you can share a database schema.

Hi Team, please share the data model / database schema for this.

Has this question been answered already? If not, how long will it take?

Thanks

Please give DB schema information. The designs in this course are too generic. They are not much help and not worth the money.

You might review this slide deck on HBase schema:

Not sure if this is correct, but here’s my current understanding of the HBase DB schema:

Row Key = user ID
Column name = user ID that the row-key user ID is chatting with
Version = message ID
Value = who sent the message, and the text of the message itself
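
To make that layout concrete, here is a minimal sketch in plain Python. It is a toy in-memory stand-in, not the HBase client API, and the names table, put_message and get_conversation are made up for illustration. It assumes message IDs increase over time, so sorting by version gives chronological order.

```python
from collections import defaultdict

# Toy stand-in for one HBase table (hypothetical, not the real client API).
# table[row_key][column_name][version] = value
#   row_key     = user ID that owns the row
#   column_name = user ID of the chat partner
#   version     = message ID (assumed to increase over time)
#   value       = (sender ID, message text)
table = defaultdict(lambda: defaultdict(dict))

def put_message(owner_id, peer_id, message_id, sender_id, text):
    """Store one message in the owner's row, in the column named after the peer."""
    table[owner_id][peer_id][message_id] = (sender_id, text)

def get_conversation(owner_id, peer_id):
    """Return (message_id, sender_id, text) tuples, oldest first."""
    cell = table[owner_id][peer_id]
    return [(message_id, *cell[message_id]) for message_id in sorted(cell)]

put_message("alice", "bob", 1, "alice", "hi Bob")
put_message("alice", "bob", 2, "bob", "hey Alice")
history = get_conversation("alice", "bob")
```

Here history holds both messages from Alice's row in message-ID order. Bob's own view of the chat would need the same two writes under his row key.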

I spent some time trying to answer this question for myself. I didn’t quite get how a wide-column database such as HBase could be used to store messages, or what the schema would look like.

I think an important thing to realise here is that HBase stores data in three dimensions: we have rows, rows are split into columns, and columns hold versioned values.
[1] rows → [2] columns → [3] values (v0, v1, v2, … vn).

I’m not familiar with what an HBase schema should look like, but my naive representation of it using the messenger example would be as follows:

Row key = UserID
Column key = ConversationID
Value = the last message saved.

We first identify the row using the UserID, and then read the value stored under the column whose name is the ConversationID. The value we read is the last message written in that conversation. HBase can retain earlier versions of a value (how many is configurable per column family), so that way we also have access to the messages (versions) that came before. Writing a new message simply means overwriting the value with the contents of the new message; the previous value stays available as an older version.
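
As a rough illustration of that read and write path, here is a small Python model of rows, columns and versioned values. VersionedTable and its methods are hypothetical names, not HBase's API, and a simple counter stands in for the timestamp HBase would use as the version. Note that real HBase only keeps as many versions as the column family is configured to retain, so using versions as a full message history assumes that limit has been raised.

```python
import itertools

class VersionedTable:
    """Toy model of rows -> columns -> versioned values (not the HBase API)."""

    def __init__(self):
        self._rows = {}
        # Stand-in for the timestamp HBase would assign as the version.
        self._clock = itertools.count(1)

    def put(self, row, column, value):
        """Write a new version; older versions stay underneath it."""
        version = next(self._clock)
        self._rows.setdefault(row, {}).setdefault(column, []).append((version, value))
        return version

    def get(self, row, column, max_versions=1):
        """Return up to max_versions values, newest first."""
        versions = self._rows.get(row, {}).get(column, [])
        return [value for _, value in reversed(versions)][:max_versions]

t = VersionedTable()
t.put("user42", "conv7", "hello")         # row key = UserID, column = ConversationID
t.put("user42", "conv7", "how are you?")  # "overwrites" the cell with a new version

latest = t.get("user42", "conv7")                    # default read: newest value only
history = t.get("user42", "conv7", max_versions=10)  # older versions on request
```

A plain read returns only the newest message, while asking for more versions walks back through the conversation.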

I realise the above “schema” would essentially duplicate the message data as we’re grouping the messages by UserID, so both users within a conversation would have their own copy of messages.

Maybe a better approach would be to use the ConversationID as the row key and have a column with a fixed name like “Messages” where the messages would be stored (in the same way as before, using versions).
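
A minimal sketch of this conversation-keyed variant, again as a plain Python stand-in rather than real HBase calls (conversations, append_message and read_messages are illustrative names, and the version numbers are assumed to increase with each message):

```python
# conversations[conversation_id]["Messages"] -> list of (version, sender_id, text)
# Row key = ConversationID, with a single fixed column named "Messages".
conversations = {}

def append_message(conversation_id, version, sender_id, text):
    """Add one message as a new version of the conversation's Messages column."""
    row = conversations.setdefault(conversation_id, {"Messages": []})
    row["Messages"].append((version, sender_id, text))

def read_messages(conversation_id, limit=20):
    """Return up to `limit` messages, newest first."""
    messages = conversations.get(conversation_id, {}).get("Messages", [])
    return sorted(messages, reverse=True)[:limit]

append_message("conv7", 1, "alice", "hi")
append_message("conv7", 2, "bob", "hello")

newest_first = read_messages("conv7")
# Each message is stored once, no matter how many users are in the conversation.
stored_copies = sum(len(row["Messages"]) for row in conversations.values())
```

Both participants read the same row, so nothing is duplicated. The trade-off, as far as I can tell, is that finding all conversations for a given user would need some separate lookup, which this sketch does not cover.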

This is just my conceptual understanding, and it might be wrong in some aspects. Hopefully, though, it’ll give others with the same problem at least some idea of how HBase could be used for Facebook Messenger.

Resources I found useful:
Storage Infrastructure Behind Facebook Messages Using HBase at Scale (try searching for the word “schema”)
https://www.slideshare.net/brizzzdotcom/facebook-messages-hbase

There are many unproven statements there, and it’s also unclear why a plain key-value store would not be good enough. It seems to be talking about optimizing batch writes to the DB and keeping data in memory, which is unrelated.

Aren’t multiple writes and deletes bad in any NoSQL column database? Can someone please explain?

@Design_Gurus Can you please answer my question?