Where Should We Put the Segment ID and Server ID Mapping? Zookeeper or the Table Itself?

It seems odd to me that we are explicitly putting the server ID in the Segment table. Should we let the key-value store decide (maybe it uses ZooKeeper)?

Ultimately, when should we put the mapping in the table itself, and when should we let the key-value store decide?


Course: Grokking Modern System Design Interview for Engineers & Managers
Lesson: Google Maps’s Detailed Design

It’s a good question. Thanks for asking.

First, I would like to clarify that we are storing the segment-to-server mapping in a key-value store and not in a table (relational DB).

Second, the server allocation for segments is based on some partitioning logic. Let’s assume that we are using range partitioning (just an example; any partitioning function could be used), where all segment IDs in a specific range belong to a particular server. This way, we don’t need to explicitly store the assigned server’s ID for each segment; the service that implements the partitioning logic can answer that question, and we can use the ZooKeeper service for that. In the lesson, we wanted to show that the segment-to-server mapping is one of the things that needs to be managed.
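To make the range-partitioning idea concrete, here is a minimal sketch in Python. The range boundaries, the server names, and the function name `server_for_segment` are all made up for illustration; they are not taken from the lesson, and a real deployment would load the boundaries from whatever service manages the partitioning.

```python
from bisect import bisect_right

# Hypothetical range boundaries: server i owns segment IDs in
# [RANGE_STARTS[i], RANGE_STARTS[i + 1]); the last server owns everything
# from its start value upward.
RANGE_STARTS = [0, 1000, 2000, 3000]
SERVERS = ["server-0", "server-1", "server-2", "server-3"]


def server_for_segment(segment_id: int) -> str:
    """Compute the owning server from the segment ID alone."""
    if segment_id < RANGE_STARTS[0]:
        raise ValueError(f"segment ID {segment_id} is below the first range")
    # bisect_right returns the number of range starts <= segment_id,
    # so subtracting 1 gives the index of the range containing the ID.
    index = bisect_right(RANGE_STARTS, segment_id) - 1
    return SERVERS[index]
```

With this kind of function, only the handful of range boundaries needs to be stored, not one entry per segment.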

We do, however, need to store the list of segments assigned to each server so that we can check whether a segment actually exists on the server it is mapped to. We can store that information in the key-value store, or we can use the ZooKeeper service to implement the segment-to-server mappings.
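As a rough illustration of the per-server segment list, the sketch below uses a plain Python dict as a stand-in for the key-value store. The key format, the sample segment IDs, and the helper name `segment_exists_on` are assumptions for the example only; the lesson does not prescribe a schema.

```python
# An in-memory dict standing in for the key-value store (assumption).
# Key: server ID, value: the set of segment IDs currently held by that server.
segments_by_server = {
    "server-0": {12, 57, 430},
    "server-1": {1001, 1500},
}


def segment_exists_on(server_id: str, segment_id: int) -> bool:
    """Check whether the segment is actually present on the given server."""
    # Unknown servers are treated as holding no segments.
    return segment_id in segments_by_server.get(server_id, set())
```

A caller would first work out which server a segment maps to, and then use a check like this to confirm that the segment has actually been created there.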

When client-perceived latency is important, we usually store the required information explicitly so that it is ready to use. Otherwise, we can treat such information as soft state that we compute when needed (and do not explicitly store anywhere).

I hope the above answer helps. Let us know if you have any other questions. Thanks!