educative.io

What is KGS? Why two tables?

Hi, that explanation really helped.
Best regards

Why do we need two tables? It does not help resolve the concurrency problem. As soon as you read a key and move it to the used-key table, that step itself needs locking, otherwise two processes can read the same key at the same time. And once locking is needed anyway, the used-key table does not make any sense at all. @Design_Gurus

2 Likes

Concurrency is not addressed here. We need additional design work to make KGS thread-safe.

E.g. if we have one unused-key table and multiple requests, the problem to solve is how to get N unused keys efficiently.

Suppose we just pick N random rows for N keys; we then need to remove them from this table so future requests won’t use them for other URLs. If N is small, this acquires a row-level lock on each row. However, if N is large, the lock may escalate to a block-level or even table-level lock, and contention can drag performance down: e.g. two requests both pick the same key; after the first request gets the key, the second needs to pick another one, retrying until it gets a valid row.

Typically this unused-key table will be really large, with billions of keys, so sharding it is necessary. As these are just hash keys, we can use range-based sharding. With N requests and M shards, requests are balanced across shards and acquire their locks accordingly.
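Range-based sharding here could look like the sketch below: route a key to a shard by the range its first character falls in. The base62 alphabet and shard-routing function are assumptions for illustration, not part of the lesson.

```python
import string

# Hypothetical base62 alphabet: 0-9, a-z, A-Z (62 characters)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def shard_for(key: str, num_shards: int) -> int:
    """Route a key to a shard by the range its first character falls in."""
    pos = ALPHABET.index(key[0])              # 0..61
    width = -(-len(ALPHABET) // num_shards)   # ceil(62 / M) characters per shard
    return pos // width
```

With 4 shards, keys starting with `0`–`f` land on shard 0, and so on; each request then locks rows on only one shard, spreading contention across M databases.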

When we say we remove the picked key from the unused-key table, we cannot simply discard it. When URLs expire or are deleted, we need to push their keys back; otherwise we would eventually use up the entire key space. That is the purpose of the second table.

So you can see the two-table design has nothing to do with concurrency.

Of course you can use a single table, but you need to add an extra bit to mark which keys are used and which are not. This adds latency, because a request may land on a row whose key is already used and have to pick again.

When writing these removed keys to the used-key table (or vice versa), there can also be issues. How can we write them efficiently and in a fault-tolerant way?

Suppose you just set up a transaction that writes the picked key to the second table and, once it succeeds, returns the picked key. This may take a long time. However, if you don’t wait for the write to succeed, what happens if it fails?

One possible solution is to move this latency elsewhere, for example into a pub/sub system. It offers durability guarantees, so the write-back would be safe, and it can also smooth out the traffic.
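The shape of that idea can be sketched with an in-process queue standing in for the durable pub/sub topic (a real system would use something like Kafka; the names here are illustrative only): the request path enqueues the used key and returns immediately, while a background consumer persists it.

```python
import queue
import threading

used_key_log = queue.Queue()   # stands in for a durable pub/sub topic
persisted = []                 # stands in for the used-key table

def writer() -> None:
    """Background consumer: drains the queue and persists used keys."""
    while True:
        key = used_key_log.get()
        if key is None:        # shutdown sentinel
            break
        persisted.append(key)
        used_key_log.task_done()

t = threading.Thread(target=writer)
t.start()

def hand_out_key(key: str) -> str:
    """Return the key immediately; the write-back happens asynchronously."""
    used_key_log.put(key)
    return key

hand_out_key("abc123")
hand_out_key("def456")
used_key_log.put(None)         # stop the consumer
t.join()
```

The caller never waits on the durable write, which is exactly where the latency moved; durability then rests on the queue, not on the request path.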

So you can see the benefit of KGS: the server doesn’t need to retry key generation again and again. But it adds extra complexity — the KGS service needs to be available and reliable, with high throughput.

3 Likes

Two tables do not solve the concurrency problem; it’s still there. The original intention of using two tables was to move expired keys from the used_key table back to the unused_key table.
Let’s say a key was used two years ago and has now expired. We can move that key to the unused_key table and use it again.

You can argue that this could be achieved using one table too. We can add a column like is_used and mark it as true/false. Yes, that is possible too.

1 Like

Let me try to answer this. Since KGS keeps generating new keys, you cannot guarantee that a newly generated key has never been generated before if you just delete the used keys. I think whenever KGS generates a new key, it needs to check whether that key was generated in the past. So I guess it needs to check both the unused and the used table before actually putting the new key into the unused table.
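The generation-time check described here could look like the sketch below, with two in-memory sets standing in for the two tables (a real KGS would more likely rely on a counter or the database’s uniqueness constraint; this only shows the both-tables lookup the post proposes).

```python
import random
import string

unused, used = set(), set()  # stand-ins for the two tables

def generate_key(length: int = 6) -> str:
    """Generate a fresh key, re-drawing on collision with either table."""
    alphabet = string.ascii_letters + string.digits
    while True:
        key = "".join(random.choices(alphabet, k=length))
        if key not in unused and key not in used:
            unused.add(key)
            return key

keys = {generate_key() for _ in range(100)}
```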

The doc is also not quite clear on this. I am not sure NoSQL is the right choice here, since it cannot guarantee consistency. We might use an RDBMS instead, with one table and a USED/UNUSED flag?

2 Likes

How does the standby for KGS work? If the master fails, how do we know that the standby is synchronized with the master?

Even with two tables, I don’t think the concurrency issue is resolved. Let’s ignore the in-memory keys for a moment.
Two app servers can try to read from the unused table at the same time and end up getting the same key.
There is no opportunity to move the record into the used table first, because the second request arrives before the first one finishes. If we treat this as a pure DB solution, I only see read locks as a solution. But locks cause wait times and are bad for performance.

In the real world, I don’t think anybody would ever use two tables to represent the same type of entity when a simple boolean is_used flag would work. There is nothing that two tables solve that can’t be done through the normal process of using an attribute.

Secondly, you would never want to re-use a key. Do you really think it’s a good idea that some short URL would, over time, start pointing to something completely different? Imagine a preschool sends out an email with a shortened link to a health form to fill out. Two years later, the key is re-used and that same link now shortens some porn site.

YouTube only uses 11-character keys right now and it has served them fine. As they grow, I’m sure they’ll tack on a 12th character. This ridiculous requirement to keep keys at 6 characters comes with a thimbleful of benefits versus the ridiculous simplicity of just expanding the key space and never having to worry about key re-use. OK, so your DB will now use 2x the space to store the key. Who cares.

It’s stuff like this that makes me doubt every other word in this tutorial. Totally unrealistic decisions.


Course: Grokking the System Design Interview - Learn Interactively
Lesson: Designing a URL Shortening service like TinyURL - Grokking the System Design Interview



Hi @Mackle
We really appreciate that you shared your thoughts with us and explained them so well. It is great to see you all coming up with improvements. We’ll look into it — your feedback is much appreciated :slight_smile: