educative.io

Educative

What is KGS? Why 2 table?

What is kgs.
Is it another application?
Why does it need two table to mark used?

1 Like

Hi @Muhit_Sarwar,

Key Generation Service (KGS) is a standalone service (that the design proposed) to generates random six letter strings beforehand and store these keys in a database. KGS uses two tables to handle concurrency problems:

Servers can use KGS to read/mark keys in the database. KGS can use two tables to store keys: one for keys that are not used yet, and one for all the used keys. As soon as KGS gives keys to one of the servers, it can move them to the used keys table. KGS can always keep some keys in memory so that it can quickly provide them whenever a server needs them.

Hope this answers your question.

How concurrency is being solved by two table?
Canā€™t we just use single table and locking?
How two table mechanism is better than single table with locking?
Can u plz explain with an example?

6 Likes

Same question here about how KGS can prevent concurrency problems. Does each KGS use UUID to make sure the key generated on each KGS server are unique?

The discussion around two tables was to solve the read concurrency. The problem we are trying to solve is to not give the same key to two servers (as we donā€™t want two URLs to have the same short key). For this, as soon as the KGS read some keys, it moves them to the other table (or can simply store the last key that it has read). This will ensure that no two URLs get the same key. Locking will surely help while sending keys to servers from memory (remember we will be having some keys brought in memory for efficiency).

For writes, we donā€™t have concurrency problem, as there is only one KGS in the system at any time, to generate new keys.

Hope this clarifies.

1 Like

To check if iā€™m understanding correctly,

The problem is that if 2 app servers request keys from the KGS, the KGS could provide the same key to both app servers ( though using the same key would only be a problem if the url that the app server is trying to generate is the same as well (?)).

To solve this, we have the KGS load in keys into a locked data queue structure in-memory and provide the keys to each app server. We separate the KGS into two tables, used and unused, so that the KGS itself only loads unused keys into memory.

Does this sound about right?

4 Likes

Loading keys in-memory is happening whenever keys present in-memory getting low.
When 2 simultaneous request came and no in-memory keys present in KGS, then KGS both request thread, tries to load keys from DB to in-memory and also move keys from DB. Then with one KFG server and for read , we can have concurrency problem.

2 Likes

Ans: This will be also solved by same in-memory lock.

I am not satisfied by the authorā€™s response on why two tables are needed. Why canā€™t we just delete the keys as they are used up instead of moving them into a different table?

9 Likes

I think the reason for moving those used keys into the table for all the used keys is because that after removing an expired link, we can put the key back in the key-DB to be reused. The ā€œreusedā€ here means to move those keys back to the ā€œnot used yet keyā€ table. If you just ā€œdeleteā€ the keys, how do you proceed the reuse process? To regenerate it again? If so, how do you know what key you should regenerate?

Thatā€™s why I think itā€™s reasonable for introducing two tables.

7 Likes

Can you explain it bit practically with small example as it is very confusing to go to any conclusion.

I believe generating url is the core part of this interview which the author seems to have over looked and hasnā€™t provided sufficient details.

I recommend checking out this video. Itā€™s much clearer.

1 Like

Thanks for clarifying that.
So you mean whenever a new key is to be generated, then after generating the key, Used Key table will be checked for possible duplicate and if not found then only this generated key will be added to Not Used Key Table?
Is that accurate ?

There should not be any duplicate key; so any new key should not be present in the used or un-used keys table.

Thanks for the reply.
If so, then what is the use of maintaining an additional Used Key Table ?

Can you please provide some details on how KGS is implemented ?

Your answer make sense.

Just elaborating your answer
Concurrency Issue:
When multiple application servers make simultaneous request(threads) for key set from a single shared source of data(KGS). Then KGS have to make sure that it doesnā€™t gives the same key set to multiple application server.
Possible solution(there can be other solutions too):

  1. Create a queue of request(Synchronisation) and server them one by one( by use locking mechanism) on shared resource (here it will be KGS shared memory).
    Get a subset from KGS in-memory and discard it from in-memory and return this subset to the current requested while others are waiting. after this request is served(or the critical section) then process the one of the next blocked request and do the same.

Use of creating separate table for used urls:

  1. After expiry of a used key, it can be return back to unused table (although it can be done by keeping a additional flag for used/unused in the same single table(with some overhead of scanning).
  2. When KGS asked for unused key set it can directly pick from unused key table(although it can be done by keeping a additional flag for used/unused in the same single table and pick only unused records: pick unused and updated their flag as used)
2 Likes

But the big question is how to generate the actual 6 char unique shor url :
So generating short url journey starts below ā€”

  • using md5 hash function for hashing the original URL
  • since above md5 will create 128 bit and we need short 6 character url so used base 64 to encode the 128 bit.
  • Still it gives 21 character so went to selecting last 6 characters.
  • Still this will create duplicate issue so went randomizing characters .
  • Still there will good chance there might be duplicate key
  • So started adding the user_id to the original url and then do above (base64(md5(original-rl)) % 6)
  • Still this will add problem of only logged in user can generate the short url.
  • Then jumped to KGSā€¦
    ā€¦ DIrectly this statement : that generates random six-letter strings beforehand and stores them in a database (letā€™s call it key-DB).

I mean but how ?..

2 Likes

Same question.

There are multiple confusions you raised from your questions; let us try answering:

  1. KGS generates random six letters strings. It has nothing to do with the algorithm described previously. Random means ā€˜literally randomā€™ (e.g., using any random number generation algorithm), all languages have implementations of such algorithms available. The benefit of generating short URLs in advance is to speed up the process and not to worry about generating the short URL on the fly.

  2. For the encoding URL solution, there are multiple options given. Practically, we should be able to generate a unique URL through any of the following options:

  • a. By appending the user_id. (practically all services require log-in before using their functions).
  • b. By appending the sequence number, when a duplicate happens keep incrementing the sequence number. (a 64-bit sequence number can be incremented quadrillions of times, this should be enough for our service).
  • c. By appending a user-provided string. If a duplicate happens, keep asking the user.
  1. The purpose of giving different options was not to confuse, instead, to educate on how the discussion should happen during a system design interview. The interviewer likes when we give different strategies and present their pros and cons. For example, in the above-given options ā€˜cā€™, we will definitely be able to find the unique short URL, but we might end up asking the user multiple times to input a string. Hence, the con is, the user experience will not be great.

Thatā€™s why, in the end, generating random six letters strings (using any random algorithm) is proposed. It saves us from problems arising from duplications, at the same time gives high performance and a great user experience.

Having said this, can you think of what problems KGS introduced in our system!!

KGS made our system complex with the addition of new servers, handling of concurrent uses, size of the DB, etc. These are the pros and cons we were referring to; this is the discussion that should happen in the interview.

Finally, in a system design interview, the answer to nearly every question starts with ā€œit dependsā€. As we have to give solutions (based on the requirements), present options, describe the pros and cons, and keep evolving the solution.

Hope this clarifies some confusion.

6 Likes