educative.io

Educative

Encoding actual URL key duplication/URL-encoded/custom aliases solutions?

I have some questions about this design question. The solution mentioned a lot of new questions but no solutions for that. Does anyone have some ideas?

For example:

Encoding actual URL
Since we only have space for 8 characters per short key, how will we choose our key then? We can take the first 6 (or 8) letters for the key. This could result in key duplication.

1 What’s the solution of this issue?

What if parts of the URL are URL-encoded? e.g., http://www.educative.io/distributed.php?id=design, and http://www.educative.io/distributed.php%3Fid%3Ddesign are identical except for the URL encoding.

2 What’s the solution of this issue?

3 How do we support custom aliases in Encoding actual URL solution or Generating keys offline solution?

1 Like

Hi Franklin,

Thanks for reaching out.

Answer to your first question is given in the last sentence of the paragraph:

We can take the first 6 (or 8) letters for the key. This could result in key duplication though, upon which we can choose some other characters out of the encoding string or swap some characters.

Answer to the second and the third question is given under the section - Workaround for the issues

We can append an increasing sequence number to each input URL to make it unique, and then generate a hash of it. We don’t need to store this sequence number in the databases, though. Possible problems with this approach could be an ever-increasing sequence number. Can it overflow? Appending an increasing sequence number will also impact the performance of the service.

Another solution could be to append user id (which should be unique) to the input URL. However, if the user has not signed in, we would have to ask the user to choose a uniqueness key. Even after this, if we have a conflict, we have to keep generating a key until we get a unique one.

In general, appending an increasing sequence number or user id to the URL and then generating the hash will solve the issue. Even after this, if we have a conflict, we have to keep generating a key with the new sequence number until we get a unique one.

Hope this answers your question.

1 Like

I have the same question. Why does url-encoded URLs matter if we are using an MD5 hash?

Although it’s mentioned that Key Generation Service (KGS) that generates random six letter strings beforehand, this key generation implementation is not outlined here or perhaps it was too obvious. But just to confirm - can the standard key generator from let’s say javax.crypto.KeyGenerator be used here ? since it itself has couple of algorithms viz AES, DES etc… (does it require more deliberation ?) or is there an alternative line of thought behind the 6 letter string generation ?

Thanks.
Appreciate the replies.

1 Like

Hi @Rohan_Palkar,

The KGS will generate the keys (random six-letter strings) based on the algorithm defined above in the chapter. This will be the main algorithm implemented by KGS. Having said this, you can come up with any algorithm, even mentioning a library algorithm will be fine and it will be great if you happen to have some knowledge on how these library algorithms are implemented. This will definitely give you an edge over other candidates.

Remember, in a System Design Interview, you have to build/explain the overall system. You can always make use of external systems/libraries, just like this chapter mentioned hashing algorithms MD5, etc. As a general rule of thumb, if the interviewer wants to dig deeper in some external component/library, because they feel it is important to know the underlying algorithm or if they expect you to know that because of your experience etc., then you should try your best to explain it or even design that external system on the fly.

Hope this answers your question.

@Design_Gurus There is no algo defined to generate 6 digit random key in the book. In the algo section it just tells various problems and then it says kgs can solve.How exactly KGS solves, is not answerd?

  1. Why cant we use DB auto increment as a random key?
  2. How exactly is KGS gonna work? If it generates random number and then check if it is unique or not, then it is basically doing same thing, but just doing offline.How does it solve problem?

As mentioned in the chapter, KGS will generate the keys (random six-letter strings) using the same algorithm defined above in the chapter.

We can’t use DB’s auto-increment, as it can easily be predicted that what will be the next key; this will cause security issues.

yeah,so generating random can be done like, KGS will use autoincrement db to generate some 10k keys and then shuffle them before returning,so that way KGS dont have to check again and again if key is unique or not.

1 Like