educative.io

Regarding the Encoding actual URL into unique string

First of all, forgive me if my questions are not valid. I am new to learning the system design and hoping to get my questions clarified.

In this TinyURL design discussion, i am not getting the overall mechanism on hashing and encoding of the users given url.

Please correct me if my understanding is correct or wrong, my understanding is as follows,

MD5 Hashing will convert the given url value into 128 bit hash value and that will be unique value for the given url.

Now we use the Base 64 encoding on the hash value(128 bit) which will produce the string value of 21 characters(provided adding an sequence or username in the original URL). So we have to pick the first 6 or 8 characters from the encoded string and check the uniqueness while inserting into the DB.

My questions is?
For every unique url, MD5 hash function will produce the unique 128 bit hash representation from the pool of 2^128 possible hash values. Now you convert that hash using the base 64 encoding and you get 21 characters of base 64 text.

If so, i am not getting the idea behind this below calculation from the course? For what purpose we are doing this calculation and coming up to the conclusion of choosing base 64?

Using base64 encoding, a 6 letter long key would result in 64^6 = ~68.7 billion possible strings
Using base64 encoding, an 8 letter long key would result in 64^8 = ~281 trillion possible strings
With 68.7B unique strings, let’s assume for our system six letter keys would suffice.

Kindly help me on clarifying my above doubt what is calculation is all about the given context.

Thanks Mohan for putting up the question. Your understanding of the hashing and encoding algorithm is correct.

The calculation (64^6 = ~68.7 billion) has nothing to do with the algorithm, except that it tells us that we can have 68.7 billion different 6 characters strings for base64. This means that we can store only 68.7 billion unique short URLs in our system. This defines an upper bound of the system.

Hope this answered your question.

@Design_Gurus, Shouldn’t nPr be the calculation here? why 64^6 instead of 64P6?
we can have 68.7 billion different 6 characters strings for base64

64P6 gives approx 54Billion. Or I am calculating it wrongly. Can you please elaborate on why power is chosen over Permutations?

3 Likes

@Design_Gurus: I too had the same question. How do we generate 68.7 billion 6 charater string using base 64? Whats the algorithm? We are not hashing or encoding URL’s in the KGS. We are pre-creating them using some algorithm. May i ask you how did we come up with the number? and what algorithm we use.

1 Like