First of all, forgive me if my questions are not valid. I am new to learning the system design and hoping to get my questions clarified.
In this TinyURL design discussion, i am not getting the overall mechanism on hashing and encoding of the users given url.
Please correct me if my understanding is correct or wrong, my understanding is as follows,
MD5 Hashing will convert the given url value into 128 bit hash value and that will be unique value for the given url.
Now we use the Base 64 encoding on the hash value(128 bit) which will produce the string value of 21 characters(provided adding an sequence or username in the original URL). So we have to pick the first 6 or 8 characters from the encoded string and check the uniqueness while inserting into the DB.
My questions is?
For every unique url, MD5 hash function will produce the unique 128 bit hash representation from the pool of 2^128 possible hash values. Now you convert that hash using the base 64 encoding and you get 21 characters of base 64 text.
If so, i am not getting the idea behind this below calculation from the course? For what purpose we are doing this calculation and coming up to the conclusion of choosing base 64?
Using base64 encoding, a 6 letter long key would result in 64^6 = ~68.7 billion possible strings
Using base64 encoding, an 8 letter long key would result in 64^8 = ~281 trillion possible strings
With 68.7B unique strings, let’s assume for our system six letter keys would suffice.
Kindly help me on clarifying my above doubt what is calculation is all about the given context.