The random key generator is used to prepopulate the DB with blank keys that don’t have any source URLs yet. However, I don’t understand whether it solves any real-life problem.
-
When inserting a new key/source URL pair into the database, we first need to find the location of the key, which is a “lookup” operation in the DB (most likely on some BTree). After looking up the key, its value is modified to point to the source URL. Whether or not the key preexisted as a blank key, this lookup still has to happen, so prepopulating does not save us any IOPS on the DB server.
-
Also, storing blank keys in the DB implies that we cannot check whether a user is adding a duplicate key/source URL entry and prevent it. Because the key is not derived as a hash of the source URL, we cannot look up the key to see whether the key/source URL mapping already exists. Now, I agree that the cost of storage is low enough that we can accept duplicates from the same user, but if this feature costs nothing extra to implement, why not?
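To make the duplicate problem concrete, here is a minimal sketch, not how any real KGS is implemented. A plain dict stands in for the DB, and `prepopulate` and `assign_blank_key` are hypothetical helper names. The linear scan for a blank key is only there to keep the example short.

```python
import secrets

def prepopulate(db, count):
    """Fill the stand-in DB with `count` random blank keys (value None)."""
    while len(db) < count:
        db[secrets.token_hex(4)] = None

def assign_blank_key(db, url):
    """Hand out the first blank key and point it at `url`."""
    for key, value in db.items():
        if value is None:
            db[key] = url  # same key set, only the value changes
            return key
    raise RuntimeError("no blank keys left")

db = {}
prepopulate(db, 10)
first = assign_blank_key(db, "https://example.com/some/long/path")
second = assign_blank_key(db, "https://example.com/some/long/path")
print(first != second)  # the same URL now occupies two different keys
```

Since the key carries no information about the URL, spotting the duplicate would need a second lookup structure keyed by URL.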
-
Given that we don’t need a cryptographically secure hash of the source URL to compute a key from it, the cost of a murmur3 hash of the source URL is very low, typically 150 to 200 nanoseconds. I think this low cost makes the random key generator quite useless, or in fact disadvantageous, since we lose the duplicate check mentioned in (2).
-
With a murmur3 hash there would be collisions, but the probability is very low. Consequently, when inserting the key/source URL pair in the BTree, a simple check for a preexisting key in the DB would allow us to modify the key should there be a collision. In such a case, the key would be modified by appending a byte to the byte stream of the original key, repeating until we get a new unique key.
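The scheme above can be sketched roughly as follows. This is only an illustration under stated assumptions: a dict stands in for the BTree-backed DB, `insert_url` and `default_hash` are hypothetical names, and `zlib.crc32` is used as a stand-in for murmur3 because murmur3 is not in the Python standard library. The hash function is a parameter so that a real murmur3 implementation could be swapped in.

```python
import zlib

def default_hash(data: bytes) -> bytes:
    # Stand-in for murmur3: a cheap, non-cryptographic 32-bit hash.
    return zlib.crc32(data).to_bytes(4, "big")

def insert_url(db, url, hash_fn=default_hash):
    """Insert `url` under a hash-derived key and return the key as hex.

    If the key already maps to the same URL, it is a duplicate and the
    existing key is returned. If it maps to a different URL, it is a
    collision: append one byte to the key and check again.
    """
    key = hash_fn(url.encode("utf-8"))
    while True:
        existing = db.get(key.hex())  # the lookup we need anyway
        if existing is None:
            db[key.hex()] = url
            return key.hex()
        if existing == url:
            return key.hex()          # duplicate detected, nothing stored
        key += b"\x00"                # collision: extend the key by a byte

db = {}
a = insert_url(db, "https://example.com/a")
again = insert_url(db, "https://example.com/a")
print(a == again, len(db))  # True 1
```

Because a later insert of the same URL walks the same sequence of candidate keys, the duplicate check still works for URLs whose key was extended after a collision.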
So then again, what real-life problems are being solved by the KGS, and is it even worth having one?