Range-based partitioning in TinyURL - should not be an issue?

Stefan_Vrecic · May 21, 2022, 6:46am

The material states:

The main problem with this approach is that it can lead to unbalanced DB servers. For example, we decide to put all URLs starting with the letter ‘E’ into a DB partition, but later we realize that we have too many URLs that start with the letter ‘E.’

However, because the keygen generates keys randomly, we should get a relatively even distribution: URLs starting with ‘E’ should be about as common as those starting with ‘x’, ‘S’, or any other character. So this should not be a problem, since the distribution is close to uniform?
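
A quick way to check this intuition is to generate random keys and count their first characters. The sketch below is only an illustration: the base62 alphabet, the 6-character key length, and the function names are assumptions, not details taken from the course.

```python
import random
import string
from collections import Counter

# Assumed alphabet: a-z, A-Z, 0-9 (base62). The course's keygen may differ.
ALPHABET = string.ascii_letters + string.digits


def random_key(rng, length=6):
    """Return one random key of the given length (length is an assumption)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def first_char_counts(n, seed=0):
    """Generate n random keys and count how often each first character occurs."""
    rng = random.Random(seed)
    return Counter(random_key(rng)[0] for _ in range(n))


counts = first_char_counts(62_000)
# With 62 possible first characters, each bucket should hold roughly 1,000 keys.
print(min(counts.values()), max(counts.values()))
```

If the smallest and largest buckets are close in size, range-based partitioning on the first character of a randomly generated key would spread load fairly evenly.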

lfrah_Dar · June 17, 2022, 4:48pm

Hi @Stefan_Vrecic,

The partitioning mentioned here is being done based on the actual URLs rather than the short URLs generated by the key generator, and the former is statistically not evenly distributed. So, such range-based partitioning will lead to unbalanced DB servers.
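
To make the skew concrete, here is a small sketch that buckets original URLs by the first letter of their host name. The URL list is hand-picked for illustration and is not real traffic data, and stripping a leading "www." is an assumption about how such a partition key might be derived.

```python
from collections import Counter
from urllib.parse import urlparse

# Hand-picked, hypothetical sample. It only illustrates how real-world
# host names can cluster on a few letters; it is not measured data.
SAMPLE_URLS = [
    "https://www.example.com/a",
    "https://example.org/b",
    "https://en.wikipedia.org/wiki/URL",
    "https://www.educative.io/courses",
    "http://example.net/c",
    "https://github.com/",
    "https://www.python.org/",
]


def partition_letter(url):
    """Return the upper-cased first letter of the host, ignoring a 'www.' prefix."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host[:1].upper()


counts = Counter(partition_letter(u) for u in SAMPLE_URLS)
print(counts.most_common())
```

In this sample most URLs land in the ‘E’ partition, which is the kind of imbalance the material describes.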

Muhammad_J_Saleet · June 27, 2022, 7:40am

But partitioning based on the actual URLs does not make any sense. If we look at the database schema, we can see that we are storing the shortened URL as the primary key (so it is unique) and the actual URL as a regular column (not unique). The actual URLs are not unique because if different users request the same URL, they should get different shortened URLs (as stated in an earlier section).
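
A minimal sketch of that schema, using an in-memory SQLite database. The table name, column names, and sample keys are hypothetical; only the primary-key and non-unique-column arrangement comes from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical names: short_key is the primary key, original_url is a plain column.
conn.execute(
    "CREATE TABLE url (short_key TEXT PRIMARY KEY, original_url TEXT NOT NULL)"
)

# Two users shorten the same URL and receive different short keys.
conn.execute("INSERT INTO url VALUES (?, ?)", ("aB3xZ9", "https://example.com/page"))
conn.execute("INSERT INTO url VALUES (?, ?)", ("Qw7LmN", "https://example.com/page"))

# Reusing an existing short key violates the primary-key constraint.
try:
    conn.execute("INSERT INTO url VALUES (?, ?)", ("aB3xZ9", "https://example.org/"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute(
    "SELECT COUNT(*) FROM url WHERE original_url = ?",
    ("https://example.com/page",),
).fetchone()[0]
print(duplicate_rejected, rows)
```

Since lookups go through the short key, that column is the natural candidate for a partition key in this layout.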

Ateeq_Ur_Rehman_Baig · June 27, 2022, 1:11pm

@Muhammad_J_Saleet you’re right, but this is how the partitioning is described here. I would like to take this opportunity to announce our brand new course on System Design. Please refer to that course for updated content on the TinyURL design.

Nima_Tohid · October 11, 2022, 5:24am

I mean. Why not just update this course as well? This seems lazy.