I don’t follow what’s being done in the “Memory estimates” section for caching in the URL shortening service.
“Memory estimates: If we want to cache some of the hot URLs that are frequently accessed, how much memory will we need to store them? If we follow the 80-20 rule, meaning 20% of URLs generate 80% of traffic, we would like to cache these 20% hot URLs.”
Previously we estimated that 300 million total URLs will be stored, so I take this to mean the 60 million hottest URLs (20%) account for 80% of the requests. There are ~1.7 billion requests per day, so ~1.36 billion of them (80%) should be cache hits, leaving ~340 million (20%) as cache misses.
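To make my reading explicit, here is the arithmetic I'm doing (a quick sketch; the 300 million, 1.7 billion, and 80-20 figures come from the text, and the hit/miss split is my interpretation):

```python
# Back-of-envelope numbers as I read them from the text.
TOTAL_URLS = 300_000_000          # total URLs stored
REQUESTS_PER_DAY = 1_700_000_000  # ~1.7 billion redirects per day

hot_urls = int(0.20 * TOTAL_URLS)             # 60 million hot URLs to cache
cache_hits = int(0.80 * REQUESTS_PER_DAY)     # ~1.36 billion hits per day
cache_misses = REQUESTS_PER_DAY - cache_hits  # ~340 million misses per day

print(f"hot URLs to cache:  {hot_urls:,}")
print(f"daily cache hits:   {cache_hits:,}")
print(f"daily cache misses: {cache_misses:,}")
```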
However, the next paragraph goes on to say “To cache 20% of these requests, we will need 170GB of memory.” I think this is an error, but maybe not. Why would we cache 20% of requests… where does that number come from? I thought we were caching the top 20% of URLs, which would produce cache hits on 80% of requests.
“To cache 20% of these requests, we will need 170GB of memory.
0.2 * 1.7 billion * 500 bytes = ~170GB”
This calculation loses me further. Isn’t that ~170GB the daily bandwidth spent on the 20% of requests that miss the cache, rather than a memory size? Caching 20% of the total 300 million URLs would require only ~30 GB (300 million * 20% * 500 bytes).
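To be concrete, here are the two calculations side by side as I understand them (assuming a cached entry costs the same 500 bytes the text uses elsewhere for storage):

```python
BYTES_PER_URL = 500  # per-object size used throughout the estimates

# The text's calculation: sizes the cache by daily *requests*.
book_estimate = 0.20 * 1_700_000_000 * BYTES_PER_URL  # 1.7e11 bytes = 170 GB

# My calculation: sizes the cache by unique *URLs*.
my_estimate = 0.20 * 300_000_000 * BYTES_PER_URL      # 3.0e10 bytes = 30 GB

print(f"text (20% of daily requests): {book_estimate / 1e9:.0f} GB")
print(f"mine (20% of stored URLs):    {my_estimate / 1e9:.0f} GB")
```

Unless I'm missing something, a cache holds unique URLs rather than requests, so only the second number looks like a memory size to me.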
Can someone tell me if/where I’ve gone wrong?