
Cache Memory estimates [Capacity Estimation and Constraints]

The memory estimate for the cache states:

If we follow the 80-20 rule, meaning 20% of URLs generate 80% of traffic, we would like to cache these 20% hot URLs.

Since we have 20K requests per second, we will be getting 1.7 billion requests per day:

20K * 3600 seconds * 24 hours = ~1.7 billion

To cache 20% of these requests, we will need 170GB of memory.

0.2 * 1.7 billion * 500 bytes = ~170GB
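
As a quick sanity check, here is the same back-of-envelope arithmetic in Python (all figures come from the estimates quoted above):

```python
# Back-of-envelope check of the cache estimate above.
# Inputs, as quoted from the course:
#   20K read requests per second, 500 bytes per object,
#   and the 80-20 rule (cache 20% of daily traffic).
read_qps = 20_000
object_size_bytes = 500
seconds_per_day = 3600 * 24

requests_per_day = read_qps * seconds_per_day             # 1,728,000,000 ~ 1.7 billion
cache_bytes = 0.2 * requests_per_day * object_size_bytes

print(f"requests/day: {requests_per_day:,}")              # 1,728,000,000
print(f"cache size:   {cache_bytes / 1e9:.0f} GB")        # ~173 GB, i.e. ~170 GB
```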

Why are we calculating the cache estimates based on the read QPS? Should it not be the write QPS?
If we base this on the write QPS of 200 instead, that means 200 * 60 * 60 * 24 = ~17 million write requests per day:

17,280,000 * 500 bytes = ~8.6 GB of data per day.

Caching 20% of that data = ~1.7 GB of memory.

Also 30 billion * 500 bytes = 15 PB and not 15 TB
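
Here is that write-based estimate as a quick Python check (the 200 write QPS figure comes from the question above; the rest is unit arithmetic):

```python
# Write-based estimate proposed in the question above.
# Assumes 200 write QPS (new short URLs per second)
# and 500 bytes per stored object.
write_qps = 200
object_size_bytes = 500
seconds_per_day = 3600 * 24

new_urls_per_day = write_qps * seconds_per_day                   # 17,280,000
data_per_day = new_urls_per_day * object_size_bytes              # 8,640,000,000 bytes
cache_for_hot_20_percent = 0.2 * data_per_day

print(f"new URLs/day: {new_urls_per_day:,}")
print(f"data/day:     {data_per_day / 1e9:.2f} GB")              # 8.64 GB
print(f"20% cached:   {cache_for_hot_20_percent / 1e9:.2f} GB")  # 1.73 GB
```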

Why are we calculating the cache estimates based on the read QPS?

Because the cache is used to serve read requests. A write request does not have any impact on the cache. In this design, a newly created short URL (i.e., via a write request) will be cached on the first read.
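
A minimal cache-aside sketch illustrating this, with plain dicts standing in for the database and the cache (the names here are hypothetical, not from the course):

```python
# Minimal cache-aside sketch (illustrative only).
# `db` stands in for the persistent URL store, `cache` for Memcached/Redis.
db = {}
cache = {}

def create_short_url(short, long):
    db[short] = long          # write path: database only, cache untouched

def resolve(short):
    if short in cache:        # cache hit: served from memory
        return cache[short]
    long = db[short]          # cache miss: fall through to the database
    cache[short] = long       # the short URL enters the cache on first read
    return long

create_short_url("abc123", "https://example.com/some/long/path")
assert "abc123" not in cache  # the write did not touch the cache
resolve("abc123")             # the first read populates the cache
assert "abc123" in cache
```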


30 billion * 500 bytes = 15 TB is correct:

30 billion * 500 bytes = 15,000 billion bytes = 15 trillion bytes = 15 TB
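
Spelled out in Python, using decimal SI units as in the rest of the thread:

```python
total_bytes = 30_000_000_000 * 500   # 30 billion objects * 500 bytes each
print(f"{total_bytes:,}")            # 15,000,000,000,000 = 15 * 10^12 bytes
print(total_bytes / 1e12, "TB")      # 15.0 TB (a PB would be 10^15 bytes)
```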

Because the cache is used to serve read requests. A write request does not have any impact on the cache.

But if we’re caching 100 items, the cache size would be the same for 1,000 read requests as for 1,000,000, right? In the worst case we’ll have all the items in the cache, but that is the maximum, since the cache stores each item at most once (of the items that exist at that moment). Looked at that way, the cache size seems tied to the cumulative number of writes over time rather than to the reads, doesn’t it?

Thanks for the answer!
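
To illustrate the point in the question above: a capacity-bounded LRU cache stores each key at most once, so its memory footprint is capped by its configured capacity no matter how many reads arrive; the read volume only affects the hit rate. A minimal sketch (an illustration, not the course's implementation):

```python
from collections import OrderedDict

class LRUCache:
    """Capacity-bounded LRU cache: each key is stored at most once."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:  # evict the least recently used
            self.items.popitem(last=False)

cache = LRUCache(capacity=100)
for i in range(1_000_000):                   # a million reads over 150 distinct URLs
    key = f"url{i % 150}"
    if cache.get(key) is None:               # miss: fetch and cache
        cache.put(key, f"https://example.com/{i % 150}")
assert len(cache.items) <= 100               # memory bounded by capacity, not by reads
```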