educative.io

How do we decide which 20% of URLs generate 80% of the traffic?

If we follow the 80-20 rule, meaning 20% of URLs generate 80% of the traffic, we would like to cache these 20% hot URLs.

How do we decide this 20% factor?
Isn't that an altogether different problem? Or does this feature come built in with caches?

Hi @Ayush_Chaubey,

The 80-20 rule is a general rule of thumb for estimation. This is how Wikipedia defines it:

The Pareto principle (also known as the 80/20 rule, or the law of the vital few) states that, for many events, roughly 80% of the effects come from 20% of the causes.

Here we are trying to “estimate” how many URLs we should try to cache. You can read more about this rule here: https://en.wikipedia.org/wiki/Pareto_principle

This estimate is a good starting point, provided we have enough resources. We can then increase or decrease the cache size based on the actual traffic pattern of the service, but that requires measuring that traffic.
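To make the "more measurements" part concrete, here is a rough sketch of how one might check the 80-20 assumption against a request log. It is only an illustration: the function name `urls_needed_for_coverage`, the sample `log`, and the URL paths are all made up for this example, and a real service would read from its own access logs.

```python
from collections import Counter


def urls_needed_for_coverage(requests, coverage=0.8):
    """Return the most-requested URLs (hottest first) that together
    account for at least `coverage` of all requests."""
    counts = Counter(requests)
    total = sum(counts.values())
    hot, covered = [], 0
    for url, hits in counts.most_common():
        # Stop as soon as the URLs picked so far cover the target share.
        if covered >= coverage * total:
            break
        hot.append(url)
        covered += hits
    return hot


# Hypothetical access log: one entry per request.
log = ["/a"] * 50 + ["/b"] * 35 + ["/c"] * 5 + ["/d"] * 5 + ["/e"] * 5

hot = urls_needed_for_coverage(log)
fraction = len(hot) / len(set(log))
print(hot, fraction)
```

If the measured fraction turns out to be well above or below 20%, that is a signal to size the cache differently from the initial estimate.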

Hope this answers your question.


Hey, thanks for the reply.

I got the point about the Pareto principle. Thanks for sharing this.
I am also interested in knowing
“How does a cache maintain this 80:20 rule?”
i.e. how does a cache system know which URLs (the 20% part) are going to generate 80% of the traffic? This all depends on usage patterns.

So is this kind of feature built into caches, or do we need to implement something for it?