Number of servers required does not seem to be right

Smitha_naik · March 4, 2023, 3:34pm

If there are 300 million daily users and each user makes 20 requests, then one request per user works out to 300M / 86,400 seconds ≈ 3,500 QPS, so the total is ~3,500 QPS × 20 = ~70,000 QPS.

If each server can handle up to 8,000 RPS, then the total number of servers required should be ~10 (70,000 / 8,000 = 8.75).
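A minimal sketch of the average-load estimate described above, assuming requests are spread evenly over the 86,400 seconds in a day and that each server handles 8,000 RPS. The function names are hypothetical and only illustrate the arithmetic.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_qps(daily_users: int, requests_per_user: int) -> float:
    """Average queries per second, assuming requests are spread evenly over the day."""
    return daily_users * requests_per_user / SECONDS_PER_DAY

def servers_for_average_load(daily_users: int, requests_per_user: int,
                             rps_per_server: int) -> int:
    """Servers needed to absorb the average load (hypothetical helper).
    Rounded up, since we cannot deploy a fraction of a server."""
    return math.ceil(average_qps(daily_users, requests_per_user) / rps_per_server)

print(round(average_qps(300_000_000, 20)))               # 69444
print(servers_for_average_load(300_000_000, 20, 8_000))  # 9
```

Rounding up gives 9 servers; the ~10 quoted in this thread is the same figure rounded loosely.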

Course: Grokking Modern System Design Interview for Engineers & Managers - Learn Interactively
Lesson: Requirements of Quora's Design - Grokking Modern System Design Interview for Engineers & Managers

Ibrahim_Nadir · May 26, 2023, 10:30am

Hi Smitha,

The calculations performed in the table and above are used only to estimate the queries per second (QPS). To obtain a practical number of servers for such a large number of QPS, we will need 37,500 servers, as calculated in the lesson. Please remember that these calculations are based on a formula derived in the “Back-of-the-Envelope Calculations” chapter, which we recently updated.

Aditya_Verma · August 11, 2023, 1:12pm

@Ibrahim_Nadir I have referred to the Back-of-the-Envelope Calculations chapter, and the only thing we derive from that section is the average RPS supported by a server, which we assume to be 8,000 RPS.

Why wouldn’t the number of servers be 69,500 / 8,000 ≈ 10?
The statement “To obtain a practical number of servers required to serve such a large number of QPS, we will need 37,500 servers as calculated in the lesson” seems incorrect.

Ibrahim_Nadir · August 15, 2023, 6:02am

@Aditya_Verma the calculations performed above (to obtain ~10 servers) are logical and good enough to present in an interview, and they work best if requests are uniformly distributed throughout the day.

However, the formula we presented assumes that all daily active users make requests concurrently and that each user generates one request per second. This covers the request load during peak hours. Of course, outside peak hours, most of our servers will sit idle. To handle this, we can use resource elasticity: we acquire resources when they are needed and give them back to the provider when they are not.
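A minimal sketch of the peak-load formula described above, assuming every daily active user sends one request in the same second (so peak RPS equals the DAU count) and that each server handles 8,000 RPS. The function name is hypothetical.

```python
import math

def servers_for_peak_load(daily_active_users: int, rps_per_server: int) -> int:
    """Worst-case estimate (hypothetical helper): all DAU are concurrent and
    each sends one request per second, so peak RPS equals the DAU count."""
    peak_rps = daily_active_users  # one request per user per second
    return math.ceil(peak_rps / rps_per_server)

print(servers_for_peak_load(300_000_000, 8_000))  # 37500
```

This reproduces the 37,500 servers from the lesson; with elasticity, only the capacity actually needed at a given time has to stay provisioned.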

Kindly revisit the second lesson of the Back-of-the-Envelope Calculations chapter, which explains why dividing the total RPS by the number of RPS a single server can handle is not optimal, even if it seems logical and mathematically correct. Please read starting from:

Indeed, the number above doesn’t seem right. If we only need 15 commodity servers to serve 500M daily users, then why do big services use millions of servers in a data center?

Before this discussion, we calculate the number of servers exactly the way you are suggesting. However, that is the best-case scenario, whereas systems should ideally be prepared for worst-case scenarios as well.

Hope this was helpful.

Aditya_Verma · August 15, 2023, 11:17am

Thanks, Ibrahim. That is helpful.

Sudeep_Kaushik · August 16, 2023, 7:48pm

@Ibrahim_Nadir I reviewed your answer and reflected several times on the “2nd lesson” reference in it. Essentially, your assumption is that you are estimating the maximum capacity by assuming a peak-load scenario in which all 300M users hit the servers simultaneously at any given second during the day. The calculation is then done for 300M requests (one per user) in a single second, the worst-case scenario, so to speak.

If this is the case, then calculating the 69,500 requests per second was a moot point and probably a waste of time during the interview. It looks like you are suggesting that we calculate the number of servers directly from the number of DAU that a single server can handle per second. This whole estimation approach should be reworded, and the 69,500 calculation should be skipped, as it not only confuses the reader but is also summarily dismissed later in the calculation.

In the end, the realistic number is going to be lower than the calculated one, as the probability of all 300M users hitting the servers at the same time is practically negligible.

Ibrahim_Nadir · August 17, 2023, 6:47am

@Sudeep_Kaushik, we are glad to see such a healthy debate from interested users like yourself. Let me answer your query below:

You are right to point out that we are estimating the maximum capacity by assuming peak load. This keeps the system available and allows us to meet the non-functional requirements. We also discuss this because systems should always be ready for a surge in user requests. However, we cannot cross out the need to calculate the RPS (69,500 in this case), because this number gives us the lower end of the spectrum, i.e., the number of servers required to meet the average number of user requests. In an interview, both approaches would work, and we want to prepare our learners as much as possible.

Additionally, I want to clarify that one user request (out of the 300M requests) may fan out to multiple services and result in multiple requests. Imagine a user opening a page containing a Quora question. This will result in requests to different services, such as recommendations, users, ads, comments, and so on. Thus, several different servers are involved in serving a single query, hence the need for a large number of servers.
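A minimal sketch of how fan-out multiplies backend load. It assumes a hypothetical fan-out of 4 (one backend call each to recommendations, users, ads, and comments) and assumes every backend server also handles 8,000 RPS; neither number comes from the lesson, and the function name is hypothetical.

```python
import math

def servers_with_fan_out(user_rps: int, fan_out: int, rps_per_server: int) -> int:
    """Hypothetical helper: each user request fans out into `fan_out`
    backend requests, so backend load is user_rps * fan_out."""
    backend_rps = user_rps * fan_out
    return math.ceil(backend_rps / rps_per_server)

# Assumed peak of 300M user RPS and an assumed fan-out of 4 backend calls.
print(servers_with_fan_out(300_000_000, 4, 8_000))  # 150000
```

Even a modest fan-out factor multiplies the server count, which is one reason real deployments need far more machines than the single-service estimate suggests.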

Note that we are currently working to revamp this portion of the course to help our learners. Every concern raised by our learners is forwarded to the relevant team.