All of the calculations in this section are based off of a bad assumption

Bryce · December 26, 2022, 4:05am

The main reason your initial calculation gives inaccurate results isn't that (User_RPS)/(Server_RPS) fails to account for needing multiple servers to handle a request. It's that you've incorrectly estimated the average throughput of a mix of CPU-bound and memory-bound workloads.

Instead of taking the mean of these RPS values, you should take the minimum. To see why, imagine we receive 1000 requests, half of which are memory bound and half of which are CPU bound. The total time to process these requests won't be (1000/8000) seconds; it'll be (500/360) seconds, which is about 11 times as long.
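A quick Python check of the arithmetic above, using only the figures quoted in this post (a mean of about 8,000 RPS and a CPU-bound throughput of 360 RPS). Like the post, it ignores the small amount of time spent on the memory-bound half, so the CPU-bound requests set the total time.

```python
# Figures quoted in the post.
MEAN_RPS = 8000       # mean of the CPU-bound and memory-bound RPS
CPU_BOUND_RPS = 360   # throughput when serving only CPU-bound requests


def time_at_rps(num_requests: int, rps: float) -> float:
    """Seconds needed to serve num_requests at a throughput of rps."""
    return num_requests / rps


total_requests = 1000
cpu_bound_requests = total_requests // 2  # half of the requests are CPU bound

# Estimate that assumes the mean RPS applies to every request.
naive_time = time_at_rps(total_requests, MEAN_RPS)
# Estimate limited by the slow, CPU-bound half of the requests.
bottleneck_time = time_at_rps(cpu_bound_requests, CPU_BOUND_RPS)

print(f"mean-based estimate: {naive_time:.3f} s")    # 0.125 s
print(f"CPU-bound estimate:  {bottleneck_time:.3f} s")  # 1.389 s
print(f"ratio: {bottleneck_time / naive_time:.1f}x")    # 11.1x
```

The ratio is (500 × 8000) / (360 × 1000) ≈ 11.1, so the mean understates the processing time by roughly an order of magnitude.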

Instead of fixing this fundamental problem with your calculations, you introduced a completely arbitrary formula:

Number of servers = Number of daily active users / RPS of a server

This makes no mathematical sense unless you assume that every user makes a request within the same single second.
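A minimal Python sketch of this objection: dividing daily active users by a server's RPS gives the same answer as sizing for a peak load of DAU requests per second. All inputs (1,000,000 daily users, 10 requests per user per day, 8,000 RPS per server) are hypothetical. The day-averaged figure deliberately ignores traffic peaks, so it is only a lower bound, not a sizing recommendation.

```python
import math


def servers_needed(peak_requests_per_second: float, server_rps: float) -> int:
    """Servers required to absorb a given requests-per-second load."""
    return math.ceil(peak_requests_per_second / server_rps)


def servers_dau_over_rps(daily_active_users: int, server_rps: float) -> int:
    """The 'daily active users / RPS of a server' formula discussed in the thread."""
    return math.ceil(daily_active_users / server_rps)


# Hypothetical inputs, chosen only for illustration.
DAU = 1_000_000
SERVER_RPS = 8_000
REQUESTS_PER_USER_PER_DAY = 10

# Load averaged evenly over the 86,400 seconds in a day (no peak factor).
avg_rps = DAU * REQUESTS_PER_USER_PER_DAY / 86_400

# DAU / RPS equals servers_needed only when the load is DAU requests per second,
# i.e. every daily user sends a request within the same second.
assert servers_dau_over_rps(DAU, SERVER_RPS) == servers_needed(DAU, SERVER_RPS)

print(servers_dau_over_rps(DAU, SERVER_RPS))  # 125
print(servers_needed(avg_rps, SERVER_RPS))    # 1
```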

Although the number you get out the other end is clearly more accurate, this formula is completely arbitrary. It should be replaced with something built on a rigorous mathematical foundation, such as estimates of how many servers you need to handle and fan out an individual request.
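A sketch of the kind of fan-out estimate suggested above, under loudly stated assumptions: each user request touches a fixed number of servers (`servers_per_request`, a hypothetical parameter), and each of those servers spends as much capacity on its share as a standalone request would cost. Real fan-out patterns vary per service, so treat this as an illustration only.

```python
import math


def servers_for_fan_out(user_rps: float, servers_per_request: int, server_rps: float) -> int:
    """Servers needed when every user request fans out to servers_per_request servers.

    Assumption: each fanned-out piece of work costs a server as much as one
    request at its measured server_rps.
    """
    backend_rps = user_rps * servers_per_request  # server-level requests per second
    return math.ceil(backend_rps / server_rps)


# Hypothetical load: 1,000 user requests/s, each touching 3 servers,
# on servers that sustain the 360 RPS CPU-bound figure quoted in this post.
print(servers_for_fan_out(1_000, 3, 360))  # 9
```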

The quality of the mathematics in this section makes me regret paying for Educative. This isn't hard stuff; it's high-school-level maths.

EDIT: Silly mistake in my original post. The point still stands though.

Course: Grokking Modern System Design Interview for Engineers & Managers
Lesson: How the Domain Name System Works

Bismillah_Jan · January 25, 2023, 7:30am

Hello Bryce,

Thank you for your detailed feedback. We can use any suitable RPS value. We used the mean, but there are many other options, such as using the P95 RPS or choosing a mix of CPU-bound and memory/IO-bound RPS. We could also use variables if we don't want to commit to specific numbers. The formula to calculate the number of servers (Number of daily active users / RPS of a server) is not arbitrary. However, you are right that we had implicitly made a couple of assumptions that were causing the confusion. We have now stated these assumptions explicitly. Please see the updated text in this section.
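One way to compute "a mix of CPU and Mem/IO based RPS" is a weighted harmonic mean rather than an arithmetic mean. This is a sketch under stated assumptions: a single server serves the whole mix, and each request of a given type consumes 1 / RPS seconds of that server's capacity. The 360 RPS CPU-bound figure comes from earlier in the thread; the 15,000 RPS memory-bound figure is hypothetical.

```python
from collections.abc import Sequence


def blended_rps(fractions: Sequence[float], rps_values: Sequence[float]) -> float:
    """Effective RPS of one server serving a mix of request types.

    fractions[i] is the share of requests of type i (shares must sum to 1) and
    rps_values[i] is the server's RPS when it serves only type i. Each type-i
    request uses 1 / rps_values[i] seconds of capacity, so the effective rate
    is the weighted harmonic mean of the per-type rates.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    seconds_per_request = sum(f / r for f, r in zip(fractions, rps_values))
    return 1.0 / seconds_per_request


CPU_BOUND_RPS = 360        # figure quoted earlier in the thread
MEMORY_BOUND_RPS = 15_000  # hypothetical, for illustration only
mix = [0.5, 0.5]           # half CPU bound, half memory bound
rates = [CPU_BOUND_RPS, MEMORY_BOUND_RPS]

arithmetic_mean = sum(f * r for f, r in zip(mix, rates))
print(f"{blended_rps(mix, rates):.1f}")  # 703.1
print(f"{arithmetic_mean:.1f}")          # 7680.0
```

For a 50/50 mix the blended rate lands at roughly twice the CPU-bound rate, far below the arithmetic mean, because the slow requests dominate the server's time.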

Thanks.