educative.io

Why over index on scaling calculation on index server?

While I appreciate the author does a good job in detailed explanation on how many machines we need for the index server, I agree even if we delegate the index server job entirely to database it is still a good idea to calculate how many hardware is needed. But why such calculation is only on index server? Doesn’t tweet db also need the exact same calculation. Such bias is hard to be justified during the interview. And it is easy to run out of time by doing so. So my question is how valuable is such biased hardware scaling calculation? Also feel confused why index on having dedicated index server + aggr server while such job can entirely be delegated to database server?

The author has mentioned both these numbers. i.e., a total number of database servers (assuming each has 4TB storage) = 125 and the index servers (assuming each has a storage capacity of 144GB) = 152. This answers your question; hence, we have no such bias because both the answers are provided in the calculation.
Coming to your last question on why is the entire job not delegated to the Database server, I think there are two possible reasons for that. The Database is an important element in the infrastructure and you should not be making such a server do multiple tasks for the following reasons:

  1. Performance: Overloading the DB servers could increase the turnaround time for querying data from DB
  2. Security: If you have multiple services running on one machine and one gets hacked, possibility of hacking the other is very high.
1 Like

I’m curious to know why index servers are assumed to have only 144GB each while database servers have 4TB each. Can someone clarify?

Most of this is clearly assumed. The assumptions are made on the basis that generally, DB servers have higher storage than any other server which is not supposed to store a large amount of data. And if you keep one index server instead of multiple, you have a lower level of multiplexing possibilities which can give a higher response time.