There are multiple questions here, let’s tackle these one by one:
- Maximum Connections
There could be some misunderstanding about sockets here, so let’s clear that up first: a server listens on only one port, and it can hold a large number of open sockets from clients connecting to that one port.
On the TCP level, the tuple (source IP, source port, destination IP, destination port) must be unique for each simultaneous connection. Since the source port is a 16-bit number, a single client IP cannot open more than 65535 simultaneous connections to one server. But a server can (theoretically) serve 65535 simultaneous connections per client.
So in practice, the server is only limited by how much CPU power, memory etc. it has to serve requests, not by the number of TCP connections to the server.
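A tiny sketch can make the point concrete (the host/port values are illustrative): one server socket listens on a single port, yet it accepts many simultaneous client sockets, because each connection gets a unique (source IP, source port, destination IP, destination port) tuple via the client’s ephemeral source port.

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 9500  # illustrative address

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind((HOST, PORT))
server.listen()

accepted = []  # (client IP, client port) of each accepted connection

def accept_n(n):
    for _ in range(n):
        conn, addr = server.accept()
        accepted.append(addr)

t = threading.Thread(target=accept_n, args=(5,))
t.start()

# Five client sockets to the SAME server port; each gets a distinct
# ephemeral source port, so the 4-tuples differ.
clients = [socket.create_connection((HOST, PORT)) for _ in range(5)]
t.join()

source_ports = {addr[1] for addr in accepted}
print(len(accepted), len(source_ports))  # 5 connections, 5 distinct source ports

for c in clients:
    c.close()
server.close()
```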
-
The above explanation tells us that we can serve more than 10k (as you mentioned) concurrent connections on one server, but it depends on the CPU/memory of the server. Say you serve 20k connections per server; one million concurrent users would then need 50 machines. Our guess is that the interviewer will still say that is too many, so let’s dig deeper.
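The back-of-the-envelope math here is simple (the one-million-user target and 20k-connections-per-server figure are the ones assumed in this discussion):

```python
# Capacity estimate from the discussion above (assumed figures).
concurrent_users = 1_000_000
connections_per_server = 20_000

servers_needed = concurrent_users // connections_per_server
print(servers_needed)  # 50 machines
```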
-
Another point (and a hint) raised in your interview was how to handle cross-region messaging, i.e., one user in India and one in the US. If each connects only to their own region, messages will incur a lot of latency, and supposedly Whatsapp is doing something else!
-
What can we do here? This is tricky, especially since we didn’t discuss it in the chapter. One answer could be “peer-to-peer” chat, which Whatsapp “probably” does. Let’s discuss the workflow:
a. Both the clients keep a connection with each other, in addition to the server.
b. All login/online/offline and chat initiating requests are served by the server.
c. Once the chat session initiates, there will be a direct connection between the two clients (i.e., peer-to-peer).
d. So all messages are transferred directly between the two clients, ensuring minimum latency.
e. Should we store the chat history on the server? Did you ask the interviewer this question? There are three scenarios:
I. We don’t store any chat history on the server. We are cool.
II. We do store the chat for a short period of time on the server. Probably, Whatsapp does this.
III. We have permanent storage of the chat history on the server.
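The workflow in steps a–d can be sketched in-process (all class and method names here are illustrative, not Whatsapp’s actual protocol): the server only brokers login and chat initiation, after which messages flow directly between the two clients.

```python
class SignalingServer:
    """Serves login and chat-initiation requests (steps a-c)."""
    def __init__(self):
        self.online = {}  # username -> Client

    def login(self, client):
        self.online[client.name] = client

    def initiate_chat(self, from_name, to_name):
        # Hand each client a handle to the other; in a real system this
        # would be the peer's network address.
        a, b = self.online[from_name], self.online[to_name]
        a.peer, b.peer = b, a

class Client:
    def __init__(self, name):
        self.name = name
        self.peer = None
        self.inbox = []

    def send(self, text):
        # Step d: the message goes directly to the peer, not via the server.
        self.peer.inbox.append((self.name, text))

server = SignalingServer()
alice, bob = Client("alice"), Client("bob")
server.login(alice)
server.login(bob)
server.initiate_chat("alice", "bob")

alice.send("hi")
print(bob.inbox)  # [('alice', 'hi')]
```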
Let’s discuss the design of the last two scenarios.
- In both cases, the clients need to send the message to the server as well. So the client will broadcast the message to both the server and the other client, which keeps the latency between the two clients minimal.
- On the server, we can have a distributed-queue system (like Kafka or RabbitMQ). The message gets pushed to the queue and an acknowledgment is sent to the client immediately. The server makes sure that everything pushed to the queue gets stored. There could be network failures between the client and the server; in that case, the client (not the user) needs to resend the message.
- The client can independently retry sending the failed message to the other client or the server. This should not affect the latency between the two clients.
- Since the server simply pushes the message to a queue before acknowledging, and is not responsible for passing that message on to the other client, we can serve a lot more traffic on the server side. Remember, separate servers can take messages from the queue and store them.
- This could mean we need only 20-25 servers to serve one million concurrent users.
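The ack-on-enqueue idea above can be sketched as follows (a `queue.Queue` stands in for Kafka/RabbitMQ, and a list stands in for the history store; the function names are illustrative): the server acknowledges as soon as the message is enqueued, and a separate worker drains the queue into storage.

```python
import queue
import threading

message_queue = queue.Queue()  # stand-in for Kafka/RabbitMQ
chat_history = []              # stand-in for the history store

def handle_incoming(message):
    message_queue.put(message)  # enqueue first...
    return "ACK"                # ...then acknowledge immediately

def storage_worker():
    # A separate server/process in the real design: drains the queue
    # and persists each message.
    while True:
        msg = message_queue.get()
        if msg is None:          # shutdown sentinel
            break
        chat_history.append(msg)
        message_queue.task_done()

worker = threading.Thread(target=storage_worker)
worker.start()

reply = handle_incoming({"from": "alice", "to": "bob", "text": "hi"})
print(reply)  # ACK — sent before the message is actually stored

message_queue.put(None)
worker.join()
print(chat_history)
```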
Please see this about the TCP connections: https://serverfault.com/questions/533611/how-do-high-traffic-sites-service-more-than-65535-tcp-connections
Hope this helps.