There are multiple questions here, let’s tackle these one by one:
- Maximum Connections
There could be some misunderstanding about sockets here, so let’s clear that up first: a server listens on only one port, and it can hold a large number of open sockets from clients connecting to that one port.
On the TCP level, the tuple (source IP, source port, destination IP, destination port) must be unique for each simultaneous connection. Since the source port is a 16-bit number, a single client IP cannot open more than 65535 simultaneous connections to one server. But a server can (theoretically) serve 65535 simultaneous connections per client.
So in practice, the server is only limited by how much CPU power, memory etc. it has to serve requests, not by the number of TCP connections to the server.
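A tiny sketch can make the point concrete (the host/port values are illustrative): one server socket listens on a single port, yet it accepts many simultaneous client sockets, because each connection gets a unique (source IP, source port, destination IP, destination port) tuple via the client’s ephemeral source port.

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 9500  # illustrative address

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind((HOST, PORT))
server.listen()

accepted = []  # (client IP, client port) of each accepted connection

def accept_n(n):
    for _ in range(n):
        conn, addr = server.accept()
        accepted.append(addr)

t = threading.Thread(target=accept_n, args=(5,))
t.start()

# Five client sockets to the SAME server port; each gets a distinct
# ephemeral source port, so the 4-tuples differ.
clients = [socket.create_connection((HOST, PORT)) for _ in range(5)]
t.join()

source_ports = {addr[1] for addr in accepted}
print(len(accepted), len(source_ports))  # 5 connections, 5 distinct source ports

for c in clients:
    c.close()
server.close()
```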
-
The above explanation tells us that we can serve more than 10k (as you mentioned) concurrent connections on one server, but it depends on the CPU/memory of the server. Say you serve 20k connections per server; one million concurrent users would then need 50 machines. Our guess is that the interviewer will still say that is too many, so let’s dig deeper.
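The back-of-the-envelope math here is simple (the one-million-user target and 20k-connections-per-server figure are the ones assumed in this discussion):

```python
# Capacity estimate from the discussion above (assumed figures).
concurrent_users = 1_000_000
connections_per_server = 20_000

servers_needed = concurrent_users // connections_per_server
print(servers_needed)  # 50 machines
```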
-
Another point (and a hint) raised in your interview was how to handle cross-region messaging, i.e., one user in India and one in the US. If each connects only to their own region, messages will incur a lot of latency, and supposedly Whatsapp is doing something else!
-
What can we do here? This is tricky, especially since we didn’t discuss it in the chapter. One answer could be “peer-to-peer” chat, which Whatsapp “probably” does. Let’s discuss the workflow:
a. Both the clients keep a connection with each other, in addition to the server.
b. All login/online/offline and chat initiating requests are served by the server.
c. Once the chat session initiates, there will be a direct connection between the two clients (i.e., peer-to-peer).
d. So all messages are transferred directly between the two clients, ensuring minimum latency.
e. Should we store the chat history on the server? Did you ask the interviewer this question? There are three scenarios:
I. We don’t store any chat history on the server. We are cool.
II. We do store the chat for a short period of time on the server. Probably, Whatsapp does this.
III. We have permanent storage of the chat history on the server.
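The workflow in steps a–d can be sketched in-process (all class and method names here are illustrative, not Whatsapp’s actual protocol): the server only brokers login and chat initiation, after which messages flow directly between the two clients.

```python
class SignalingServer:
    """Serves login and chat-initiation requests (steps a-c)."""
    def __init__(self):
        self.online = {}  # username -> Client

    def login(self, client):
        self.online[client.name] = client

    def initiate_chat(self, from_name, to_name):
        # Hand each client a handle to the other; in a real system this
        # would be the peer's network address.
        a, b = self.online[from_name], self.online[to_name]
        a.peer, b.peer = b, a

class Client:
    def __init__(self, name):
        self.name = name
        self.peer = None
        self.inbox = []

    def send(self, text):
        # Step d: the message goes directly to the peer, not via the server.
        self.peer.inbox.append((self.name, text))

server = SignalingServer()
alice, bob = Client("alice"), Client("bob")
server.login(alice)
server.login(bob)
server.initiate_chat("alice", "bob")

alice.send("hi")
print(bob.inbox)  # [('alice', 'hi')]
```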
Let’s discuss the design of the last two scenarios.
- In both cases, the clients need to send the message to the server as well. So the client will broadcast the message to both the server and the other client, which keeps the latency between the two clients minimal.
- On the server, we can have a distributed-queue system (like Kafka or RabbitMQ). The message gets pushed to the queue and an acknowledgment is sent to the client immediately. The server makes sure that everything pushed to the queue gets stored. There could be network failures between the client and the server; in that case, the client (not the user) needs to resend the message.
- The client can independently retry sending the failed message to the other client or the server. This should not affect the latency between the two clients.
- Since the server simply pushes the message to a queue before acknowledging, and is not responsible for passing that message on to the other client, we can serve a lot more traffic on the server side. Remember, separate servers can take messages from the queue and store them.
- This could mean we need only 20-25 servers to serve one million concurrent users.
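The ack-on-enqueue idea above can be sketched as follows (a `queue.Queue` stands in for Kafka/RabbitMQ, and a list stands in for the history store; the function names are illustrative): the server acknowledges as soon as the message is enqueued, and a separate worker drains the queue into storage.

```python
import queue
import threading

message_queue = queue.Queue()  # stand-in for Kafka/RabbitMQ
chat_history = []              # stand-in for the history store

def handle_incoming(message):
    message_queue.put(message)  # enqueue first...
    return "ACK"                # ...then acknowledge immediately

def storage_worker():
    # A separate server/process in the real design: drains the queue
    # and persists each message.
    while True:
        msg = message_queue.get()
        if msg is None:          # shutdown sentinel
            break
        chat_history.append(msg)
        message_queue.task_done()

worker = threading.Thread(target=storage_worker)
worker.start()

reply = handle_incoming({"from": "alice", "to": "bob", "text": "hi"})
print(reply)  # ACK — sent before the message is actually stored

message_queue.put(None)
worker.join()
print(chat_history)
```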
Please see this about the TCP connections: https://serverfault.com/questions/533611/how-do-high-traffic-sites-service-more-than-65535-tcp-connections
Hope this helps.