Cache server load balancing

I know we often use a simple Round Robin approach to distribute incoming requests equally among backend/application servers, because every backend server is identical and can handle any request. It's also very easy to understand.
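For reference, round robin really is that simple; here's a minimal sketch (the server names are made up):

```python
from itertools import cycle

# Hypothetical pool of identical backend servers
backends = ["app-1", "app-2", "app-3"]
rr = cycle(backends)  # round robin: rotate through the pool forever

def pick_backend():
    """Return the next server in rotation, ignoring load entirely."""
    return next(rr)

# Six requests get spread evenly, one server per turn
requests = [pick_backend() for _ in range(6)]
print(requests)  # → ['app-1', 'app-2', 'app-3', 'app-1', 'app-2', 'app-3']
```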

But for cache servers, I was confused. Does a cache server contain the CACHE, the way a database server contains the DATABASE? Or does "cache server" just mean the logic for handling the cache, with the cache itself stored somewhere else?
(I'm confused by the expressions in this course, such as cache server vs. cache, and database server vs. database. Does the former contain the latter, or are they the same thing?)

"To resolve this issue any busy server in one location can redirect a client to a less busy server in the same cache location." From this sentence, I assume a cache server doesn't store the cache itself and just handles the logic. Then why can't we just use the Round Robin approach to handle load balancing?
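For what it's worth, the "dynamic HTTP redirection" the course mentions is just a busy server answering with a 302 that points the client at a peer. A minimal sketch using Python's stdlib (the threshold, peer address, and load counter are all made up for illustration):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values for illustration only
MAX_ACTIVE = 100
LESS_BUSY_PEER = "http://cache-2.example.com"
active_requests = 0  # a real server would track its in-flight load

class CacheHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if active_requests > MAX_ACTIVE:
            # Dynamic HTTP redirection: bounce the client to a less
            # busy server in the same cache location
            self.send_response(302)
            self.send_header("Location", LESS_BUSY_PEER + self.path)
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"serving cached content\n")
```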

@Design_Gurus can you help me understand this? thanks!

I am also confused about this part, especially the passage: "For instance, if a video becomes popular, the logical replica corresponding to that video will experience more traffic than other servers. These uneven loads for logical replicas can then translate into uneven load distribution on corresponding physical servers. To resolve this issue any busy server in one location can redirect a client to a less busy server in the same cache location. We can use dynamic HTTP redirections for this scenario."

Please @Design_Gurus, can you help us with this? Thanks!

I believe that when they say cache servers, they probably mean Netflix's main workhorse, the Open Connect boxes, which store most of the videos and are deployed at every major ISP in the world.

The following post is fascinating: https://netflixtechblog.com/netflix-edge-load-balancing-695308b5548c and explains thoroughly and clearly all the tricks of the load-balancing trade.

Main take-aways:

  1. Load balancers base their decisions on what they know of each server's load, drawn only from their own requests and what they know of the queue lengths…
  2. …plus what the servers tell them about their load, since there are other load balancers sending traffic too. This information is piggybacked on the server's response to the load balancer for each request.
  3. These two factors are blended into the load balancer's decision. Also, the load balancer uses a best-of-2-random-servers approach: it selects two random servers and then chooses the better one according to the above criteria. This prevents every balancer from always picking the single best server, which would produce a busy-quiet-busy-quiet oscillation.
  4. New guys (servers) get a grace period of reduced load, so they have a chance to warm up and aren't swamped right on startup.
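To make point 3 concrete, here is a minimal sketch of best-of-2-random selection (also known as power-of-two-choices), assuming each server's load is tracked as a simple pending-request count; the real Netflix balancer blends several signals, not just one number:

```python
import random

# Hypothetical load table: server name -> pending requests the balancer knows about
loads = {"cache-1": 12, "cache-2": 3, "cache-3": 7, "cache-4": 9}

def pick_server(loads, rng=random):
    """Best-of-2-random: sample two distinct servers, take the less loaded one."""
    a, b = rng.sample(list(loads), 2)
    return a if loads[a] <= loads[b] else b

choice = pick_server(loads)
print(choice)  # one of the two sampled servers, whichever is less loaded
```

Note how this differs from "always pick the globally least-loaded server": the heaviest server still loses every pairing it appears in, but the lightest server isn't hammered by every balancer at once.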

But don't just take my tl;dr, read the post! It's detailed and such an eye-opener!
