educative.io

Aggregator Service to spread the load for Model Serving components

Hi all,

The Ads recommendation section of the Machine Learning System Design course says this:

we scale out Model Serving and put Aggregator Service to spread the load for Model Serving components

I would like to know more about how the Aggregator Service works and how it helps spread the load for the Model Serving components. Could someone recommend some literature?

Thanks in advance


Course: https://www.educative.io/courses/machine-learning-system-design
Lesson: https://www.educative.io/courses/machine-learning-system-design/xlrLVmXRDnq

Hi @Pablo1 !!
The Aggregator Service plays a crucial role in spreading the load for the Model Serving components in the ad ranking system. It receives the list of ad candidates generated by the Candidate Generation Service and is responsible for distributing the workload across multiple instances of the Model Serving components to achieve scalability and meet the latency requirements.

Here is an overview of how the Aggregator Service works and how it helps spread the load:

  1. Candidate Generation: The Candidate Generation Service generates a list of ad candidates based on user information and other factors. These candidates are sent to the Aggregator Service for further processing.

  2. Load Distribution: The Aggregator Service receives the list of ad candidates and is designed to distribute the workload evenly across multiple instances of the Model Serving components. This is crucial to ensure that the latency requirements are met, even when dealing with a large volume of ad candidates.

  3. Model Serving Integration: The Aggregator Service interacts with the Model Serving components to obtain the latest model for ad ranking. It retrieves the necessary features from the Feature Store and sends them along with the ad candidates to the Model Serving components for scoring.

  4. Parallel Processing: The Aggregator Service splits the list of ad candidates and distributes them among multiple instances of the Model Serving components in a parallel manner. Each instance processes a subset of the ad candidates concurrently, which helps in spreading the computational load.

  5. Aggregation: Once the Model Serving components have completed scoring the ad candidates, the Aggregator Service collects the results and performs the final aggregation. It selects the top K ads based on their scores and returns the final list of ads to the upstream services.
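
The five steps above can be sketched roughly as follows. This is only a minimal illustration, not the course's implementation: `fetch_features`, `score_batch`, the batch size, and the toy scoring rule are all hypothetical stand-ins for the Feature Store and the Model Serving components.

```python
import asyncio
import heapq
from typing import Dict, List, Tuple


def fetch_features(ad_id: int) -> Dict[str, float]:
    """Hypothetical stand-in for a Feature Store lookup (step 3)."""
    return {"ctr_prior": (ad_id % 10) / 10.0}


async def score_batch(
    batch: List[Tuple[int, Dict[str, float]]]
) -> List[Tuple[int, float]]:
    """Hypothetical stand-in for one call to a Model Serving instance."""
    await asyncio.sleep(0.01)  # simulated network and inference latency
    return [(ad_id, feats["ctr_prior"]) for ad_id, feats in batch]


def split(items: List[int], size: int) -> List[List[int]]:
    """Split the candidate list into chunks of at most `size` ads (step 4)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


async def aggregate(
    candidates: List[int], batch_size: int, k: int
) -> List[Tuple[int, float]]:
    # Attach features to every candidate, one batch per chunk.
    batches = [
        [(ad, fetch_features(ad)) for ad in chunk]
        for chunk in split(candidates, batch_size)
    ]
    # Fan out: all batches are scored concurrently.
    results = await asyncio.gather(*(score_batch(b) for b in batches))
    # Fan in: flatten the per-batch results and keep the top K by score (step 5).
    scored = [pair for batch in results for pair in batch]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# 25 candidate ads from the Candidate Generation Service (step 1), made up here.
top_ads = asyncio.run(aggregate(list(range(1, 26)), batch_size=10, k=3))
print(top_ads)
```

Because the batches are scored concurrently, the total latency is close to that of the slowest batch rather than the sum of all batches, which is the main reason for splitting the work.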

By spreading the load across multiple instances of the Model Serving components, the Aggregator Service ensures that the computational workload is distributed efficiently, allowing for better scalability and meeting the latency requirements. This approach enables the system to handle a large volume of ad candidates while maintaining low latencies and providing accurate ad rankings.
I hope it helps. Happy Learning :blush:

Thanks for the answer. As per your point 4, it seems like the Aggregator is responsible for spreading the load across different model instances. However, this seems unlikely to me, as the model service is usually hosted in a Kubernetes cluster or similar, which has its own load balancer to spread the load.

So maybe I am understanding it wrong. The Aggregator has the responsibility of parallelizing the calls to the model service. It gets N candidate ads and parallelizes the calls to the model service (one call per ad), and these calls are distributed by the load balancer. Is my understanding correct?

This latter architecture seems a bit more plausible, but still a bit complex: there is a fan-out in the Aggregator, a fan-in at the load balancer, and a fan-out again to the model serving instances.

Yes, you're correct in pointing out that in a typical architecture, a Kubernetes cluster or similar system would handle the load balancing across multiple instances of the Model Serving components.
The primary responsibility of the Aggregator Service is to coordinate the parallel processing and aggregation of ad candidates; it acts as an orchestrator in the overall workflow. It splits the workload, sends the requests to the Model Serving endpoint, collects the results, and performs the final aggregation. The infrastructure, such as the Kubernetes cluster, handles the load balancing of individual requests across the Model Serving instances. This division of labor is what lets the system scale out while keeping latency low.
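To make the division of responsibilities concrete, here is a rough sketch under stated assumptions: `ModelReplica` and `RoundRobinBalancer` are hypothetical toy classes, the score is made up, and round-robin is only one possible policy (a real load balancer may pick instances differently). The point is that the aggregator only ever talks to a single endpoint and never chooses an instance itself.

```python
import asyncio
import itertools
from collections import Counter
from typing import Dict, List, Tuple


class ModelReplica:
    """Hypothetical stand-in for one Model Serving instance (e.g. one pod)."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def score(self, ad_id: int) -> Tuple[int, float, str]:
        await asyncio.sleep(0.01)  # simulated inference latency
        return ad_id, 1.0 / ad_id, self.name  # toy score, not a real model


class RoundRobinBalancer:
    """Toy load balancer: the only component that knows about the replicas."""

    def __init__(self, replicas: List[ModelReplica]) -> None:
        self._cycle = itertools.cycle(replicas)

    async def score(self, ad_id: int) -> Tuple[int, float, str]:
        # Pick the next replica in turn and forward the request to it.
        return await next(self._cycle).score(ad_id)


async def aggregator(
    ad_ids: List[int], endpoint: RoundRobinBalancer, k: int
) -> Tuple[List[Tuple[int, float]], Dict[str, int]]:
    # Fan out: one concurrent call per ad, all sent to the same endpoint.
    results = await asyncio.gather(*(endpoint.score(ad) for ad in ad_ids))
    # Record which replica served each call, only to show the spread.
    served_by = Counter(name for _, _, name in results)
    # Fan in: keep the K highest-scoring ads.
    ranked = sorted(
        ((ad, score) for ad, score, _ in results),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k], dict(served_by)


balancer = RoundRobinBalancer([ModelReplica(f"replica-{i}") for i in range(3)])
top_k, served_by = asyncio.run(aggregator(list(range(1, 10)), balancer, k=2))
print(top_k, served_by)
```

In this sketch the nine calls end up spread across the three replicas by the balancer, while the aggregator's code contains no instance-selection logic at all. Sending one call per ad versus one call per batch of ads is a tuning choice; the structure stays the same either way.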
Happy Learning :blush: