Paper Review: Overload Control of micro-scale RPCs with Breakwater
In this post, I will review and summarize the Breakwater paper by Inho Cho and others, in OSDI’20.
This paper proposed a Breakwater system that relies on a server-driven admission control scheme. In this scheme, the server is responsible for controlling the allocation of resources to client requests based on the availability of resources and the server’s policy.
In the Breakwater system, clients are allowed to send requests only when they receive credits from the server. The Breakwater server manages a global pool of credits and allocates credits to clients based on the server’s policy. In the Breakwater system, the server uses queuing delay as an indicator of system load and decides how many credits to issue to clients based on the current queuing delay.
Table Of Contents
· The Problem
· Previous Work
. Existing Overload Control Mechanisms
. Why are Existing approaches not working?
· The Breakwater approach
· System Architecture
· Breakwater Evaluation Results
Datacenter applications like Big Data processing (Hadoop), HPC, and Financial Services applications that require real-time processing, composed of small independent services called microservices that communicate over RPCs. These applications require low latency. To achieve low latency, microservices imply SLOs that stand for service-level objectives(measurable goals that define the level of services that the system requires. It is usually defined in terms of metrics like response time, resource utilization, and throughput). In a normal workload, SLOs for microservices are achievable, but with overload, the SLOs are violated.
To maintain SLOs for microservices and prevent the server from being overloaded, there are two types of existing techniques:
1. Drop requests at an overloaded server.
2. Throttle the sending rate of requests at clients.
Existing Overload Control Mechanisms
The goal of overload control is to prevent the system from becoming overloaded or overwhelmed by too much demand or traffic and to shed or redirect the excess load before it can consume too many resources and cause the system to slow down or crash. To do that, there are two approaches as the paper stated them:
- Active Queue Management (AQM):
When the server detects an overload situation, it uses AQM to drop packets and reduce the traffic load. To achieve AQM, the system can have a simple queue approach for outstanding requests, or a more advanced approach called ColDel. Both require the server to process the request and generate a failure message to the client which will create a significant overhead with micro-scale RPCs. The figure-1 uses ColDel-AQM, 10 microseconds as an average to process the request, and 200 microseconds for SLO.
- Client-side Rate limiting: this approach adjusts to the server overloading state. If RTT is less than processing time, the reaction is quick and thus the performance is good.
Why is the existing approach not working?
Existing approaches suffer significant performance degradation. Existing approaches are designed for long-running RPCs. Using existing approaches for microsecond scale RPCs results in significant performance degradation. Microsecond-scale RPCs have different characteristics like short average service times, variability in service times (The execution time of an RPC can be stochastic in nature and can follow a long-tailed distribution, which means that some requests take much longer to complete than others), variability in demand, and large numbers of clients.
The Breakwater approach has three components:
- Explicit server-based admission control: The client must first request permission from the server and only after receiving permission can it send the actual request. This won’t add one more RTT because piggybacking and overcommitting credits are used, as detailed later in the context of Breakwater.
- Demand speculation with overcommitment: Explicit server-based admission control involves the server explicitly granting permission to the client to send requests. This allows the server to have complete control over the load it receives and enables more accurate control to maintain high utilization and low latency. On the other hand, demand speculation with overcommitment allows the server to speculate about clients’ demands and issue more credits than its capacity. This approach enables the server to maintain high utilization even in the presence of uncertain demand, without requiring explicit permission from the server for each request.
- Active Queue Management (AQM): Due to overcommitment, the server can occasionally receive more load than its capacity. Thus, we rely on AQM to shed the excess load. In Breakwater’s scheme, the need for AQM to drop requests is rare, as credits are only issued when the server is not overloaded.
The client sends a request that contains a register message to the server and piggybacks its first request. The server upon receiving the request, the server checks if it's overloaded. If not, the server registers the client, processes the request, and sends credits by piggybacking the result of the first request to the client or failure message.
Overload Detection: There are multiple signals we can utilize to determine whether a server is congested.
- CPU load is a popular congestion signal, but it does not capture other resources that can lead to high latency including hard disk and livelock state.
- Another potential congestion signal is queue length at the server. This is not good with high dispersion.
- queuing delay is used as a congestion signal instead of CPU load or queue length. Queuing delay is a more reliable indicator of request latency under RPC service time variability, and it allows for an intuitive mapping between the target SLO and the target queuing delay at the server. By using queuing delay as its congestion signal, Breakwater is able to more accurately control congestion and maintain low tail latency while maximizing goodput.
In Breakwater, there are two stages of queuing:
- Packet queue: Incoming packets are queued while they await processing to create a request.
- Request queue: Threads created to process requests are queued awaiting execution.
By accounting for the sum of each of the queuing delay stages, Breakwater ensures that the system only curbs incoming requests when it is overloaded, ignoring any non-overload-induced delays.
Server-driven Credit-based Admission Control: A Breakwater server controls the admission of incoming requests through a credit-based scheme. Server-driven admission control avoids the need for clients to probe the server to know what rate to send at. The global pool of credits, C_total, represents the maximum load that the server can handle while maintaining its SLO. To ensure that the server doesn’t become overloaded, Breakwater uses a control algorithm that adjusts C_total based on the measured queuing delay (dm) of incoming requests. The target queuing delay (dt) is set based on the SLO of the remote procedure call (RPC). If the measured queuing delay is less than the target queuing delay, Breakwater increases C_total additively, which allows the server to handle more requests.
If the measured queuing delay is more than the target queuing delay, Breakwater decreases C_total, which allows the server to handle fewer requests.
Parameter A controls the level of overcommitment and aggressiveness in the generation of credits. The parameter β controls the sensitivity of Breakwater to queue build-up. In this equation, the value of c_total is updated based on the measured queuing delay (dm) of incoming requests and the target queuing delay (dt) for the system. Parameter B is a scaling factor that determines the sensitivity of the credit pool to changes in the queuing delay.
Demand Speculation with Overcommitment: Breakwater algorithm needs to know the demand for resources from each client in order to accurately generate and distribute credits. To obtain this information, clients need to inform the server when their number of pending requests changes. The server can then select which clients to send credits to, ensuring that all issued credits are used and the server can accurately represent its capacity.
However, as the number of clients increases, exchanging demand messages can become a bottleneck due to additional network latency and processing overhead. Therefore, there is a tradeoff between the accuracy of credit generation and the messaging overhead required to achieve that accuracy.
To reduce messaging overhead, Breakwater only allows clients to notify the server of their demand when they send a request. This means that clients with sporadic demand may not be accurately represented in the credit distribution, but the reduction in messaging overhead makes the system more efficient overall.
A is set proportional to the number of clients, which means that the more clients there are, the higher the overcommitment factor will be. This allows the server to handle more requests than it can actually handle at a given time, which can be useful for handling sudden spikes in demand.
The β parameter is used to control how sensitive Breakwater is to queue build-up, which means that it regulates how quickly the server will reduce the number of requests it can handle if the queue starts to grow.
By adjusting these parameters, Breakwater can be tuned to the specific needs of the system being modeled.
AQM: Breakwater uses delay-based active queue management (AQM) to drop requests if the queuing delay exceeds the SLO-derived threshold, ensuring low tail latency at all times. However, overcommitment may occasionally cause the server to receive a higher load than its capacity, leading to long queues. Drops are rare in Breakwater due to its credit-based admission control scheme and can be further reduced by setting a large SLO budget.
Shenango is an operating system that has been designed to provide low tail latency for microsecond-scale applications with fast core allocations, lightweight user-level threads, and an efficient network stack.
Breakwater provides a simplified interface for applications to interact with the underlying network and handle RPCs (Remote Procedure Calls) in a low-latency and efficient manner. The library takes care of TCP connection management, credit-based admission control, and AQM (Active Queue Management) at the RPC layer. By abstracting these underlying network details, applications can focus on processing their requests without worrying about network complexities.
A request arriving at a Shenango server is first queued in a packet queue. Then a Shenango kernel thread processes packets and moves the payload to the socket memory buffer of the connection. Once all the payload of a request is prepared in the memory buffer, a thread in Breakwater parses the payload to a request and creates a thread to process it. Threads are queued pending execution, and when they execute, they execute to completion.
In Shenango, each core has a packet queue and a thread queue shared with IOKernel, and each queue maintains the timestamp of the oldest item. When the runtime is asked for the queuing delay, it returns the maximum of the packet queueing delay and thread queueing delay, which provides an accurate measurement of the total queuing delay experienced by a request.
This formula is used by the sender thread in deciding whether to issue new credits to a client, not issue any credits or revoke credits based on the available credits and demand information. If Cissued is less than Ctotal, cnew x becomes:
If Cissued is greater than Ctotal, cnew x becomes:
Breakwater Evaluation Results
The testbed used for the Breakwater evaluation was 11 nodes from the Cloudlab xl170 cluster.
DAGOR and SEDA are two baseline systems that are compared to Breakwater in the evaluation section of the paper. DAGOR is a priority-based overload control system used for WeChat microservices that assigns priorities based on business requirements and drops requests below a threshold based on queueing delay. SEDA uses a rate-based rate limiting algorithm that sets rates based on response time. To evaluate the performance of Breakwater, the authors modified SEDA’s algorithm to adjust rates based on the 99%-ile response time. Both DAGOR and SEDA are implemented as an RPC layer in Shenango with the same dispatcher model as Breakwater.
The overall performance for a workload with an exponential service time distribution and an average of 10 microseconds is reported. The paper states that the server’s capacity under this workload is around 850,000 requests per second.
The performance breakdown of Breakwater shows the contribution of each of its three major components: credit-based admission control, demand speculation, and delay-based AQM, to the system’s overall performance. The performance metrics measured were throughput and 99%-ile latency, using a synthetic workload with a service time that follows an exponentially distributed pattern with an average of 10 μs (SLO = 200 μs).
The results shown in Figure 9 indicate that with no overload control, the system’s throughput degrades, and tail latency soars as demand exceeds server capacity. This leads to most requests violating their SLO.
The activation of credit-based admission control effectively lowers and bounds the tail latency, but messaging overhead negatively affects throughput. Activation of demand speculation with message piggybacking reduces messaging overhead but worsens tail latency due to incast caused by credit overcommitment.
The employment of delay-based AQM allows Breakwater to handle incast effectively, leading to high throughput and low tail latency. Overall, it can be inferred that delay-based AQM is the most critical component of Breakwater’s performance, as it effectively handles incast and allows for high throughput and low tail latency.
To evaluate Breakwater under a more realistic workload, a scenario was created where one memcached instance serves 10,000 clients using the USR workload. The results were compared against DAGOR and SEDA. Breakwater achieved steady goodput, low latency, and low drop rate, while DAGOR and SEDA suffered from goodput degradation with high tail latency caused by incast when the server becomes overloaded. With clients’ demand of 2x capacity, Breakwater achieved better performance than both SEDA and DAGOR, with 5% more goodput and 1.8x lower 99%-ile latency than SEDA, and 14.3% more goodput and 2.9x lower 99%-ile latency than DAGOR. However, due to bimodally distributed service time with a mix of GET and SET requests, Breakwater showed around 25 μs higher 99%-ile latency than its SLO and about 1.5% point higher drop rate than DAGOR.
End of the Summary.