A Paper Review of Leopard: Scaling BFT without Sacrificing Efficiency
This paper identifies the major bottleneck in single-leader-based BFT protocols and proposes Leopard, a protocol that scales to hundreds of replicas. The paper introduces a new metric called the scaling factor (SF): the heaviest workload, among all replicas, for confirming one bit of request, which captures the scalability of a BFT protocol. Leopard achieves 5x the throughput of HotStuff at a scale of 300 replicas.
The paper makes three contributions: 1) identifying the major bottleneck of leader-based BFT; 2) a scalable and efficient BFT protocol; 3) an implementation and experimental evaluation.
BFT protocols commonly use techniques such as vote aggregation to reduce communication complexity, a chained consensus ledger to simplify the recovery mechanism, and a pipelined paradigm to reduce voting rounds. The equation below shows how a leader's throughput drops as n grows, because the leader must disseminate all requests to the followers.
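The review does not reproduce the paper's equation; a plausible reconstruction of the leader-bottleneck bound, assuming the leader must forward every request to each of the other n-1 replicas with outbound capacity C:

```latex
T_{\text{leader}} \;\le\; \frac{C}{n-1}
```

so the achievable throughput shrinks roughly inversely with the number of replicas.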
For a system to scale as the number of replicas increases, no single replica's workload should become a bottleneck. The scaling factor (SF) of previous BFT protocols is defined below.
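The definition itself is missing from this review; a hedged reconstruction consistent with the prose, where W_i(n) denotes replica i's workload for confirming one bit of request:

```latex
\mathrm{SF}(n) \;=\; \max_{i}\, W_i(n)
```

For single-leader protocols the leader's dissemination work dominates, giving SF(n) on the order of n.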
Given a processing capacity C, the expected throughput is bounded by C/SF; since SF grows with n in leader-based protocols, the expected throughput decreases as n increases.
Leopard is partially synchronous and responsive. It has linear communication complexity and decouples BFT consensus proposals into two planes: 1) the pending requests themselves; 2) indexes, each referring to a package of pending requests. Every non-leader replica packs the pending requests in its buffer and disseminates the package to all the others, so the leader only has to include an index to each outstanding package in a consensus proposal. The figure below shows the throughput of HotStuff and Leopard under varying numbers of replicas and bandwidth sizes.
The equation below gives the scaling factor of Leopard.
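The equation is not reproduced in this review; based on the surrounding text, Leopard's claimed scaling factor is constant in the number of replicas:

```latex
\mathrm{SF}_{\text{Leopard}}(n) \;=\; O(1)
```

Intuitively, the leader ships only a small index per package while the non-leaders' dissemination cost is amortized over the requests they confirm.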
The main goal is a constant scaling factor, i.e., the heaviest workload among all replicas for confirming a bit of request is constant with respect to n when they follow the protocol. The protocol proceeds in successive views; each view consists of multiple agreement instances running in parallel and a view-change. The following figure shows the typical message flow.
The client submits pending requests to replicas and waits for F+1 responses. Upon receiving requests from clients, each replica generates and then broadcasts a datablock. The leader of the current view does not generate datablocks; instead, it initiates the agreement instance. The leader extracts good datablocks to generate a BFTblock, which it sends to the other replicas with a unique serial number. In each agreement instance, a BFTblock is followed by two rounds of voting among the replicas.
A BFTblock in Leopard can be processed in parallel with others to increase the processing rate. Leopard has two important block types: datablocks and BFTblocks. A datablock is generated by a replica from requests in its mempool. A BFTblock is what replicas actually agree on; it is generated by the leader of the view from the datablocks received in its datablockPool. A BFTblock contains only the hashes of the involved datablocks and has two states: notarized, after completing the first round of voting, and confirmed, after completing the second round. Algorithm 1 covers datablock generation and verification.
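A minimal Python sketch of the two block types, assuming SHA-256 digests; the field names (requests, datablock_hashes, state) are illustrative and not taken from the paper:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class DataBlock:
    requests: list          # client requests batched from the replica's mempool

    def digest(self) -> str:
        return hashlib.sha256(b"".join(self.requests)).hexdigest()

@dataclass
class BFTBlock:
    serial: int             # unique serial number assigned by the leader
    datablock_hashes: list  # only hashes of datablocks, not their contents
    state: str = "proposed" # "proposed" -> "notarized" -> "confirmed"

# The leader references received datablocks by hash only.
dbs = [DataBlock([b"req1", b"req2"]), DataBlock([b"req3"])]
blk = BFTBlock(serial=1, datablock_hashes=[d.digest() for d in dbs])
blk.state = "notarized"     # after the first round of voting
blk.state = "confirmed"     # after the second round of voting
```

Keeping only hashes in the BFTblock is what shrinks the leader's per-proposal traffic to a constant-size index per package.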
An honest replica then guarantees the delivery of the request. To identify the replicas responsible for processing a given request, each client executes a deterministic function µ(req) that returns s replica identifiers. The client sends req to those s replicas, and every replica can likewise evaluate µ(req) to decide whether to process req. A simple instantiation of µ(req) checks the equality H(req) mod q = H(i) mod q, where q = (n−1)/s. Setting s = 1 achieves the best possible performance, since no request is repeated among replicas, but there is a trade-off between higher efficiency and better fault resistance.
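A sketch of the responsibility function µ, assuming SHA-256 as H and replica identifiers hashed from their decimal string; the paper does not fix these encodings, so both are assumptions:

```python
import hashlib

def H(x: bytes) -> int:
    return int.from_bytes(hashlib.sha256(x).digest(), "big")

def mu(req: bytes, n: int, s: int) -> list:
    """Return the replicas responsible for req: those i with
    H(req) mod q == H(i) mod q, where q = (n-1)/s."""
    q = (n - 1) // s
    target = H(req) % q
    return [i for i in range(n - 1) if H(str(i).encode()) % q == target]
```

On average about s replicas match a given request; with s = n−1 (i.e., q = 1) every replica is responsible for every request, while smaller s trades fault resistance for less duplicated work.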
Requests are confirmed during the normal-case mode of the protocol via two-round voting (see Algorithm 2). Leopard handles new requests without waiting for the completion of previous ones, bounding the number of concurrently executed BFTblocks by k. This adopts PBFT's watermark idea: a lower watermark lw and a higher watermark lw + k delimit the region of valid BFTblock serial numbers.
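A sketch of the PBFT-style watermark check, assuming serial numbers in the half-open window (lw, lw + k] are valid; the exact boundary handling in Leopard may differ:

```python
def valid_serial(sn: int, lw: int, k: int) -> bool:
    """Accept a BFTblock whose serial number falls inside the window
    of at most k in-flight agreement instances above the low watermark."""
    return lw < sn <= lw + k
```

As BFTblocks get confirmed, lw advances, sliding the window forward so new instances can start without waiting for all earlier ones.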
The paper then analyzes the leader's workload. In each agreement instance, the leader receives up to n−1 datablocks from the non-leader replicas.
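The workload equation is not reproduced in this review; a hedged sketch, assuming the leader's BFTblock carries one |h|-bit hash per datablock and is sent to each of the n−1 replicas, while the instance confirms (n−1)·|DB| bits of requests:

```latex
W_L \;\approx\; \frac{(n-1)^2\,|h|}{(n-1)\,|DB|} \;=\; \frac{(n-1)\,|h|}{|DB|}
```

which stays negligible as long as the datablock size |DB| is large relative to (n−1)|h|.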
A non-leader replica’s workload is estimated by the following:
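The review does not reproduce the paper's estimate; a plausible reconstruction, assuming each non-leader disseminates its |DB|-bit datablock to the other n−1 replicas while one instance confirms (n−1)·|DB| bits in total:

```latex
W_{NL} \;\approx\; \frac{(n-1)\,|DB|}{(n-1)\,|DB|} \;=\; 1
```

i.e., a constant workload per confirmed bit, independent of n.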
The scaling factor is then the heaviest workload derived from the leader's and the non-leaders' workload equations.
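The resulting equation is not shown in the review; consistent with the paper's claim, with W_L and W_NL denoting the leader's and a non-leader's workload per confirmed bit:

```latex
\mathrm{SF}_{\text{Leopard}} \;=\; \max\left(W_L,\; W_{NL}\right) \;=\; O(1)
```

so the heaviest per-bit workload over all replicas stays constant as n grows.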
Table I presents the batch sizes used in the experiments.
Figure 7 shows the throughput and latency of Leopard and HotStuff. HotStuff is evaluated up to 300 replicas, while Leopard scales to 600. Leopard achieves 5x the throughput of HotStuff at n = 300, and the gap widens as n increases. The latencies of both systems follow the same trend as n increases, but HotStuff's latency is higher than Leopard's because it spends a longer time collecting enough requests.
Bandwidth usage in Leopard and HotStuff
The bandwidth usage breakdown at the leader and at a non-leader replica in Leopard.
A final comparison covers the communication complexity of Leopard and other BFT protocols in the normal-case mode, where the leader is honest. Leopard preserves the protocol's overall linear communication while reducing the leader's cost; the communication complexity of each replica remains constant with respect to n. Hence, the protocol does not get stuck at a specific node as the scale increases.