Paper Review of Kauri: BFT Consensus Protocol
This is a summary of the Kauri paper.
Kauri is a BFT communication abstraction that aims to improve system’s scalability and load balancing. System scalability is an issue due to a single leader bottleneck caused by a star Topology in HotStuff or an all-to-all communication pattern in PBFT.
Kauri novelty is to combine tree Topology and pipeline techniques avoiding a single leader bottleneck, gaining high system throughput, and achieving scalable system performance.
Kauri is an extension design of HotStuff replacing start Topology with dissemination and aggregation patterns. HotStuff has four rounds of communications as depicted in figure below.
Instead of using star Topology for these primitives in HotStuff, this paper described how to be implemented in tree Topology. A root node which is a leader node sends data to its children and each child forwards its received data to its children until data reaches all leaf nodes. Leaf nodes send their signature on the data(Block) to their parents. Then, each parent aggregates the signatures from their children and sends them to their parents and so on until the root node receives all vote signatures.
In case of failure, the Kauri uses an Impatient channel to avoid blocking. Nodes should returns value, ⟂, and never old values.
Cryptographic collection: A process Pi can create a new collection c with a value vi by calling c=new(Pi,vi) and merging two c collections. It checks if a collection c includes at least a given threshold of t distinct tuples with the same value v by calling HAS(c,v,t). BLS multisignature scheme has O(m) for internal nodes where m is the number of fanouts and O(1) is for verification.
In Hotstuff, there are two primitives called broadcastMSG and waitFor(N-f). These primitives are the same in Kauri.
BroadcastMSG is based on an impatient channel. If there is no progress after delta time because the faulty in intermediate nodes, impatient channel will send a special node ⟂. Every node will receive data from it’s parent and broadcast it to all its children. See the algorithm below for the BroadcastMSG primitive.
WaitFor relies on cryptographic primitives to aggregate signatures as it propagates toward the root. WaitFor is also based on an impatient channel to return values or special nodes to the parent.
For pipelining, HotStuff starts the next consensus instance after the first round has been completed. It has done that by piggybacking the first instance first round with second instance first round. HotStuff’s pipelining depends on the number of instances which is 4. See the figure below for both protocols.
In Kauri, the root completes its dissemination phase much faster than in HotStuff and it may become idle long before it starts collecting votes. This is because the fanout m is smaller than N. This allows the root to start multiple instances of consensus during the execution of a single consensus round. In the example of Figure 4, the leader is able to start 3 new instances during the execution of the first round of a given consensus instance.
Kauri has reconfiguration to bound the steps to configure the tree on failure. All nodes N have to be distributed on what they called bins. The number of bins is F+1. Then, each bin will be chosen to represent internal nodes of the tree. If there is no progress, it will choose the second bin. If it does not progress after f+1, then it will fall back to Star Topology.
Figure 5 depicts the throughput achieved when using Kauri in a setting with 𝑁 = 100 in the global scenario (200𝑚𝑠 RTT and 25𝑀𝑏/𝑠 bandwidth) for different block sizes and increasing pipelining stretch values. The best results are achieved with a pipelining stretch close to 5. The figure also shows that, with smaller block sizes, higher pipelining stretch values are needed to make full use of the resources.
HotStuff is sensitive to the available bandwidth and to the total number of nodes in the systems. Also, without pipeline techniques Kauri-np, the system is limited by the RTT.
While the use of trees (Kauri-np) results in better performance than stars in certain scenarios, only the combination of trees and pipelining permits to achieve a substantial performance increase.
In a system with unlimited bandwidth and processing power, the latency of consensus is bound by the RTT. However, in practice, bandwidth is not infinite and the system latency is limited by the sending time. The dissemination/aggregation parallelism enabled by a tree substantially reduces the sending time. So, sending time has a much larger impact on latency than the additional number of communication hops required by a tree.
This was show with a fixed RTT of 100𝑚𝑠 and vary the bandwidth from 25𝑀𝑏/𝑠 to 1000𝑀𝑏/𝑠. Figure 8 shows the results for 𝑁 = 100. As shown, the available bandwidth has a much larger impact on HotStuff-secp than in Kauri. In fact, for bandwidths smaller than 100𝑀𝑏/𝑠 Kauri offers better latency than Hotstuff-secp, and only at high bandwidths HotStuff-secp starts to have substantially better latency.
In Figure 9 for the global scenario with 𝑁 = 100 and the following block sizes: 32𝐾𝑏 64𝐾𝑏, 125𝐾𝑏, 250𝐾𝑏, 500𝐾𝑏, and 1𝑀𝑏. As the block size increases, the latency of all systems increases as expected due to an increase in the sending time.