Paper Review of BIDL: A Blockchain Framework for Datacenter Networks
This is a paper review of BIDL.
Permissioned blockchains suffer from performance issues. For instance, Diem (based on HotStuff) achieves about 2k transactions per second with 200 ms latency, and Hyperledger Fabric achieves about 9.3k transactions per second with 100 ms latency. BIDL argues that the sequential workflow of permissioned blockchains is the root cause of their low performance in datacenter environments.
Instead of a sequential workflow, BIDL proposes a shepherded parallel workflow that parallelizes speculative execution and consensus. To prevent malicious behavior from disrupting this parallelism, BIDL creates a denylist based on its view-change protocol. The results show a latency reduction of up to 72% and a 4.3x improvement in throughput.
The paper categorizes blockchain workflows into two types. The first executes, reaches consensus, then validates; it has a high abort rate of around 40% in real-world applications. The second reaches consensus, then executes; it has a zero abort rate. The figure below shows the difference between the two workflows.
BIDL runs a dedicated node as a sequencer for all client transactions (T). The sequencer's function is performed in the routing layer. Specifically, BIDL adds a sequence number to every transaction and routes all transactions to all nodes via routing-aware multicast. If the sequencer is not faulty, all nodes receive almost all transactions in the same order and can commit the transactions on their own.
Here are the evaluation results from BIDL compared with Hyperledger Fabric (HLF) and FastFabric (FF).
The BIDL protocol has five phases to commit blocks. The figure below illustrates all five phases.
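The sequencing idea can be pictured with a minimal sketch. The names `Sequencer`, `Node`, `submit`, and `deliver` are hypothetical and exist only for illustration; in BIDL itself, sequencing happens in the routing layer and delivery uses network multicast, not Python method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node that buffers sequenced transactions and releases them in order."""
    next_seq: int = 0
    pending: dict = field(default_factory=dict)   # seq -> transaction
    ordered: list = field(default_factory=list)   # transactions in sequence order

    def deliver(self, seq, tx):
        # Multicast may reorder packets, so buffer until the gap closes.
        self.pending[seq] = tx
        while self.next_seq in self.pending:
            self.ordered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


class Sequencer:
    """Stamps each transaction with a consecutive sequence number (hypothetical)."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.counter = 0

    def submit(self, tx):
        seq = self.counter
        self.counter += 1
        for node in self.nodes:      # stands in for network multicast
            node.deliver(seq, tx)
        return seq


nodes = [Node(), Node(), Node()]
sequencer = Sequencer(nodes)
for tx in ["T1", "T2", "T3"]:
    sequencer.submit(tx)

# Simulate out-of-order arrival at a fresh node.
late = Node()
late.deliver(1, "T2")
late.deliver(0, "T1")
```

Because every node applies transactions strictly by sequence number, nodes that receive the same sequenced stream end up with the same order even when packets arrive out of order.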
Phase 1 (Submit): clients submit signed transactions to the leader of consensus nodes via a TLS-enabled link. The leader drops the connection if the client sends malformed transactions (e.g., with invalid signatures) to avoid the client mounting DoS attacks on the leader.
Phase 2 (Multicast): the leader assigns consecutive sequence numbers to the received transactions and multicasts them to all consensus and normal nodes. The leader does not sign the multicast messages for two reasons. First, signing each sequence number and verifying it on all nodes is computationally expensive. Second, using signatures opens the door for the adversary A to mount resource exhaustion attacks on BIDL by broadcasting malformed sequenced transactions with invalid signatures.
Phase 3 (Consensus): the consensus nodes run an instance of a BFT protocol to agree on a sequence of hashes for the transactions multicast by the leader.
Phase 4 (Speculative Execution): runs in parallel with Phase 3. BIDL has two types of nodes, normal nodes and consensus nodes. In Phase 4, the normal nodes speculatively execute client transactions in the order of the sequence numbers assigned by the leader. To ensure state consistency in the presence of non-deterministic transactions, the paper adopts a multi-write protocol (Dynamo) so that nodes agree on identical execution results, and aborts a transaction if nodes produce inconsistent results for it. In the presence of the adversary A, BIDL’s Phase 5 falls back to the sequential workflow by letting normal nodes follow the agreed transactions and re-execute them. Moreover, BIDL’s denylist protocol is designed to detect such an adversary and retain the high performance of the parallel workflow.
Phase 4–2: approve and persist execution results. All normal nodes execute the transaction T. Because normal nodes within an organization trust each other, each organization selects a delegate node, which signs a single result for the organization. The delegate node collects the signed results from T’s related organizations to produce a vector of results r. BIDL then runs the persist protocol to ensure that r is identical across nodes and retrievable. Each delegate node sends r to all consensus nodes. Each consensus node broadcasts a Persist message to all normal nodes if r matches the transaction hashes proposed by the leader. Upon receiving Persist messages for r from 2f + 1 consensus nodes, a normal node regards r as successfully persisted. It then commits r to its local state only if all results in r are consistent; otherwise, it aborts the transaction T and r.
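The quorum and consistency checks at a normal node can be sketched as follows. This is a simplified illustration under stated assumptions: the class `NormalNode`, the method `on_persist`, and the string-valued results are all hypothetical, signatures are omitted, and the sketch covers only the persist step, not the Phase 3 agreement that Phase 5 also requires.

```python
from collections import defaultdict


class NormalNode:
    """Tracks Persist messages for result vectors (hypothetical sketch)."""

    def __init__(self, f):
        self.f = f                        # maximum number of faulty consensus nodes
        self.votes = defaultdict(set)     # (tx_id, r) -> ids of consensus nodes seen
        self.state = {}                   # tx_id -> committed result
        self.aborted = set()

    def on_persist(self, tx_id, r, consensus_node_id):
        """Process one Persist message; return 'pending', 'committed', or 'aborted'."""
        key = (tx_id, tuple(r))
        self.votes[key].add(consensus_node_id)   # a set ignores duplicate senders
        if len(self.votes[key]) < 2 * self.f + 1:
            return "pending"
        if len(set(r)) == 1:                     # every organization produced the same result
            self.state[tx_id] = r[0]
            return "committed"
        self.aborted.add(tx_id)                  # inconsistent results: abort T and r
        return "aborted"


node = NormalNode(f=1)
r_ok = ["balance=10", "balance=10"]
statuses = [node.on_persist("T1", r_ok, sender) for sender in ["C0", "C0", "C1", "C2"]]

r_bad = ["balance=10", "balance=12"]
bad_statuses = [node.on_persist("T2", r_bad, sender) for sender in ["C0", "C1", "C2"]]
```

With f = 1, the node waits for Persist messages from three distinct consensus nodes; the repeated message from `C0` does not count twice, and the inconsistent vector for `T2` is aborted once it is persisted.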
Phase 5 (Commit): BIDL’s normal nodes commit a transaction only after receiving matching agreed transactions in Phase 3 and persisted execution results in Phase 4, ensuring safety and reasonable liveness. In BIDL, transaction aborts are caused only by non-determinism. Meanwhile, BIDL parallelizes the consensus and execution phases to greatly reduce latency.
A malicious leader may send inconsistent transactions to nodes, drop specific clients’ transactions, or create gaps in sequence numbers. BIDL uses view changes to address malicious leaders. BIDL’s view change differs from PBFT’s: in PBFT, leadership rotates among consensus nodes in a round-robin way, whereas in BIDL the leadership rotation is unpredictable. The view-change messages also piggyback the fields used to maintain BIDL’s denylists.
A consensus node learns that a transaction has to be re-executed by detecting a mismatched r, which implies that the transaction speculated on a normal node differs from the one agreed on by the consensus nodes. In addition, a malicious leader can try to break BIDL’s parallel workflow by consistently delaying transactions to normal nodes until consensus is reached. BIDL handles this with its persist protocol.
With the persist protocol, consensus nodes detect this delay attack from the increasing delay in receiving persist requests. BIDL eliminates sequence number signatures on ordered transactions and uses IP multicast. This opens the door for the adversary A to impersonate the leader. To overcome this problem, BIDL proposes a denylist protocol that detects malicious clients who sign and transmit crafted T. This works for two reasons. First, A cannot create a large number of clients. Second, A can only use T signed by a malicious client to create conflicts.
The reasoning is as follows: if transactions signed by a client cm conflict with other transactions across f+1 views with different leaders, then at least one of those leaders is correct, so cm is malicious with high probability. BIDL lets a correct leader invoke a view change on a conflict and rotates leaders in a random order. BIDL divides views into epochs of 3f+1 views each, and each consensus node is the leader of one view in an epoch.
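One way to picture an epoch in which every consensus node leads exactly one view, in a non-round-robin order, is a per-epoch shuffle. This is only a sketch: the function `epoch_leader_schedule` and the `epoch_seed` parameter are hypothetical, and the seeded `random.Random` is a stand-in for whatever unpredictable, agreed-upon randomness source a real deployment would need.

```python
import random


def epoch_leader_schedule(consensus_nodes, epoch_seed):
    """Return a leader order for one epoch: a shuffled copy of the node list.

    Assumption: all nodes derive the same epoch_seed, so they compute the
    same schedule. How such a seed is agreed on is outside this sketch.
    """
    schedule = list(consensus_nodes)
    random.Random(epoch_seed).shuffle(schedule)
    return schedule


f = 1
nodes = [f"N{i}" for i in range(3 * f + 1)]        # 3f+1 consensus nodes
schedule = epoch_leader_schedule(nodes, epoch_seed=7)
same_again = epoch_leader_schedule(nodes, epoch_seed=7)
```

The schedule has 3f+1 entries, one view per consensus node, so any f+1 consecutive views in an epoch have f+1 different leaders.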
The denylist protocol consists of three steps: 1) When a consensus node Ni detects two conflicting T, it adds the signing client cm to its suspension list S. 2) If Ni suspects cm in f+1 views with different leaders, it considers cm malicious. 3) During view changes, cm is carried in the view-change messages; once the f+1 threshold is reached, all consensus nodes add cm to the denylist D.
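The f+1 threshold can be sketched from the point of view of a single consensus node. The class `DenylistTracker` and its method `report_conflict` are hypothetical names, and the exchange of view-change messages between nodes is deliberately left out; the sketch only counts the distinct leaders under which a client produced conflicts.

```python
from collections import defaultdict


class DenylistTracker:
    """Per-node bookkeeping for suspected clients (hypothetical sketch)."""

    def __init__(self, f):
        self.f = f
        self.suspected_under = defaultdict(set)  # client -> leaders of views with conflicts
        self.denylist = set()

    def report_conflict(self, client, leader):
        """Record a conflict seen in a view led by `leader`; return True if denylisted."""
        self.suspected_under[client].add(leader)
        # f+1 different leaders include at least one correct leader.
        if len(self.suspected_under[client]) >= self.f + 1:
            self.denylist.add(client)
        return client in self.denylist


tracker = DenylistTracker(f=1)
first = tracker.report_conflict("cm", leader="N0")
repeat = tracker.report_conflict("cm", leader="N0")   # same leader, no new evidence
second = tracker.report_conflict("cm", leader="N2")   # second distinct leader
```

Counting distinct leaders, not distinct conflicts, is the point of the design: a single malicious leader cannot frame a correct client by repeatedly reporting conflicts in its own views.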
In BIDL’s evaluation, the experiments were run on 20 servers with 40 Gbps NICs. The RTT between servers is 0.2 ms. Assigning a sequence number adds 20 µs for a 1K transaction. BIDL is compared with HLF, Streamchain, and FastFabric. In consensus, BIDL uses the 32-byte hash instead of the 1K-byte payload. BIDL’s consensus is integrated with BFT-SMaRt, Zyzzyva, HotStuff, and SBFT, and the block size is 500 transactions.
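A quick calculation shows how much agreeing on hashes shrinks the data that consensus has to carry per block. The sketch assumes that "1K bytes" means 1024 bytes and uses SHA-256 as an example of a 32-byte hash; the hash function is not named above, so that choice is an assumption.

```python
import hashlib

BLOCK_SIZE = 500        # transactions per block, as in the evaluation
PAYLOAD_BYTES = 1024    # assumption: "1K bytes" means 1024 bytes

# Dummy payloads; only their sizes matter here.
payloads = [bytes([i % 256]) * PAYLOAD_BYTES for i in range(BLOCK_SIZE)]
hashes = [hashlib.sha256(p).digest() for p in payloads]   # 32 bytes each

payload_total = sum(len(p) for p in payloads)   # bytes if consensus carried payloads
hash_total = sum(len(h) for h in hashes)        # bytes when consensus carries hashes
reduction = payload_total // hash_total

print(payload_total, hash_total, reduction)     # prints: 512000 16000 32
```

Under these assumptions, a 500-transaction block needs 16,000 bytes of hashes in consensus instead of 512,000 bytes of payloads, a 32x reduction.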
BIDL’s setting A consists of 4 consensus nodes with f = 1 and 50 normal nodes. BIDL’s setting B consists of 97 organizations, with one consensus node and one normal node per organization. BIDL achieves better throughput than FastFabric. BIDL’s lower latency is primarily due to its parallelization of the execution and consensus phases.
To evaluate BIDL’s scalability, the paper varies the number of organizations, as shown in Figure 6, where each organization has one consensus node and one normal node. As the number of organizations increased, BIDL’s latency with the four BFT protocols initially decreased quickly; overall, latency first decreased and then increased. This is because, when the number of organizations was small, the workflow’s performance was dominated by the latency of transaction execution: with fewer organizations, the normal nodes of each organization need to verify and execute more transactions. As the number of organizations increased towards 30, the number of transactions processed by each organization decreased, leading to lower latency. As the number of organizations continued to increase, consensus became the major performance bottleneck of BIDL’s workflow.