Snowflake to Avalanche: A Novel Metastable Consensus Protocol Family for Cryptocurrencies
This paper introduced a new leaderless Byzantine fault tolerance protocol called Avalanche. Avalanche focuses on a scalable electronic payment system. It is different from all previous protocols in blockchain space and has no special node like the primary node in PBFT.
The paper claims that Avalanche is scalable and fast compared to all previous protocols. To put things in perspective, Avalanche is the third generation protocol after classical and Nakamoto protocols. Avalanche has four stages for consensus algorithms built on metastability mechanisms and network randomness sampling inspired by sampling based gossip protocol to allow efficient communication.
Before I explain the Avalanche protocols, let me point out the drawbacks of first and second generations in consensus space. Classical consensus requires nodes to know every other node in the network (required knowledge of membership) and to communicate with every other node (i.e., quadratic communication overhead). Nakamoto Consensus provides a probabilistic safety guarantee. Instead of every node agreeing on a value, nodes will all agree on the probability of the value is correct. Nakamoto also has a long waiting time for finality and consumes a large amount of energy due to minor operations.
Avalanche overcomes all earlier generations’ shortcomings. The Communication overhead is O(1) per round for O(log n) rounds instead of O(n) per round in the traditional consensus protocol. Avalanche is a green protocol compared to high energy consumption in Nakamoto (bitcoin). That being said, Avalanche is not perfect. Avalanche has a probabilistic safety guarantee similar to Nakamoto consensus. In this case, Avalanche is more difficult to get a probabilistic safety guarantee right because of tunable security parameters while traditional protocols are deterministic safety guarantees. Avalanche uses DAG data structure and establishes a partial order among dependent transactions. This needs extra engineering efforts compared to blockchain based systems.
Liveness property in Avalanche is synchronous and has a timeout mechanism which is similar to traditional consensus protocols. However, it has been developed on a real internet (which is partially synchronous at best). Liveness is only guaranteed for virtuous transactions and another stall in the network.
The Avalanche consensus approach started off with Slush. It is non-Byzantine protocol, memoryless, and quorum-based (sampling). Then, Snowflake which has short memory (counter and lock). Then, Snowball (state of confidence), and an Avalanche (vote not on transactions but on parent and descendent).
At the beginning of the protocol, the Slush protocol is the most basic part of the Avalanche protocols. Slush protocol starts out initially in an uncolored state. Upon receiving a transaction, an uncolored node updates its own color to the one carried in the transaction and initiates a query. To perform a query, a node picks a small, constant sized (k) sample of the network uniformly at random, and sends a query message. Upon receiving a query, an uncolored node adopts the color in the query, responds with that color, and initiates its own query; whereas a colored node simply responds with its current color. If k responses are not received within a time bound, the node picks an extra sample from the remaining nodes uniformly at random and queries them until it collects all responses. Once the querying node collects k responses, it checks if a fraction ≥ αk is for the same color, where α > 0.5 is a protocol’s parameter. If the α k threshold is met and the sampled color differs from the node’s own color, the node flips to that color. It then goes back to the query step, and initiates a subsequent round of query, for a total of m rounds. Finally, the node decides the color it ended up with at time m. Here you can notice that engineers need to take all parameters into consideration.
Based on my understanding of consensu protocols, too many tunable parameters affect the protocol simplicity. However, the Slush protocol has many excellent features: the status is simple: under this protocol, nodes can be memoryless. Nodes do not retain any state other than the current color between each round of queries, especially the historical records of interactions with other nodes; Small sample: The Slush protocol is significantly different from the traditional consensus protocol; Repeat sampling: Slush requires m rounds of repeated sampling. Low communication cost: each node has a constant and predictable communication cost for each round, and the cost increases logarithmically with n. Snowflake and Snowball protocols which I will explain later, are both guaranteed to terminate after O(k n log n) messages.
To enable BFT, Snowflake is the second protocol in Avalanche. It is a BFT and has a single counter (cnt) that captures the strength of a node’s conviction in its current color. This per-node counter stores how many consecutive samples of the network have all yielded the same color. Upon every color change, the node resets cnt to 0. Upon every successful query that yields alpha * k responses for the same color as the node, the node increments cnt.
Avalanche is the last protocol in the metastable protocol family, generalizes Snowball, and maintains a dynamic append-only Directed Acyclic Graph (DAG) of all known transactions. The DAG has a single sink that is the genesis vertex. Maintaining a DAG provides two significant benefits: high efficiency, voting for a certain node in the DAG means voting for all nodes on the path from the creation point to this node, which improves the efficiency of the protocol; security, because the DAG contains transaction information, which is similar to the chain structure characteristics of the blockchain. This makes it difficult to undo past decisions without the approval of honest nodes. In the cryptocurrency application, transactions that spend the same funds (double spends) conflict, and form a conflict set, out of which only a single one can be accepted. Note that the conflict set of a virtuous transaction is always a singleton. Shaded regions in Figure 7 indicate conflict sets.
Avalanche executes multiple snowball instances with the aid of a DAG. The DAG serves to piggyback multiple instances. When a transaction T is queried, all transactions reachable from T by following the DAG edges are implicitly part of the query. A node will only respond positively to the query if T and its entire ancestry are currently the preferred option in their respective conflict sets. If more than a threshold of responders vote positively, the transaction is said to collect a chit, cuT = 1, otherwise, cuT = 0. With DAG there will be no mining groups. DAG does not need to link large blocks one after another in a long chain. Rather, it is going to build a graph of transactions. Each transaction will refer to two older transactions. As a result, DAG is able to confirm transactions quickly after a transaction has been received and there is no need to wait for the next block to validate the transaction. Nodes can exist in parallel, as long as the information is directed in the same way.
The advantage of this is it allows transactions to be confirmed quicker (no wait to be pooled into a block of all transactions for that period as per Bitcoin or Ethereum) they are confirmed as they are processed by newer transactions. The parent-child relationships encoded in the DAG may, but do not need to, correspond to application-specific dependencies. We use the term ancestor set to refer to all transactions reachable via parent edges back in history, and progeny to refer to all children transactions and their offspring.
Each correct node u keeps track of all transactions it has learned and keeps it in a set Tu. For example, the node u discovers the transaction T through a query process by sampling K random peers. The query starts to add T to the set and send a message for selected peers. If the transaction is strongly preferable, the vote is 1 or 0 otherwise. The node u accumulates the results of k and alpha * k is a threshold pass.
In figure 4, the protocol main loop executed in every node. In each iteration, nodes select a transaction that has not yet been query. If there is no transaction, the loop will stall until a new transaction is added to the set. Next, the transaction is added to Q, so it will not queried again by the node.
In the figure 6, explains when a node receives a query for transaction T from peers j. First, it adds the transaction to the set unless it already has it. Then it determines if the transaction is currently strongly preferred. If so, the node returns a positive response to peer j or negative.