Summary of Chain Replication Protocol

Salem Alqahtani
5 min readApr 7, 2021

Chain Replication (CR) is a protocol that handles crash failure for replicated storage object on a large scale system for high throughput, high availability, and strong consistency. This storage object is called storage services.

Storage services provides a client view of the object that has two states called history (Hist_objID) and pending (pending_ObjID). Hist_objID stores all history of updates for objID while pending_ObjID stores a set of pending requests for ObjID. The clients can perform query and update operations. The server states should output consistent transactions called T1, T2, and T3 that response to client operations(See the figure below).

Client’s view of an object

Chain replication(CR) replicates the object over N servers where N-1 server can fail. Objects are organized sequentially forming a chain of servers. The first server of the chain called a head and the last server called a tail. See figure 2 in the paper. The server responses to client requests in three functions. Reply Generation: Tail server replies to the client; Query Processing: queries requests forward to the tail of the chain and the tail server processes the request and replies to the client; Update Processing: Update requests direct to the head of the chain. Then the requests forward on the chain until the request handled by the tail.

CR has a strong consistent model that holds because only one server tail processed the request. The Hist_objID values stored by the tail server while pending_ObjID values stored by all servers but not yet processed by the tail. Once it processed, the requests will eliminate from pending set.

Client send requests either to the head of the chain for update operations or for tail of the chain for query operations. Receivers add the client request to a pending set that not yet processed by the tail or history set. The execution causes the request to be removed from the set of pending requests and add it to the History request set.

CR has a master node that does not fail that resides outside the protocol. This master nodes informs all nodes about any change in the system, such as detect failures in head or tail or any server, inform servers about their predecessors and successors, and informs the client about who is the head and tail servers.

CR can tolerate the failures and have three scenarios.

Failure at the head: Removing head server from the chain and make the successor of the chain the new head. By doing so, the transaction should be consistent with the T1,T2,T3, or no-op. This affecting the pending set of requests.

Failure at the tail: removing tail node affected pending and history set of requests which is consistent with T3.

Failure of other servers: The master informs predecessor and successor nodes of the failure. New predecessors has to forward all history requests to successor node. Predecessor has to maintain a list of sent_i requests that forwarded to new node until these requests processed by tail. Then after receiving an ack from tail node, predecessors remove requests form Sent_i list.

To maintain availability, everytime chain got shorter, the master add new servers. For simplicity, the new node added to the tail and the old tail forward its history and pending requests to new tail.

Primary and backup Protocol(p/b). The primary communicates with clients and with backups. Upon receiving the reply from the backup, the primary informes the client. If primary fails, one of the backup takes over.

With CR, the role of primary is divided between head and tail servers. This reduce the primary overhead and as result, the latency in RC is lower than p/b. Both disseminate update on all servers and CR does it in serial order.

The detect failed server is the dominant cost of both protocols. for chain replication, there are three failures:

Head failure: Query processing is uninterrupted and update is unavailable for 2 messages delays, which is master broadcast message to all servers and inform them the new head and broadcast message to all clients.

Middle server failure: Query processing is uninterrupted and update processing can be delayed but update requests are not lost. The message delay is 4. See figure 3.

Tail failure: Query and update processing are both unavailable for 2 message delivery delays while the master sends a message to new tial and then notifies all clients of the new tial using a broadcast.

for primary/backup, there are two failures:

primary failure: There are five messages delay. 1) master detect the failure, and broadcasts a message to all backup, to stop processing request. 2) Each backup replies to the master. 3) Master broadcasts the identity of the new primary to all backups. 4) The new primary processes the large number of updates and forwards any missing update to backups. 5) Then master broadcast message to all clients of the new primary.

backup failure: Query processing continues uninterrupted provided no update request are in progress. If update requests is in progress then a transient outage of at most 1 message delay from master that no message is coming from faulty backup.

The failure of primary node in p/b is worst than CR tail failure. The best case of chain replication (when middle server failure) is shorter than the best case outage for backup failure.

Single chain, No Failure: with p/b and CR, we have weak p/b and CR, where weak means no strong consistency guarantee. Throughput for weak-chain and weak-p/b were found to bee virtually identical. CR outperforms P/B because the workload disseminated on head and tail rather than solely on the primary.

In the experiment evaluations, the two weak variants outperform CR when there are more than 15% update because the load disseminates on all servers. When the update rate increased, CR outperforms weak variants, because all update done at the head, and query done in tail.

Multiple chain, No Failure: servers will host many chains. However, the set of object grouped together into a single volume. Then use a hash function to map every object to a volume which gives a unique chain. The client requests received by dispatcher that computes the hash to determine the volume and forward the object to the corresponding chain. The throughput for all chains and p/b are the same as one chain. See figure 5.

--

--