Chain Replication (CR) is a protocol that handles crash failure for replicated storage object on a large scale system for high throughput, high availability, and strong consistency. This storage object is called storage services.

Storage services provides a client view of the object that has two states called history (Hist_objID) and pending (pending_ObjID). Hist_objID stores all history of updates for objID while pending_ObjID stores a set of pending requests for ObjID. The clients can perform query and update operations. The server states should output consistent transactions called T1, T2, and T3 that response to client operations(See the figure below).

Client’s view of an object

Chain…


In this post, I will summarize the SFT-BFT paper. This paper implemented on Diem project that uses at its core the HotStuff consensus algorithm. SFT-BFT is inspired by Nakamoto Consensus where any block obtains higher assurance as buried deeper in blockchain. Normally, BFT-SMR has a threshold resilient equals to F where N≥3F+1.

The SFT-BFT argues that this threshold might increase during optimistic period where the network is synchronous and has less faulty nodes. After both the block committed and optimistic period, the threshold of tolerating the faulty nodes increased up to 2/3. This means that once the blocks are committed…


In this post, I summarized the Fast-HotStuff paper. This paper published by Mohammad Jalalzai from The University of British Columbia, Jianyu Niu, and Chen Feng.

The Fast-Hotstuff (FHS) is a new consensus algorithm for blockchain. It has the same properties that were proposed in HotStuff: linear view change complexity and responsiveness. FHS, on the other hand, has better latency due to the elimination of one phase in HotStuff. The FHS is resilient against certain attacks such as forking attacks and provides fairness proof-of-stake support. …


A log is a general abstract data structure. It is an ordered list where data changes captured as events and stored in the log. Data changes stored in a sequence of records R with a specific size. A record is appended to the end of the log, and reads proceed left-to-right. Each record is assigned a unique sequential log number. The purpose of the log is to record what happened and when. The below Figure illustrates abstract view of the Log Structure. Read this for more details.

Log Structure: reference(1)

Ziplog is a total ordered shared log that achieves low latency and high…


Before WPaxos, most of us know what Paxos is and how it works. If you have not, please read this link. The single leader-paradigm in Paxos imposes limitations on the protocol scalability. In order to overcome Paxos limitations, WPaxos is a strong candidate to scale in WAN deployment despite many solutions have proposed so far such as leaderless protocol EPaxos.

WPaxos is a consensus protocol for geo-replication systems. WPaxos objectives are low latency, high throughput and availability in WAN setting comparing to EPaxos and others. WPaxos uses a modification version of flexible quorum to reduce the communication cost.

Each leader…


In this post, I will summarize the Mir-BFT paper. The paper presented a consensus protocol called Mir-bFT. The main goals of Mir-BFT are high performance throughput in WAN deployment, robust performance, preventing duplication attack by rotating hash assignment among leaders, remove bandwidth bottleneck that caused by single leader, and client signature verification sharing optimization to reduce computation bottlenecks.

Mir-BFT novelty is not parallel leader but the addressing of two attacks. …


In the state machine replication systems, people try to improve the system throughput by introducing more techniques. These techniques are not helpful all the time. Some techniques increase the system performance in some cases and some techniques hurt the system in other cases.

This paper introduced compartmentalization, that revolves around decoupling and scaling. The decoupling component means to divide the component into a number of subcomponents, then scale up the subcomponents.

There are three heuristics in Compartmentizated consensus protocol.

Heuristics-1: Decouple control flow from the data flow.

Heuristics-2: Decouple unscalable nodes from the scalable nodes.

Heuristics-3: Process independent commands independently.


Paxos is the basic consensus protocol that well-studied and defined in many papers. However, Paxos is not a scalable protocol. That means when the number of nodes in the cluster exceeds a certain number, the Paxos performance is diminished.

This performance bottleneck caused by incurring high communication on the leader node. To address this bottleneck, PigPaxos decouples the communication process from the consensus process in the leader node.

So, what exactly is PigPaxos?

PigPaxos is a round-based consensus protocol. The leader randomly selects nodes from acceptor groups to relay its message to the rest of the acceptors. This is done…


Mesos had launched from AMPlab at Berkeley in 2009 and it became apache project in 2013. Mesos consider popular among other cluster managers. Mesos built based on dynamic scheduling platform to manage cluster resources for many frameworks to drive up resources utilization. Mesos flexibility makes it easy to run many frameworks on top of it, frameworks from different versions, or completely different.

One of the Mesos strengths is data locality, which enables tasks to operate on the same machine that stored input data. Mesos in terms of data locality supports near optimal data locality. Mesos scheduler is distributed and works…


The state machine replication systems have multiple copies of the system states to avoid service unavailabilities due to copies failures. The system needs to keeps all copies synchronous to maintain consistency. This consistency provides by consensus algorithm that tolerate acceptable number of malicious nodes in the system. This post is about one of the consensus algorithm called PBFT, Practical Byzantine Fault Tolerance.

PBFT is a byzantine fault tolerance protocol that tolerate byzantine faults in asynchronous network. The byzantine behaviors such as software error and malicious attack caused serious damages for online services. PBFT assumes asynchronous network model for its safety…

Salem Alqahtani

I am interested in Distributed Systems!

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store