Introduction to CAP Theorem
Recently, I came across Mike Kowalski’s talk at the Hydra conference 2021 about the CAP theorem. I liked the talk because it was easy to understand given that I know what the CAP theorem is all about. I also read a Critique of the CAP theorem paper by Martin Kleppmann which is a nice paper. In this post, I decided to write a basic note about the CAP theorem.
The CAP theorem was published by Eric Brewer in 2000. The CAP theorem is one of the impossibility results in distributed systems and stands for Consistency, Availability, and Partition tolerance. These three are the main aspects of the modern distributed data system. The three aspects as shown in the Figure below.
The CAP theorem states that a distributed system cannot simultaneously be consistent, available, and partition tolerant. The CAP theorem has no clear definition, but many authors prefer the following definition: if there is no network partition, a system can be both consistent and available when a network partition occurs, a system must choose between either consistency (CP) or availability (AP). The CP results in bad performance most of the time while AP results in a reading from the past (stale values).
If the system adopts CA, the system does not guarantee partition tolerance. You can think of CA as a database on a single machine, without any notion of distributed systems. Example of CP is Spanner and CockroachDB while the example of AP is NoSQL.
Consistency is not one particular property, but a spectrum of models with varying strengths of guarantee. C is defined as linearizability which is explained in this post. Consistency is a safety property where a read operation that begins after a write operation completes must return that value. Hence, replication techniques are necessary to implement consistency such as RAFT consensus protocol. A strong consistency model like linearizability provides an easy understanding because all operations behave as if they are executed atomically on a single copy of the data. However, this guarantee comes at the cost of reduced performance and fault tolerance. I suggest watching Martin talk.
Availability means that every request received by a non-failing node must result in a (non-error) response. The availability definition does not require an upper bound on operation latency. It only requires requests to eventually return a response within some unbounded, but finite time. The availability is a pure liveness property, not a safety property. In an available system, if our client sends a request to a server and the server has not crashed, then the server must eventually respond to the client. The server is not allowed to ignore the client’s requests.
Network Partition means that there is an issue with the communication. The network partition inherits part of the network. A network partition is a communication fault that splits the network into subsets of nodes such that nodes in one subset cannot communicate with nodes in another set.
The confusing CAP and ACID wording
CAP and ACID share a common vocabulary: Atomic, Consistent, and so on. But there is a catch: the words are the same but…