This paper proposed an implementation and evaluation of a dynamic timestamp allocation algorithm called MaaT. MaaT is a variant of the optimistic concurrency control algorithm. The main idea is to provide a range of timestamps for each transaction instead of a fixed timestamp in CockroachDB. The purpose of MaaT is to reduce a high abort rate under contention workloads. The experiment setup and evaluation show that MaaT was implemented on the CockroachDB and improved CockroachDB’s performance.
Before we get to the MaaT details and the paper contributions, we need to know CockroachDB. Previously, I wrote some simple summaries of CockroachDB. You can find them HERE[1,2,3]. From now on, I will use CRDB in this summary which stands for CockroachDB.
CRDB uses a multi-version timestamp ordering protocol to provide serializability. If you are unfamiliar with a multi-version timestamp, please look for my concurrency presentation HERE and other posts[1,2] for serializability. The below Figure illustrated the multi-version timestamps in RW and WW conflicts.
CRDB uses Serializable Snapshot Isolation(SSI) based technique to guarantee its serializability with both a lock-free multi-version timestamp ordering scheme and a fixed timestamp allocation scheme. Michael from Sydney university invented SSI semantics. I recommend that you read his paper.
The SSI detects and handles read and write conflicts. SSI is one of the strongest isolation levels that prevents all anomalies. However, it comes with a performance cost. The fixed timestamp leads to a high abort rate under high contentions.
In CRDB, the timestamp assigns for each transaction at the beginning of the transaction. CRDB uses the assigned timestamp as a commit time for the transaction. The figure below illustrated the issue with the fixed timestamp.
Unlike a fixed timestamp-based technique, the dynamic timestamp-based technique(MaaT) tries to avoid the aborts in transactions. MaaT technique targets cases where transactions are aborted due to being non-serializable because of a fixed timestamp ordering. Notice, MaaT cannot solve conflict transactions because of the order of accessing the data item(a conflict graph cycle).
CRDB architecture is a little bit hard and I will mention a high-level design in this post. CRDB is an SQL layer on top of distributed key-value store. CRDB is designed with many third-party libraries including RAFT, RocksDB, CLI, and Postgresql. If you wanna know more, I suggest checking THIS out.
Basically, we have a coordinator, that forwards the transaction to distributed sender. The transaction coordinator starts by creating Begin command and sends it to the distributed sender. The Begin will create a transaction record. The following operations in the same transaction will only use the same transaction record. Distributed sender communicates with CRDB nodes and forwards transactions from the transaction coordinator to the leaseholder.
Replicas are responsible for creating transaction records, updating the transaction cache, and performing read and write operations by calling the MVCC layer. RAFT is used to replicate transaction records, write intents, and resolve write intents. Every transaction starts with the initial status called pending. If a transaction is aborted due to data conflicts, the status is changed to aborted. Otherwise, the status is changed to committed.
What is Maat?
It stands for Multi-access as a Transaction. It redesigns the verification phase in OCC protocol to understand the conflict and see if it violates serializability. While performing read and write operations, each transaction maintains and updates valid timestamp ranges. The transaction can commit in the range based on the data items accessed and conflicting operations on those data items.
Example of two transactions with an illustration of MaaT dynamic allocation timestamps:
CockroachDB isolation level is SSI, transactions are aborted in the following cases:
- Conflicts are analyzed at every write, to check whether any later transactions have read or written the data item currently being written by the transaction. If this is the case, the transaction is aborted.
- When reading an item, if there is a write intent created with a lower timestamp, then the SSI approach in CockroachDB will abort one of the transactions. The transaction to abort will be decided based on the priority of the transactions involved.
Contention indicates skew on a subset of data. For example, the contention ratio 70:30 indicates that 70% of the data items will be accessed by 30% of the transaction, while the remaining 30% of the data items will be accessed by the remaining 70% of the transactions. The default contention rate is uniform which equals 50:50. Read-only transaction indicates that x% of the transactions have only read operations (5 per transaction) and the rest of the transitions will be read (2) and write(3) operations.
The below Figure illustrated if we have 50% read-only transactions. Notice, another 50% will have write transactions which have 30% out of 50% read operations.
The default number of concurrent transactions is 50. The benchmarks create 50 threads. If a transaction has 2 writes and 3 reads, it executes sequentially.
By varying the read transaction, the MaaT is still better than fixed timestamp ordering.
End of my Post.