Review: Polyjuice: High-Performance Transactions via Learned Concurrency Control
In this post, I will review and summarize the Polyjuice paper by Jiachen Wang and others, in OSDI’21.
This paper is about tuning the knobs of concurrency control in in-memory database systems by using machine learning techniques to improve system performance. Before taking a deep dive into the review, I would like to inform you that there are good video presentations of the paper [1,2]. Nevertheless, my summary is accessible by any computer science student and could be fun to read.
Machine learning (ML) is an emerging tool for solving computer systems and network knob problems. Polyjuice used Reinforcement Learning (RL) to find the best concurrency control algorithms for the given workload. I know by now you might think that ML in general is hype. You are right, but there are many knobs or hyperparameters that can differentiate the necessity of ML tools in the system work from the hypes. Polyjuice and similar kinds of works are good examples of using ML because the parameters are not trivial to guess/predict and adjust. Previous works tried to manually partition the workload and assigned a single concurrency control algorithm to each partition. This led to a lack of flexibility in customizing concurrency control to workload and systematic solutions to optimize concurrency control for better performance.
Table Of Contents
· The Problem
· The system settings of Polyjuice
· What should we learn?
· The Design of the Polyjuice Framework
· Their Evaluation
· My Questions
The Problem
The performance of a database system is greatly determined by its underlying concurrency control algorithm because it determines the performance degradation in presence of software contentions. Off course, there are many things that can hurt the performance like I/O bounds, network bounds, etc, but in distributed database systems, the concurrency control is the culprit of the performance degradation. Remember, a multi-core system is able to scale beyond what single-node DBMS can support. However, multi-node architecture introduces a new performance bottleneck because of distributed transactions (e.g. the coordination overhead of atomic commit protocols).
If you want to know more about what is 2PL, OCC, and MVCC, please check my previous blogs.
The system settings of Polyjuice
- Polyjuice is for an in-memory database system inside a single multicore machine but is not for a multi-node geo-distributed database system.
- The workload is not dynamic so the learning framework is trained before deployment. It is stored in a form of stored procedures. For example, if you have an SQL query that you write over and over again, save it as a stored procedure, and then just call it to execute it.
- MVCC has opted out of this work, which is the most famous in CC.
- Polyjuice’s workload support read-write transactions and snapshot-based read-only transactions. These are the type of transactions.
What should we learn?
The basic build block is RL: The key ingredients in RL are a policy which is a concurrency control (CC) algorithm, an action to be taken when a state is reached, a reward ( a throughput metric) that defines the optimization goal, and the environment (workload and system setting) under which the learning system operates.
Policy determines what actions to take in order to maximize performance for a given workload. A good CC policy balances how long transactions execute vs. how likely transactions are aborted, resulting in a high reward. Taking different actions in different states allows us to specialize a CC algorithm to optimize for a given workload.
Aside from the CC policy, how long a database backs off before retrying an aborted transaction can also affect the performance.
The policy actions can be classified into two categories:
- Actions that control interleaving transactions.
When executing transaction T, a CC algorithm has the following action choices:
1- Read Control: wait for dependencies and which version to read.
2- Write Control: wait for dependencies and whether or not to make this write visible to the future reads of other transactions. - Actions that control the validation of transactions
1- When to validate. A transaction may validate its accesses at any time during execution, instead of only at commit time. Early validation can abort a transaction quicker to reduce wasted work.
2- How to validate. An OCC-style validation that uses each transaction’s physical commit timestamp as its serialization order. Although such validation is conservative and has false aborts, it is fast.
A summary of CC Algorithms from the Polyjuice framework.
The Design of Polyjuice Framework
A very important design choice is to store the latest committed data and pre-object access list in the main memory. The access list consists of all uncommitted writes that have been made visible and read accesses. The purpose of the access list is to track the dependencies.
Below is Polyjuice’s system architecture.
Polyjuice represents each policy function as a table: there are as many rows in the policy table as there are different states; there are as many columns as there are action dimensions. Then, each cell contains either a binary number for a binary action ( e.g. make writes visible or not) or an integer for a multi-valued action (e.g. how to wait for dependent transactions).
To execute a transaction according to the policy, Polyjuice does the following:
- looks up the policy table at each data access to determine the corresponding set of actions.
- data access → Read/Write/Commit.
- Wait action is determined before data access. - Polyjuice commits a transaction after performing the final validation to ensure serializability.
Polyjuice’s state space contains the following information:
1. The type of transaction being executed. For a given workload whose transactions are specified in stored procedures, the type can be identified by the stored procedure name.
2. Which access of the transaction is being executed? We use an integer access-id to identify each access. Access-id is determined by the static code location that invokes the access. Using static information for access-id provides a good trade-off: it can discriminate most accesses while avoiding blowing up the state space.
Supported actions for interleaving control. There are three classes:
- Wait: This action is invoked before a read or write. How to specify how long the wait should be? First, group transactions by type and measure the execution progress of a transaction type by access-id. The special value NO_WAIT indicates no waiting. Suppose the wait action for transaction type X has access-id a, then transaction T must wait for all T’s dependent transactions of type X to finish execution up to and including a. For a workload with n different types of transactions, the wait action consists of n access ids, one for each transaction type.
- Read-version: This action has a binary choice:
1- CLEAN_READ for reading the latest committed version.
2- DIRTY_READ for reading the latest uncommitted (but visible) version. - Write-visibility: This action is invoked after the write access and is also binary:
1- PRIVATE keeps the write in the private buffer.
2- PUBLIC makes all private writes buffered so far visible by appending them to the access list.
Supported actions for validation:
- Validation happens before the commit and supports early validation after any reads/writes.
- Polyjuice supports the wait action before early validation. Upon failing early validation, Polyjuice retries the transaction from the point of its last successful validation. In order to reduce the cost of the failed validation, we defer appending reads and visible writes to their corresponding access lists until a successful early validation.
Polyjuice uses an OCC-style physical timestamp-based validation in the final commit phase to ensure correctness. To commit a transaction T with validation, a worker takes 4 steps:
- A worker waits for all T’s dependent transactions to commit (or abort).
- A worker locks each record in T’s write set.
- A worker validates each record in the read set by checking two conditions;
1. whether the version-id of the currently committed version in the database is different from that kept in the read set
2. Whether the record is being locked by another transaction. If either condition is true, T is aborted. - If validation succeeds, it applies T’s writes to the database along with their version-ids and releases the locks.
The polyjuice validation algorithm is identical to that of Silo except for two additional mechanisms which are crucial for correctness.
1. Polyjuice uses a unique version-id for committed as well as uncommitted versions, because the latter may be read from the access list.
2. Polyjuice adds the additional first step of waiting for T’s dependent transactions to finish committing.
For learning the backoff time, Polyjuice uses a separate backoff policy table. The rows (state space) of this table enumerate 3 dimensions:
1. The transaction type.
2. The status of the current execution (commit or abort).
3. The number of aborted attempts prior to the current execution with a fixed cutoff: 0, 1, or 2 to indicate whether there have been 0, 1, or 2+ aborts so far. The action space of the backoff policy table allows the worker adjusts the backoff time for each transaction type multiplicatively whenever it commits/aborts a transaction:
Now, let us look at how Polyjuice trains the model. Again, the goal is to find a good policy given the workload. Polyjuice uses offline training and uses Evolutionary algorithms for training.
Their Evaluation
Workloads are TPCC, TPCE, and a micro-tx with ten types.
Aborted transactions were eventually committed.
In the evaluation, there are five CC algorithms 2PL, OCC, IC3, Tebaldi, and CormCC.
The training has 300 iterations by default. After each iteration, the authors pick 8 policies from the current population and each policy generates 4 children. So the total per iteration is 40 policies.
There are only three read-write transactions and two read-only transactions that use the snapshot mechanism.
TPCC: Throughput and Latency:
The benefits of each action start with OCC.
A case study of three transactions with different actions.
Throughput under the different workloads.
My Questions
- When should uncommitted writes be made visible to other transactions?
- Is multi-node system network-bound or concurrency control bound? In a multi-core system, we blame the concurrency control bound while arguing that the disk-bound is not the problem.
Background Knowledge
There are three dependencies:
Read dependency specifies write-read conflicts; a transaction Tj depends on Ti if it reads Ti’s updates.
Anti-dependencies capture read-write conflicts; Tj anti-depends on Ti if it overwrites an object that Ti has read.
Write-dependencies capture write-write conflicts; Tj write-depends on
Ti if it overwrites an object that Ti has also modified.