Firebird for the Database Expert: Episode 5 - Locking and Record Versions
by Ann Harrison
Concurrency control is the mechanism that allows simultaneous users to read and write data as if each user had complete control of the database. This state of bliss is called "serializability". The state of the database after a group of concurrent transactions complete is the same as if each transaction ran alone in some unspecified order. Few, if any, database systems offer serializable transactions as their default mode.
Until recently, the most common concurrency control mechanism was locking. Of course, since transactions are imaginary electronic things, they don't actually put brass padlocks on the bits on a disk. Instead, the database system imposes a discipline on access to records, so each transaction's record use is recorded in memory and no transaction can conflict with another's noted level of use. Transactions acquire locks as they access records but never release any lock until they commit or rollback. The strategy of incrementally locking records and releasing all locks simultaneously at the end of the transaction is called two-phase locking.
Locking
Write locks prevent dirty writes
In a system that relies on locks for concurrency control, when a transaction modifies, inserts, updates, or deletes a record it gets a write lock on that record. Write locks are exclusive only one transaction can hold a write lock at any one time. That lock alone is sufficient to keep two transactions from changing the same record at the same time and satisfies the lowest generally recognized level of concurrency - no "dirty" writes. A dirty write could happen like this:
If the two updates run at the same time, the result without write locks could easily be that the employee gets either the promotion or the promotion, but not both.
Read locks
When a transaction in a locking database reads a record, it gets a read lock on that record. Read locks are compatible with other read locks, but not compatible with write locks. Read locks prevent dirty reads. A dirty read allows a transaction to see the results of an uncommitted concurrent transaction.
Consistent read
Transactions running alone in a database always see the same state of data, plus any changes they make themselves. That state is called "consistent read" if a transaction reads the same record twice, it sees the same data unless it changed the data itself. If a transaction running alone in a database reads all the records in a table once, it will see exactly the same number of records with the same contents the next time it reads the table, give or take changes it makes itself. Write and read locks alone do not produce consistent reads. Consider this case:
To insure that its reads are repeatable, Transaction A has to lock something more than the existing records, something more abstract. Those abstract locks are called predicate or existence locks, locks that keep something new from being added to a result set, either by inserting a new record or modifying an existing record so it meets the criteria for the result set.
Predicate locks can be implemented as locks on the access paths to records.
If the department is an indexed field, Transaction A would acquire a read lock on that part of the index that points to records for department Z. Then when Transaction B tried to create a new index entry for its record, it would find a conflicting lock and wait for A to complete and release its lock.
If the department field is not indexed, Transaction A acquires a read lock on the entire employee table including the ability to add new records to the table. No employee records can be inserted, updated, or deleted until Transaction A completes.
Serializability
Holding two-phase write, read, and predicate locks produces serializable transactions. However, it also produces large lock tables, contention, and deadlocks.
Lock table size
Even though locks are small temporary things, reading a few million records builds up a lot of locks. For that reason, most systems that use read locks employ strategies called Lock Promotion and Demotion.
Contention and Deadlocks
A major reporting transaction that hold two-phase read locks on records and access paths can easily block all writers from the database. In turn, those writers can hold locks that block reports, causing deadlocks. The end result is that performance is often worse when transactions run concurrently using two-phase serializable locking than would be if the transactions were actually run one at a time.
Multi-version Concurrency Control
Firebird uses record versions in place of write locks, read locks, predicate locks, and transaction logs. Using record versions for transaction recovery is described here.
Write locks - dirty writes
Every record version is tagged with the version of the transaction that created it. Every transaction knows what transactions are currently active. No transaction can update or delete a record whose most recent version is not committed. Dirty writes are impossible.
Read locks - dirty reads
Because records are tagged with their version and every transaction knows what transactions are currently active, no transaction can read a record version created by an active transaction. Dirty reads are impossible.
Repeatable read
Here the issue of transaction modes raises its ugly head. Firebird supports three orthogonal modes
- consistency/concurrency/read committed,
- wait/no wait,
- snapshot/no snapshot.
This paper describes the one true Firebird transaction: concurrency, wait, snapshot. Consistency transactions lock tables and are too boring to talk about. Read committed mode does not provide repeatable read because newly committed data becomes available to a running transaction. No wait transactions err as soon as then encounter any type of conflict. No snapshot transactions read only the most recently committed record, and are useful only with read committed mode.
A concurrency, wait, snapshot transaction always provides repeatable read. When the transaction starts, it creates a list of all transactions that were committed when it started, and when it encounters a record, it walks backward through the version until it finds a version whose transaction marker is on the committed list. Changes made by concurrent transactions are ignored.
Serializability
Unlike locking systems, a multi-generational concurrency control system can provide repeatable reads without being completely serializable. Here are two anomalies that affect Firebird.
Exchanges
An exchange occurs when two transactions use data from different records and apply change in inverse order. An example might help.
The problem is to be sure that all employees in the same job class have the same salary, regardless of gender. One solution is to read the records for men in each class and update the records for women with the salary from the men's records. Another, cost saving solution is to read the records for women and update the men's records to the salary from the women's records.
The result is that the salary gap is inverted, but still exists. That result could not occur if the two transactions ran separately. The transactions do not conflict because each record is modified only once. Changes made by the other transaction are not visible because when either transaction attempts to read a record that has been modified, it automatically reads the previous committed version.
The solution is to be aware of the possibility of this error and choose a specific order when copying data from one record to another.
Insert anomalies
Insert anomalies are another problem than can occur during concurrent data modifications. Consider this case:
Create table foo (f1 integer); Commit; Transaction A: insert into foo (f1) select count (*) from foo; Transaction B: insert into foo (f1) select count (*) from foo; Transaction A: insert into foo (f1) select count (*) from foo; Transaction B: insert into foo (f1) select count (*) from foo; Transaction A: insert into foo (f1) select count (*) from foo; Transaction B: insert into foo (f1) select count (*) from foo; Transaction A: commit; Transaction B: commit; Transaction A1: select f1 from foo order by f1; 0 0 1 1 2 2
Each transaction saw only its own changes, so each count ignored records stored by the other transaction. If the transactions were run serially, the results would have been:
0 1 2 3 4 5
The solution is to put a unique index on any data that might be stored containing the count (or max) of values in the table. Unique indexes are correctly enforced even when the transactions involved can not see each other's records.