Transaction Isolation — How Isolated is Isolated?
Database Transaction Isolation Levels — A Concurrency Bug
Running concurrent transactions at different isolation levels allows application developers to trade throughput for correctness. This blog post shows anomalies to defining database isolation levels.
In my previous organisation we ran into a tricky issue where the two bookings for the same room and the same hotel and at the same time are being happened. Although this happened only the fewest times and in this each case we’re able to manually correct it. Clearly, there was something wrong with the code which wasn’t even reproducible on developer’s machine and occurred very rare in the production environment.
I had already verified the block of code was running within a database transaction which should isolate it from other transactions.
After buckling down with the codebase, I was able to figure out that this is a concurrency bug where two concurrent request causing to step on each other’s toes.
The data flow to book a room goes something like this:
But how can this be ? Transactions are the key features for any RDBS that provides ACID guarantees. Shouldn’t the Isolation guarantee prevent two transactions from interfering with each other ?
How isolated is isolated?
Basically there are 3 types of phenomenon in SQL standard:
Dirty reads: Imagine a transaction has written some data to the database, but the transaction has not yet committed or aborted. Can another transaction see that uncommitted data? If yes, that is called a dirty read.
Dirty Writes: Imagine if two transactions concurrently try to update the same object in the database. We don’t know in which order the writes will happen, but we normally assume that the later will overwrite the earlier write. What happens if the earlier write is part of a transaction that has not yet committed, so the later write overwrites an uncommitted value?
Non-repeatable reads: Imagine a read in which data read twice inside the same transaction cannot be guaranteed to contain the same value.
Phantom reads/Write skew: Imagine if you read a collection of rows twice, is it possible that different rows will be returned the second time?
In SQL there are 4 levels of transaction isolation based on which they prevent these phenomenon.
Read Uncommitted: This is the weakest level of isolation in which transaction’s can see Insert or Update queries even before they are committed.
Read Committed: A basic level of transaction isolation that guarantees to prevent from dirty reads and dirty writes.
Repeatable Read: The idea is that each transaction reads from a consistent snapshot of the database that is the transaction sees all the data that was committed in the database at the start of the transaction. Even if the data is subsequently changed by another transaction, each transaction sees only the old data from that particular point in time. This prevents from dirty reads/writes and non-repeatable reads but allows phantom reads.
Serializable: This is the strongest level of transaction isolation. It guarantees that even though transactions may execute in parallel, the end result is the same as if they had executed serially without any concurrency. This prevents from all the phenomenon discussed above.
The default level of isolation in MySQL, Postgres & Oracle is Read Committed which we were using in our application. So, the problem here is definitely a race condition, if the two transactions had run one after another, the duplicate bookings for the same room and at same time would have been prevented. This behaviour is only possible because the transactions ran concurrently. This oddity is called write-skew.
The Solution
There seems to be 2 solutions for these: either you use row locks to prevent concurrent access to the rows or you use stricter transaction isolation. Although in both the cases you take a performance hit. In our case, we used serializable isolation to prevent the phenomena we were seeing. We also didn’t want to use row locks since it’s easy to miss a case where a lock is necessary and need to be used properly to prevent deadlocks.