SERIES: Concurrent Transaction Phenomena
In Books Online (BOL), Microsoft describes different kinds of transaction isolation levels in terms of phenomena that can occur during concurrent transactions. Specifically, they mention three kinds of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantom Reads. You may have heard of these before, but correct me if I’m wrong, I just can’t find a good definition anywhere on BOL.
And that’s too bad, because these phenomena don’t just help describe the different transaction isolation levels. The phenomena actually define these levels.
These terms actually come from the ISO/ANSI standard for SQL-92. The terms themselves however do little to illuminate exactly the meaning of each. (What’s a phantom read?) When used by the ANSI/ISO standard, they mean something very specific and they’re key to understanding the different isolation levels.
In the next few days, I’d like to illustrate each phenomenon:
- Part 1: The Dirty Read (reading tentative data)
- Part 2: The Non-Repeatable Read (reading changed data)
- Part 3: The Phantom Read (reading new data)
- Part 4: Serializable vs. Snapshot
Part 3: The Phantom Read
Imagine your transaction retrieves rows filtered by some criteria. A phantom read occurs when your transaction retrieves rows using the same criteria and new rows are returned. In this case, either the new rows you see have been inserted or data has changed in existing rows such that these rows now meet the filter criteria. Either way these rows are new to you.
From ISO/ANSI: Process P1 reads the set of rows N that satisfy some search condition. Process P2 then executes statements that generate one or more rows that satisfy the search condition. If P1 repeats the query it obtains a different collection of rows.
Example
It’s cheesy, but I use the following example as a mnemonic. Imagine there’s a database of pop culture, and that transactions can last years: Then we have
use PopCultureDb SELECT Name FROM CarModels WHERE Manufacturer = 'Rolls Royce' -- over 1925 -- query would give new row: "The Phantom I" SELECT Name FROM TonyAwards WHERE AwardName = 'Best Musical' -- over 1988 -- query would give new row: "Phantom of the Opera" SELECT Name FROM Movies WHERE SagaName = 'Star Wars' -- over 1999 -- query would give new row "Star Wars: Episode I - The Phantom Menace" |
In each example if a transaction lasted over the years indicated, the new rows are examples of phantom reads.
Phantom Reads vs. Non-Repeatable Reads
So what’s the difference between phantom reads and non-repeatable reads? This can cause confusion sometimes. Because they both have the property that a repeated query returns results that are different. But the distinction between phantom reads and non-repeatable reads lies in how the query results are different.
- The Non-Repeatable Read is a phenomena specific to a read of a single row. When data has changed in this row, and the row is read again, the changed data is returned. This is a non-repeatable read.
So what about a change that affects multiple rows? This certainly seems like a phenomenon that applies to more than one row. But actually, I think of this as just a set of non-repeatable reads where each non-repeatable read still only affects one row. - The Phantom Read is a phenomenon that deals with queries that return sets. The thing that’s changing in a phantom read is not the data in the rows; it’s the set membership that has changed. *
Isolation Levels that Experience Phantom Reads
Just like non-repeatable reads, the levels READ COMMITTED and READ UNCOMMITTED can experience phantom reads. But REPEATABLE READ can experience phantom reads too. In fact the only isolation level that avoids phantom reads is SERIALIZABLE (and SNAPSHOT, but I’ll get to that in part 4).
*-For rows that have been deleted, if a transaction reads them (or rather fails to read them) it would seem that this is both a non-repeatable read and a phantom read. But for the purposes of the ISO/ANSI standard it is in fact considered a non-repeatable read.