Part 3: The Phantom Read
Imagine your transaction retrieves rows filtered by some criteria. A phantom read occurs when your transaction retrieves rows using the same criteria and new rows are returned. In this case, either the new rows you see have been inserted or data has changed in existing rows such that these rows now meet the filter criteria. Either way these rows are new to you.
From ISO/ANSI: Process P1 reads the set of rows N that satisfy some search condition. Process P2 then executes statements that generate one or more rows that satisfy the search condition. If P1 repeats the query it obtains a different collection of rows.
It’s cheesy, but I use the following example as a mnemonic. Imagine there’s a database of pop culture, and that transactions can last years: Then we have
use PopCultureDb SELECT Name FROM CarModels WHERE Manufacturer = 'Rolls Royce' -- over 1925 -- query would give new row: "The Phantom I" SELECT Name FROM TonyAwards WHERE AwardName = 'Best Musical' -- over 1988 -- query would give new row: "Phantom of the Opera" SELECT Name FROM Movies WHERE SagaName = 'Star Wars' -- over 1999 -- query would give new row "Star Wars: Episode I - The Phantom Menace"
In each example if a transaction lasted over the years indicated, the new rows are examples of phantom reads.
Phantom Reads vs. Non-Repeatable Reads
So what’s the difference between phantom reads and non-repeatable reads? This can cause confusion sometimes. Because they both have the property that a repeated query returns results that are different. But the distinction between phantom reads and non-repeatable reads lies in how the query results are different.
- The Non-Repeatable Read is a phenomena specific to a read of a single row. When data has changed in this row, and the row is read again, the changed data is returned. This is a non-repeatable read.
So what about a change that affects multiple rows? This certainly seems like a phenomenon that applies to more than one row. But actually, I think of this as just a set of non-repeatable reads where each non-repeatable read still only affects one row.
- The Phantom Read is a phenomenon that deals with queries that return sets. The thing that’s changing in a phantom read is not the data in the rows; it’s the set membership that has changed. *
Isolation Levels that Experience Phantom Reads
Just like non-repeatable reads, the levels READ COMMITTED and READ UNCOMMITTED can experience phantom reads. But REPEATABLE READ can experience phantom reads too. In fact the only isolation level that avoids phantom reads is SERIALIZABLE (and SNAPSHOT, but I’ll get to that in part 4).
*-For rows that have been deleted, if a transaction reads them (or rather fails to read them) it would seem that this is both a non-repeatable read and a phantom read. But for the purposes of the ISO/ANSI standard it is in fact considered a non-repeatable read.