"Transaction Isolation Level" | Michael J. Swart Michael J. Swart

April 6, 2010

Transaction Phenomena – Part 4: Serializable vs. Snapshot

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server 2005", "sql server 2008", "sql server", "Transaction Isolation Level", SERIALIZABLE, SNAPSHOT — Michael J. Swart @ 4:09 pm

SERIES: Concurrent Transaction Phenomena

In Books Online (BOL), Microsoft describes different kinds of transaction isolation levels in terms of phenomena that can occur during concurrent transactions. Specifically, they mention three kinds of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantom Reads. You may have heard of these before but, correct me if I’m wrong, I just can’t find a good definition anywhere in BOL.

And that’s too bad, because these phenomena don’t just help describe the different transaction isolation levels. The phenomena actually define these levels.

These terms actually come from the ISO/ANSI standard for SQL-92. The terms themselves, however, do little to illuminate what each one means. (What’s a phantom read?) When used by the ANSI/ISO standard, they mean something very specific and they’re key to understanding the different isolation levels.

In the next few days, I’d like to illustrate each phenomenon:

Part 1: The Dirty Read (reading tentative data)
Part 2: The Non-Repeatable Read (reading changed data)
Part 3: The Phantom Read (reading new data)
Part 4: Serializable vs. Snapshot

Part 4: Serializable vs. Snapshot

So I’ve finished talking about the types of transaction phenomena defined in the ISO/ANSI standard. There are two isolation levels that SQL Server supports which never experience any of these (no dirty, non-repeatable or phantom reads). They are SERIALIZABLE and SNAPSHOT. They are both made available in order to avoid dirty, non-repeatable or phantom reads, but they do so using different methods. Understanding both is the key to being able to decide whether these are right for your application.

SERIALIZABLE

Serializable is the most isolated transaction level. Basically when a transaction reads or writes data from the database, that’s what it’s going to be until the end of the transaction:

From ISO/ANSI: [Execution of concurrent SERIALIZABLE transactions is guaranteed to be serializable, which is] defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.

So that’s it! SERIALIZABLE transactions see database data as if there were no other transactions running at the same time. So no dirty, phantom or non-repeatable reads (but maybe some blocking).

It’s interesting that the standard defines SERIALIZABLE as the default level. Microsoft doesn’t subscribe to that notion and makes READ COMMITTED the default level.
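If you want to confirm which level a session is actually running under, something like the following should do it. This is a minimal sketch that assumes SQL Server 2005 or later, where sys.dm_exec_sessions exposes a transaction_isolation_level column; on a fresh connection it should report READ COMMITTED.

```sql
-- Report the isolation level of the current session.
-- The numeric codes are the ones documented for sys.dm_exec_sessions.
SELECT CASE transaction_isolation_level
           WHEN 0 THEN 'UNSPECIFIED'
           WHEN 1 THEN 'READ UNCOMMITTED'
           WHEN 2 THEN 'READ COMMITTED'
           WHEN 3 THEN 'REPEATABLE READ'
           WHEN 4 THEN 'SERIALIZABLE'
           WHEN 5 THEN 'SNAPSHOT'
       END AS isolation_level
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID;
```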

The SERIALIZABLE level prevents phantom reads by using range locks, which I explain at the end of this article.

SNAPSHOT

SNAPSHOT transactions avoid phantom reads, dirty reads and non-repeatable reads, but they do it in quite a different way than SERIALIZABLE transactions do.

While SERIALIZABLE uses locks, SNAPSHOT instead reads from a copy (a row version) of the committed data as it stood when the transaction started. Since those reads take no locks, subsequent changes made by concurrent transactions are allowed and not blocked.
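Here is a rough two-session sketch of that behaviour. The database and table names are borrowed from the error message shown further down in this post; the id and val columns are my own assumption, purely for illustration.

```sql
-- One-time setup: SNAPSHOT has to be enabled per database before it can be used.
ALTER DATABASE snapshottest SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Session 1
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN
SELECT val FROM dbo.test WHERE id = 1;   -- sees the committed value

-- Session 2 (autocommit): not blocked by session 1's read
UPDATE dbo.test SET val = val + 1 WHERE id = 1;

-- Session 1 again: still returns the value it saw the first time
SELECT val FROM dbo.test WHERE id = 1;
COMMIT
```

Run the session 1 and session 2 parts in separate connections, in the order shown.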

So say you’re using SNAPSHOT transactions and you finally decide to make a change to some data. As far as you know, that data hasn’t changed from the first time you looked at it. But if that data has been changed elsewhere then you’ll get this error message:

Msg 3960, Level 16, State 4, Line 1
Snapshot isolation transaction aborted due to update conflict. You
cannot use snapshot isolation to access table 'dbo.test' directly or
indirectly in database 'snapshottest' to update, delete, or insert
the row that has been modified or deleted by another transaction.
Retry the transaction or change the isolation level for the
update/delete statement.

What this update conflict error message is trying to convey is exactly the same concept as Wikipedia’s Edit Conflict error message. Except that Wikipedia explains it better. I suggest looking there.
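The error message suggests retrying the transaction, and one way to do that is sketched below. It assumes snapshot isolation is already enabled for the database and reuses dbo.test with hypothetical id and val columns; the limit of three attempts is arbitrary.

```sql
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

DECLARE @attempts INT, @done BIT, @val INT, @msg NVARCHAR(2048);
SELECT @attempts = 0, @done = 0;

WHILE @done = 0 AND @attempts < 3
BEGIN
    SET @attempts = @attempts + 1;
    BEGIN TRY
        BEGIN TRAN;
        -- First look at the data: this fixes the snapshot we work from.
        SELECT @val = val FROM dbo.test WHERE id = 1;
        -- Fails with error 3960 if the row was changed elsewhere in the meantime.
        UPDATE dbo.test SET val = @val + 1 WHERE id = 1;
        COMMIT;
        SET @done = 1;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK;
        IF ERROR_NUMBER() <> 3960
        BEGIN
            -- Not an update conflict: stop looping and report the error.
            SET @msg = ERROR_MESSAGE();
            SET @done = 1;
            RAISERROR('%s', 16, 1, @msg);
        END
        -- On 3960 we simply loop and try again against a fresh snapshot.
    END CATCH
END
```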

ANSI-SQL’s SNAPSHOT Definition

There isn’t one. The SNAPSHOT isolation level I’m talking about is a Microsoft thing only. It’s useful, but definitely not part of the SQL standard.

It’s not too hard to see why. The SNAPSHOT isolation level permits the database server to serve data that is out of date. And that’s a big deal! It’s not just uncommitted. It’s old and incorrect (consistent, but incorrect).

Some people place a greater value on consistency than on timeliness and accuracy. I think it’s nice to have the choice.

Bonus Appendix: Range Locks.

(I was tempted to break out this appendix into its own blog post but ultimately decided not to.)

So SERIALIZABLE transactions take range locks in order to prevent Phantom Reads. It’s interesting to see what range of values is actually locked. The locked range is always bigger than the range specified in the query. I’ll show an example.

Say we have a table storing integers and insert 6 even numbers:

CREATE TABLE MyTest
(
	id INT PRIMARY KEY
);
INSERT MyTest VALUES (2), (4), (6), (8), (10), (12);

Now let’s read a range:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
SELECT id FROM MyTest WHERE id BETWEEN 3 AND 8;
-- do concurrent stuff here.
COMMIT

Books Online says: “The number of RangeS-S locks held is n+1, where n is the number of rows that satisfy the query.” We can verify this by looking at sys.dm_tran_locks. Here three rows (4, 6 and 8) satisfy the query, so four RangeS-S locks are held, on the keys 4, 6, 8 and 10. Each range lock covers the range of possible values from the given key value down to the nearest key value below it.
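To look at the locks yourself, query sys.dm_tran_locks from inside the open transaction. This is a minimal sketch; the column list is just the subset I find most useful, and the key locks should show RangeS-S in request_mode.

```sql
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
SELECT id FROM MyTest WHERE id BETWEEN 3 AND 8;

-- Key locks currently held by this session.
SELECT resource_type,
       resource_description,
       request_mode,
       request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
  AND resource_type = 'KEY';
COMMIT
```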

You can see that the “locked range” of [2..10] is actually larger than the query range [3..8]. Attempts to insert rows into this range will wait.
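You can watch the waiting happen from a second connection while the SERIALIZABLE transaction above is still open. This is only a sketch: the two-second lock timeout is my own addition so that the blocked insert gives up instead of waiting indefinitely, and the statements are best run one at a time.

```sql
-- Session 2, while session 1 still holds its range locks.
SET LOCK_TIMEOUT 2000;      -- give up after 2 seconds instead of waiting forever

INSERT MyTest VALUES (9);   -- inside the locked range: waits, then fails with error 1222

INSERT MyTest VALUES (11);  -- above the locked range: should succeed right away

SET LOCK_TIMEOUT -1;        -- back to the default (wait indefinitely)
```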

What happens if we select a range containing all rows?

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
SELECT id FROM MyTest WHERE id BETWEEN 1 AND 13;
-- do concurrent stuff here
COMMIT

You can see that everything is selected. That lock at “infinity” has a resource_description value of (ffffffffffff).

Last interesting bit. Ghost records can participate in these ranges!

DELETE MyTest;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
SELECT id FROM MyTest WHERE id BETWEEN 1 AND 10;
-- do concurrent stuff here
COMMIT

Concurrent transactions are able to insert values above 12, but will wait for values below that!


March 25, 2010

Transaction Phenomena – Part 2: The Non-Repeatable Read

Filed under: Data Cartoons,SQLServerPedia Syndication,Technical Articles — Tags: "sql server", "Transaction Isolation Level", non-repeatable read, Read committed — Michael J. Swart @ 10:31 am

Part 2: The Non-Repeatable Read

The non-repeatable read occurs when a transaction re-reads a single record and finds that it has been changed or deleted.

From ISO/ANSI: Process P1 reads a row. Process P2 then modifies or deletes that row and commits the change. If P1 rereads the row it receives the modified value or discovers the row has been deleted.

I like to think of Non-Repeatable Reads as being about reading data that has changed.

[Cartoon: Changing Data]

(I apologize in advance to my friends at work whose reaction I know will be “Boo”.)

But Non-Repeatable reads are not always a bad thing. In fact they’re often harmless or even required. When data has changed, you have to decide whether you want consistent data or whether you want current data. If you need to rely on consistent data, then a non-repeatable read is detrimental. Here is an example of a non-repeatable read that causes problems:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
 
	IF EXISTS(
		SELECT 1
		FROM Theatre.AvailableSeats
		WHERE seat = 'B23')
	BEGIN
 
		-- imagine a concurrent transaction
		-- sells seat B23 here and removes the record
		-- from table Theatre.AvailableSeats
 
		-- then no rows are returned here:
		SELECT price
		FROM Theatre.AvailableSeats
		WHERE seat = 'B23'
 
		-- and zero rows are affected here:
		DELETE FROM Theatre.AvailableSeats
		WHERE seat = 'B23'
 
	END
COMMIT

Isolation Levels that Experience Non-Repeatable Reads

Just like dirty reads, this phenomenon is possible when using the READ UNCOMMITTED isolation level, but it is also possible when using the READ COMMITTED isolation level. And in SQL Server READ COMMITTED is the default isolation level for transactions.

My experience
To avoid this phenomenon, you could use isolation level REPEATABLE READ. But often I find it’s possible to rewrite a transaction and stick with the default READ COMMITTED.
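As an illustration of that kind of rewrite, the theatre example can be collapsed into a single DELETE that reports what it removed, so the row is never read twice. This is one possible sketch, not the only approach; the MONEY type for price is an assumption.

```sql
-- Runs under the default READ COMMITTED level.
DECLARE @sold TABLE (price MONEY);

-- Claim the seat and capture its price in one atomic statement.
DELETE FROM Theatre.AvailableSeats
OUTPUT deleted.price INTO @sold
WHERE seat = 'B23';

IF EXISTS (SELECT 1 FROM @sold)
    SELECT price FROM @sold;                    -- we got the seat
ELSE
    PRINT 'Seat B23 is no longer available.';   -- someone else sold it first
```

Because the check, the read and the delete are one statement, there is no gap in which a concurrent transaction can sell the seat.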


March 23, 2010

Transaction Phenomena – Part 1: The Dirty Read

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "Dirty Reads", "NOLOCK" "READ UNCOMMITTED", "sql server", "Transaction Isolation Level" — Michael J. Swart @ 12:00 pm

Part 1: The Dirty Read

A dirty read occurs when a transaction reads a record that has not yet been committed. In my own head, I think of this as reading tentative data.

From ISO/ANSI: One process (P1) modifies a row, and another process (P2) then reads that row before it is committed by P1. If P1 then rolls back the change, P2 will have read a row that was never committed and that may thus be considered to have never existed.

Mr. Bean shows the dangers (cue video to 1 minute 40 seconds):

By the way, I love that the BBC has put Mr. Bean clips on YouTube. In the clip, Mr. Bean cheats off his classmate who then proceeds to “rollback” his change by tossing his first answer attempt. If Mr. Bean had access to the test after it was handed in (i.e. committed), he would have made a more effective cheater.

Isolation Levels that Experience Dirty Reads

The only isolation level in which dirty reads are possible is READ UNCOMMITTED. You can also use a NOLOCK hint inside a SELECT query to achieve the same behavior.
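A rough two-session sketch of both options follows. The dbo.ExamAnswers table and its columns are hypothetical, made up here in honour of Mr. Bean.

```sql
-- Session 1: change a row but do not commit yet.
BEGIN TRAN
UPDATE dbo.ExamAnswers SET answer = 'B' WHERE question = 1;

-- Session 2: both of these reads return the uncommitted 'B'.
SELECT answer FROM dbo.ExamAnswers WITH (NOLOCK) WHERE question = 1;

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT answer FROM dbo.ExamAnswers WHERE question = 1;

-- Session 1: change of heart.
ROLLBACK
-- Session 2 has now read a value that was never committed.
```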

My Own Experience

I’m not afraid of the occasional NOLOCK hint or the READ UNCOMMITTED level. It helps on tables that I know are static or tables that are growing (INSERTS and SELECTS but no DELETES or UPDATES). It can be pretty powerful when used correctly.
The NOLOCK hint used to be my favorite deadlock killer. But lately, I’ve used it less often, especially when another solution is available.

Many others have written on this topic. Some important points:

Linchi Shea points out that NOLOCK SELECT statements can still cause blocking when DDL statements are thrown in (such as the seemingly innocent DBCC DBREINDEX command).
Itzik Ben-Gan describes another NOLOCK danger: the data returned may not just be uncommitted, but inconsistent, i.e. missing or duplicated rows!
