Michael J. Swart

March 23, 2010

Transaction Phenomena – Part 1: The Dirty Read

SERIES: Concurrent Transaction Phenomena

In Books Online (BOL), Microsoft  describes different kinds of transaction isolation levels in terms of phenomena that can occur during concurrent transactions. Specifically, they mention three kinds of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantom Reads. You may have heard of these before, but correct me if I’m wrong, I just can’t find a good definition anywhere on BOL.

And that’s too bad, because these phenomena don’t just help describe the different transaction isolation levels. The phenomena actually define these levels.

These terms actually come from the ISO/ANSI standard for SQL-92. The terms themselves however do little to illuminate exactly the meaning of each. (What’s a phantom read?) When used by the ANSI/ISO standard, they mean something very specific and they’re key to understanding the different isolation levels.

In the next few days, I’d like to illustrate each phenomenon:

Part 1: The Dirty Read

A dirty read occurs when a transaction reads a record that has not yet been committed. In my own head, I think of this as reading tentative data.

From ISO/ANSI: One process (P1) modifies a row, and another process (P2) then reads that row before it is committed by P1. If P1 then rolls back the change, P2 will have read a row that was never committed and that may thus be considered to have never existed.

Mr. Bean shows the dangers (cue video to 1 minute 40 seconds):

By the way, I love that the BBC has put Mr. Bean clips on YouTube. In the clip, Mr. Bean cheats off his classmate who then proceeds to “rollback” his change by tossing his first answer attempt. If Mr. Bean had access to the test after it was handed in (i.e. committed), he would have made a more effective cheater.

Isolation Levels that Experience Dirty Reads

The only isolation level in which dirty reads are possible is READ UNCOMMITTED. You can also use a NOLOCK hint inside a SELECT query to achieve the same behavior.

My Own Experience

I’m not afraid of the occasional NOLOCK hint or the READ UNCOMMITTED level. It helps on tables that I know are static or tables that are growing (INSERTS and SELECTS but no DELETES or UPDATES). It can be pretty powerful when used correctly.
The NOLOCK  hint used to be my favorite deadlock killer. But lately, I’ve used it less often, especially when another solution is available.

Many others have written on this topic. Some important points:

  • Linchi Shea who points out that NOLOCK SELECT statements can still cause blocking when DDL statements are thrown in (such as the seemingly innocent DBCC REINDEX command statement).
  • Itzik Ben Gan who describes another NOLOCK danger. That is, data returned may not just be uncommitted, but inconsistent. i.e. missing or duplicated rows!

6 Comments »

  1. I have been faced with the last point repeatedly ( that nolock may result in data returned that may be inconsistent ) as people who struggle with the idea of persisting data relationally seem to be concerned by this. In my experience, i’m not afraid of this either, unless it’s a CRITICAL thing that the data be returned exactly as it would be in a committed state. In my view, the implementation ( or use of the results ) is flawed if this inconsistent state causes some grief.

    Most scenarios that I can fathom with regards to duplicate rows… could be simply a problem with the implementation. Potentially this scenario could be avoided with temporary “dirty bits”, that allow for data filtering or whatever. Further, whenever I’ve seen the inconsistent result sets cause issues, it’s due to developer error who EXPECTS that the result sets will have the same number of rows or whatever. ( ever heard of intersections? )

    I guess what i’m saying is that in my opinion it’s very rare cases that the “dirty read” cannot be incorporated.

    But my test for you is… is it possible that nolock is actually slower than if you were to retrieve the data in the regular committed state?

    Comment by Richard — March 25, 2010 @ 11:57 am

  2. Awesome comment Richard.
    In reply to your question, I don’t know off the top of my head whether nolock could be slower. The motivation behind nolock is almost always for increased throughput. i.e. less blocking and more concurrent work.

    I don’t know for sure whether NOLOCK could be slower. My guess before you asked the question is that it’s always faster. I would actually be surprised at a scenario where a NOLOCK is significantly slower. I’m all ears.

    Comment by Michael J. Swart — March 25, 2010 @ 12:15 pm

  3. The NOLOCK hint used to be my favorite deadlock killer. But lately, I’ve used it less often, especially when another solution is available.

    Please share the other solution as an alternative for No lock.

    Comment by neeraj tripathi — May 3, 2012 @ 3:34 pm

  4. That’s my thinking too neeraj.

    Comment by Michael J. Swart — May 3, 2012 @ 3:37 pm

  5. In addition to the obvious issue of returning inconsistent data in a resultset that NOLOCK (READ UNCOMMITTED) presents, other less commonly known issues are transient 601 and 605 errors that can occur when reading from data pages that have split and moved before the statement has completed.
    http://msdn.microsoft.com/en-us/library/bb326281(v=sql.105).aspx
    http://msdn.microsoft.com/en-us/library/aa337419(v=sql.105).aspx
    A data reading process under read uncommitted isolation can run for 1/2 hour or more, only to bomb out with an error, because another data writing process has moved a page out from under it. For critical ETL processes, it might be better just to allow occasional blocking than to deal with aborted steps.

    Comment by Eric Russell — May 4, 2012 @ 4:49 pm

  6. […] My own series on transaction isolation levels. […]

    Pingback by Visualizing Transaction Isolations For SQL Server | Michael J. Swart — June 13, 2012 @ 12:00 pm

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress