March, 2010 | Michael J. Swart

March 29, 2010

Transaction Phenomena – Part 3: The Phantom Read

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 11:57 am

SERIES: Concurrent Transaction Phenomena

In Books Online (BOL), Microsoft describes different kinds of transaction isolation levels in terms of phenomena that can occur during concurrent transactions. Specifically, they mention three kinds of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantom Reads. You may have heard of these before, but correct me if I’m wrong, I just can’t find a good definition anywhere on BOL.

And that’s too bad, because these phenomena don’t just help describe the different transaction isolation levels. The phenomena actually define these levels.

These terms come from the ISO/ANSI standard for SQL-92. The names themselves, however, do little to explain what each one means. (What’s a phantom read?) When used by the ANSI/ISO standard, they mean something very specific and they’re key to understanding the different isolation levels.

In the next few days, I’d like to illustrate each phenomenon:

Part 1: The Dirty Read (reading tentative data)
Part 2: The Non-Repeatable Read (reading changed data)
Part 3: The Phantom Read (reading new data)
Part 4: Serializable vs. Snapshot

Part 3: The Phantom Read

Imagine your transaction retrieves rows filtered by some criteria. A phantom read occurs when your transaction retrieves rows again using the same criteria and new rows are returned. In this case, either the new rows have been inserted or data in existing rows has changed so that those rows now meet the filter criteria. Either way these rows are new to you.

From ISO/ANSI: Process P1 reads the set of rows N that satisfy some search condition. Process P2 then executes statements that generate one or more rows that satisfy the search condition. If P1 repeats the query it obtains a different collection of rows.

Example

It’s cheesy, but I use the following example as a mnemonic. Imagine there’s a database of pop culture, and that transactions can last years. Then we have:

use PopCultureDb
 
SELECT Name
FROM CarModels
WHERE Manufacturer = 'Rolls Royce'
-- over 1925
-- query would give new row: "The Phantom I"
 
SELECT Name
FROM TonyAwards
WHERE AwardName = 'Best Musical'
-- over 1988
-- query would give new row: "Phantom of the Opera"
 
SELECT Name
FROM Movies
WHERE SagaName = 'Star Wars'
-- over 1999
-- query would give new row "Star Wars: Episode I - The Phantom Menace"

In each example if a transaction lasted over the years indicated, the new rows are examples of phantom reads.

Phantom Reads vs. Non-Repeatable Reads

So what’s the difference between phantom reads and non-repeatable reads? The two are easy to confuse because in both cases a repeated query returns different results. The distinction lies in how the query results are different.

The Non-Repeatable Read is a phenomenon specific to a read of a single row. When data has changed in this row, and the row is read again, the changed data is returned. This is a non-repeatable read.
So what about a change that affects multiple rows? This certainly seems like a phenomenon that applies to more than one row. But actually, I think of this as just a set of non-repeatable reads where each non-repeatable read still only affects one row.
The Phantom Read is a phenomenon that deals with queries that return sets. The thing that’s changing in a phantom read is not the data in the rows; it’s the set membership that has changed. *

Isolation Levels that Experience Phantom Reads

Just like non-repeatable reads, the levels READ COMMITTED and READ UNCOMMITTED can experience phantom reads. But REPEATABLE READ can experience phantom reads too. In fact the only isolation level that avoids phantom reads is SERIALIZABLE (and SNAPSHOT, but I’ll get to that in part 4).
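To make this concrete, here is a minimal two-session sketch. It assumes the hypothetical Movies table from the example above has only the Name and SagaName columns, and that the commented-out INSERT is run from a second connection while the first transaction is still open.

```sql
-- Session 1: REPEATABLE READ keeps locks on the rows it has already read,
-- but it does not lock the range that the filter covers.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;

    SELECT Name
    FROM Movies
    WHERE SagaName = 'Star Wars';

    -- Session 2 (a separate connection) runs this here and is not blocked:
    --   INSERT INTO Movies (Name, SagaName)
    --   VALUES ('Star Wars: Episode I - The Phantom Menace', 'Star Wars');

    -- Same criteria, but the newly inserted row is returned too:
    -- a phantom read.
    SELECT Name
    FROM Movies
    WHERE SagaName = 'Star Wars';

COMMIT;
```

If the first line is changed to SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, the INSERT in session 2 should wait until session 1 commits, so both SELECT statements return the same set of rows.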

* For rows that have been deleted, if a transaction reads them (or rather fails to read them) it would seem that this is both a non-repeatable read and a phantom read. But for the purposes of the ISO/ANSI standard it is in fact considered a non-repeatable read.

-- Comments (8)

March 26, 2010

Listing Tables Ordered By Dependency

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 4:47 pm

Short post for the weekend.

There are a number of use cases that may require you to see a list of database tables ordered by dependency.

For example, you may want to know in which order to create tables if you’re dealing with a scripting project. (On the flip side, if you’re looking to delete tables, reverse the order).

I came up with a pretty script that lists the tables you care about in order of how they can be created (i.e. create tables at level zero before moving on to level one tables).

WITH myTableList AS
(
	select object_id, name
	from sys.tables
	--where name like 'stuffICareAbout%'
),
myForeignKeyList AS
(
	select fk.parent_object_id, fk.referenced_object_id
	from sys.foreign_keys fk
	join myTableList mtl_parent
		on mtl_parent.object_id = fk.parent_object_id
	join myTableList mtl_referenced
		on mtl_referenced.object_id = fk.referenced_object_id
),
TablesDependencies AS
(
	SELECT name AS TableName,
		object_id AS TableID,
		0 AS level
	FROM myTableList 
 
	UNION ALL
 
	SELECT mtl.name, mtl.object_id, td.level + 1
	FROM myTableList mtl
	JOIN myForeignKeyList mfk
		ON mfk.parent_object_id = mtl.object_id
		AND NOT mfk.parent_object_id = mfk.referenced_object_id
	JOIN TablesDependencies td
		ON mfk.referenced_object_id = td.TableID
)
SELECT TableName, MAX(level) as level
FROM TablesDependencies
GROUP BY TableName
ORDER BY level, TableName

Modify this to suit your needs. For example:

Modify the where clause to filter only the tables you care about.
Modify the fields to report on schema if it’s something you care about.
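As a rough sketch of the schema suggestion, the SELECT below could replace the body of the myTableList CTE. It runs on its own as written; the 'dbo' filter is only a hypothetical example of "the tables you care about".

```sql
-- Candidate body for the myTableList CTE: same two columns as before,
-- but the name is now schema-qualified (e.g. dbo.Orders).
SELECT object_id,
	SCHEMA_NAME(schema_id) + '.' + name AS name
FROM sys.tables
WHERE SCHEMA_NAME(schema_id) = 'dbo' -- hypothetical filter, adjust or remove
```

One side effect worth noting: because the final query groups by TableName, schema-qualified names keep two tables with the same name in different schemas from being collapsed into one row.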

Cheers

-- Comments (0)

March 25, 2010

Transaction Phenomena – Part 2: The Non-Repeatable Read

Filed under: Data Cartoons,SQLServerPedia Syndication,Technical Articles — Tags: "sql server", "Transaction Isolation Level", non-repeatable read, Read committed — Michael J. Swart @ 10:31 am

SERIES: Concurrent Transaction Phenomena

In the next few days, I’d like to illustrate each phenomenon:

Part 1: The Dirty Read (reading tentative data)
Part 2: The Non-Repeatable Read (reading changed data)
Part 3: The Phantom Read (reading new data)
Part 4: Serializable vs. Snapshot

Part 2: The Non-Repeatable Read

The non-repeatable read occurs when a transaction re-reads a single record and finds that it has been changed or deleted.

From ISO/ANSI: Process P1 reads a row. Process P2 then modifies or deletes that row and commits the change. If P1 rereads the row it receives the modified value or discovers the row has been deleted.

Changing Data
I like to think of Non-Repeatable Reads as being about reading data that has changed.

(Cartoon: Changing Data)

(I apologize in advance to my friends at work whose reaction I know will be “Boo”.)

But non-repeatable reads are not always a bad thing. In fact they’re often harmless or even required. When data has changed, you have to decide whether you want consistent data or whether you want current data. If you need to rely on consistent data, then a non-repeatable read is detrimental. Here is an example of a non-repeatable read that causes problems:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
 
	IF EXISTS(
		SELECT 1
		FROM Theatre.AvailableSeats
		WHERE seat = 'B23')
	BEGIN
 
		-- imagine a concurrent transaction
		-- sells seat B23 here and removes the record
		-- from table Theatre.AvailableSeats
 
		-- then no rows are returned here:
		SELECT price
		FROM Theatre.AvailableSeats
		WHERE seat = 'B23'
 
		-- and zero rows are affected here:
		DELETE FROM Theatre.AvailableSeats
		WHERE seat = 'B23'
 
	END
COMMIT

Isolation Levels that Experience Non-Repeatable Reads

Just like dirty reads, this phenomenon is possible when using the READ UNCOMMITTED isolation level, but it is also possible when using the READ COMMITTED isolation level. And in SQL Server READ COMMITTED is the default isolation level for transactions.

My experience
To avoid this phenomenon, you could use isolation level REPEATABLE READ. But often I find it’s possible to rewrite a transaction and stick with the default READ COMMITTED.
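As one possible illustration of such a rewrite (a sketch only, reusing the Theatre.AvailableSeats table from the example above), the check, the price lookup and the delete can be folded into a single statement with the OUTPUT clause, so there is no gap between reads for a concurrent transaction to slip into:

```sql
-- Read the price and remove the seat in one atomic statement.
-- Works under the default READ COMMITTED isolation level.
DELETE FROM Theatre.AvailableSeats
OUTPUT deleted.price
WHERE seat = 'B23';

-- If no row comes back, seat B23 had already been sold
-- and nothing was deleted by this statement.
```

The caller then decides what to do based on whether a price row was returned, instead of relying on an earlier EXISTS check that may be out of date.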

-- Comments (11)

March 23, 2010

Transaction Phenomena – Part 1: The Dirty Read

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "Dirty Reads", "NOLOCK" "READ UNCOMMITTED", "sql server", "Transaction Isolation Level" — Michael J. Swart @ 12:00 pm

SERIES: Concurrent Transaction Phenomena

In the next few days, I’d like to illustrate each phenomenon:

Part 1: The Dirty Read (reading tentative data)
Part 2: The Non-Repeatable Read (reading changed data)
Part 3: The Phantom Read (reading new data)
Part 4: Serializable vs. Snapshot

Part 1: The Dirty Read

A dirty read occurs when a transaction reads a record that has not yet been committed. In my own head, I think of this as reading tentative data.

From ISO/ANSI: One process (P1) modifies a row, and another process (P2) then reads that row before it is committed by P1. If P1 then rolls back the change, P2 will have read a row that was never committed and that may thus be considered to have never existed.

Mr. Bean shows the dangers (cue video to 1 minute 40 seconds):

By the way, I love that the BBC has put Mr. Bean clips on YouTube. In the clip, Mr. Bean cheats off his classmate who then proceeds to “rollback” his change by tossing his first answer attempt. If Mr. Bean had access to the test after it was handed in (i.e. committed), he would have made a more effective cheater.

Isolation Levels that Experience Dirty Reads

The only isolation level in which dirty reads are possible is READ UNCOMMITTED. You can also use a NOLOCK hint inside a SELECT query to achieve the same behavior.
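For reference, the two approaches look roughly like this. The sketch borrows the Theatre.AvailableSeats table from the Part 2 post purely as a stand-in; any table would do.

```sql
-- Option 1: set the level for the whole session.
-- Every table read after this point can return uncommitted data.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT seat, price
FROM Theatre.AvailableSeats;

-- Option 2: go back to the default level and hint a single table.
-- Only reads of the hinted table can return uncommitted data.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

SELECT seat, price
FROM Theatre.AvailableSeats WITH (NOLOCK);
```

The hint is the narrower of the two, since it applies to one table reference in one query.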

My Own Experience

I’m not afraid of the occasional NOLOCK hint or the READ UNCOMMITTED level. It helps on tables that I know are static or tables that are growing (INSERTS and SELECTS but no DELETES or UPDATES). It can be pretty powerful when used correctly.
The NOLOCK hint used to be my favorite deadlock killer. But lately, I’ve used it less often, especially when another solution is available.

Many others have written on this topic. Some important points:

Linchi Shea points out that NOLOCK SELECT statements can still cause blocking when DDL statements are thrown in (such as the seemingly innocent DBCC DBREINDEX command).
Itzik Ben-Gan describes another NOLOCK danger: the data returned may be not just uncommitted but inconsistent, i.e. rows may be missing or duplicated!

-- Comments (6)

March 8, 2010

I/O Operations, Arguably the Only Performance Metric You Need

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Tags: "sql server", i-o operations, top i/o queries — Michael J. Swart @ 11:43 pm

Measuring Performance

Mike Walsh is hosting T-SQL Tuesday this month. And the topic is Input and Output (I/O).

The thing about I/O is that it’s very much a hardware topic, so the category is very well suited to database administrators. The topic is maybe a little more suited to them than to us database developers.

So it’s natural to ask: What is it (if anything) about I/O that db developers should know?

I/O is THE Metric For Measuring Performance

Well, here’s my thesis: I/O is the principal metric to watch. Other metrics, such as duration, CPU cycles and network lag, don’t matter nearly as much as I/O.

Here’s why:

(Solid State Drives excepted) Disk operations are one of the few things databases do that rely on moving parts. As such they are slooow (relatively speaking). Save the I/O, save the world.
In theory, OLTP databases should get by with minimal I/O. If particular queries are performing many reads or writes chances are that the query can be improved.
Is it really a memory issue? You’re worried about memory. Low page life expectancy is usually a symptom of too little memory. But if this value nosedives often, it could be in conjunction with an I/O-intensive query.
What about measuring duration? Duration should be treated like a symptom, not a cause. Arguably, excessive I/O is a symptom too, but it’s often one step closer to the cause than duration.
Ease off on the CPU-heavy stuff. I’m a big fan of letting the database server serve data and of letting the application do the thinking. I work in a very developer-centric environment and it’s great: Everyone is comfortable with letting the app do any heavy CPU work.
I’ve found it convenient to deal with only one ruler. I’ve used I/O as my only metric for a few years now and I’ve been extremely happy with the results. (Well, not quite my only metric, but my TOP 20 I/O queries and my TOP 20 CPU queries have about 18 queries in common).
I plan to re-evaluate things if/when Solid State Drives have their day in the sun. It’s not too far into the (exciting) future.
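If you want to see the I/O of a single query while you work on it, one simple option is SET STATISTICS IO. The sketch below uses a query against sys.tables only as a stand-in for whatever query is under test.

```sql
-- Report I/O per table for each statement that follows.
SET STATISTICS IO ON;

SELECT name
FROM sys.tables;  -- stand-in for the query being tuned

SET STATISTICS IO OFF;
```

The counts (logical reads, physical reads and so on) appear in the messages output rather than in the result set, which makes it easy to compare a query before and after a change.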

As it turns out, as I write this, the first I/O articles are coming in and Rob Farley seems to have written a very thorough article about I/O as it relates to the cost of a query. He points out that we shouldn’t ignore other metrics in favor of I/O alone. I may be lucky, but I’ve never seen a performance problem in production that did not have the symptom of high I/O.

Bonus Post Script: My Top 20 I/O Query

Others have created these queries in the past. You can find them everywhere. This one’s mine. It’s a tool that’s been on my belt for a number of years.

DECLARE @SystemIO FLOAT
SELECT @SystemIO = SUM(total_logical_reads + total_logical_writes)
FROM sys.dm_exec_query_stats;
 
SELECT TOP 20 [Row Number] = ROW_NUMBER() OVER (ORDER BY total_logical_reads + total_logical_writes DESC),
    [Query Text] = CASE
        WHEN [sql_handle] IS NULL THEN ' '
        ELSE (SUBSTRING(ST.text,(QS.statement_start_offset + 2) / 2,
            (CASE
                WHEN QS.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(MAX),ST.text)) * 2
                ELSE QS.statement_end_offset
                END - QS.statement_start_offset) / 2))
        END,
    [Execution Count] = execution_count,
    [Total IO] = total_logical_reads + total_logical_writes,
    [Average IO] = (total_logical_reads + total_logical_writes) / (execution_count + 0.0),
    [System Percentage] = 100 * (total_logical_reads + total_logical_writes) / @SystemIO,
    [Object Name] = OBJECT_NAME(ST.objectid),
    [Total System IO] = @SystemIO,
    [SQL Handle] = [sql_handle]
FROM sys.dm_exec_query_stats QS
CROSS APPLY sys.dm_exec_sql_text ([sql_handle]) ST
WHERE total_logical_reads + total_logical_writes > 0
ORDER BY [Total IO] DESC

-- Comments (11)

March 3, 2010

Start learning about SQL Server

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 9:58 am

Takeaway: I provide a list of free or near-free resources that novices can look to for an introduction to SQL Server.

I’ve often written articles about topics that are new or interesting to me. What I mean is that I’ve often avoided writing about the basics (e.g. what tables, rows or columns are, or what a SELECT statement is). A lot of this attitude comes from feedback I once got. Someone told me that most of my articles fell into one of two categories: the category “knew that already” or the category “don’t care”. So I’ve always tried to write articles that fall in the category “relatively novel” or the category “relatively relevant”*.

But there’s a huge need for 101 level information. I’m thinking about new developers. And I’m thinking about people just getting started in the I.T. industry. This post is for you guys.

But why re-invent the wheel? I’m just going to stand on the shoulders of giants and point you to the best sites (in my opinion) for getting started. The best part is that at this level, this information will often be relevant for years. For example, compare the usefulness of a Notification Services article versus that of a SELECT statement article.

T-SQL 101 articles by SQL Server Magazine

Biggest advantage: Tailored to the greenest SQL Developer.

This series is a set of 10 articles written by Bill McEvoy in 2008.

By far the most comprehensive kick start I’ve come across. The only downside is that it’s not free. Personally, I use Google for my table of contents on this series. But the last article http://www.sqlmag.com/article/tsql3/t-sql-101-stored-procedures.aspx has a nice set of links to the previous lessons as well.

(Update May 2010: There’s a $16 downloadable version of this content now at http://sqlmag.com/go/left-brain/tsql)

Pinal Dave’s site SQL Authority

Biggest Advantage: Breadth of topics.

Pinal Dave is a SQL Server MVP who is one of the most prolific bloggers I know. He never shies away from any topic (complicated or uncomplicated).

He’s organized his web site nicely, but I like to browse his intro to topics found here.

SQL University

Biggest Advantage: Comprehensive and good start for SQL admins.

Remember those giants I told you about? The ones whose shoulders I’m standing on? Jorge Segarra (aka SQL Chicken) is one of those guys. He came up with the idea of SQL University, a great home base for getting started as a DBA. There is little overlap between this info and the T-SQL 101 articles because of the target audience. T-SQL 101 is aimed at new SQL developers, while SQL University is aimed at new DBAs.

What do you think?

Am I missing any good beginner resources? Leave a comment below. I’ll consider updating this article to include it.

* “Relatively relevant”: I love the sound of that phrase. Must try to use it more.

-- Comments (4)