Michael J. Swart

August 17, 2010

Microsoft Connect From Our Point of View

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , — Michael J. Swart @ 12:00 pm

I first heard about Microsoft Connect in early 2008. I was at a conference and a speaker from Microsoft touted the benefits of the site. It’s a site that allows users to submit feedback on Microsoft Products (which include SQL Server of course). We were told that every issue on the site was taken very seriously and that stale or unhandled issues get escalated and are treated like a big deal. I remember feeling excited at having this direct line to Microsoft’s product teams.

Today Buck Woody (of Microsoft) posted another article pushing connect.microsoft.com. He’s had other posts on this topic before and thinks it’s a great thing.

Connect as Seen By Microsoft

And it is a great thing, especially for Microsoft. They get great feedback:

  • They get bug reports, which help improve the quality of the released product tremendously.
  • They get suggestions, which inform new features of the product and give them a sense of what users are really hoping to see.

Connect as Seen By Users

And it’s a great thing for users because we know we have a say in the product.

Except when we don’t.

There are too many issues where the submitter doesn’t know what Microsoft’s plans are with the issue.

Maybe here’s one reason: there hasn’t been a change to the database engine or Management Studio since 2008, and it looks like we’re about a year away from the next release. So any features or fixes suggested in that time are still pending.

And that’s fine, but it means that Microsoft’s only feedback to us is inside the issue itself. Often an issue will have little or no feedback (349116, 361832) and then our perception that we have a direct line to Microsoft disappears.

But those are two isolated cases, right? Yes and no. In one of my very first blog posts two years ago, How useful is connect.microsoft.com, I took an arbitrary sample of issues. Today, two years later, I look at the same sample*, and I see that 41% still have an “active” status. I’ll repeat what I said in that post:

“If MS had a better track record than that, connect.microsoft.com would be seen as a real place to be heard. That would encourage even more feedback.”

It’s Not a Perfect System

Buck Woody says:

No, it’s not a perfect system, but it’s more than I’ve seen at most software vendors I deal with.

I agree with that. It is a lot better than what other software vendors offer. Oracle is definitely behind in this respect; as far as I can tell, their only feedback channel is their regular support desk. But all that means is that Microsoft is leading a race in which no one else seems to be running. And Microsoft could do even better.

What I’d Like to See Done

I don’t want to downplay the good work that Microsoft has done. When we’re told that “Microsoft really does look at these issues” I tend to believe it. I am confident that the SQL Server product teams are putting these features and fixes into the product as we speak**.

But Microsoft should tell us about it. Update the stale/active issues. Mark them as “not reproducible” or “we’ll consider it” or even “won’t fix”; you won’t hurt our feelings. In fact we’ll feel better because we’ll feel listened to.

* The sample I took was issues submitted on the 1st and 2nd of May in 2008.

** Some evidence of product fixes via Todd McDermid.

July 29, 2010

Tricks with User Settable Perfmon Counters

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , — Michael J. Swart @ 3:52 pm

So there’s this underrated SQL Server feature called User Settable Counters, and it helps SQL Server developers create their own custom perfmon counters.

You use them with syntax like the following:

EXEC sp_user_counter1 @newvalue

This updates a counter that can be tracked using Windows Performance Monitor, etc.

Trick 1

Unfortunately this is not a counter that can be incremented; you have to know the new value to set. It would be great if we had stored procedures like sp_increment_counterX so that we could simply increment the existing value of the counter by any given amount. Well, ask and you shall receive!

USE master
GO
 
CREATE PROCEDURE sp_increment_counter1
	@value INT
AS
 
	DECLARE @newValue INT
	SELECT @newValue = @value + cntr_value
	FROM sys.dm_os_performance_counters
	WHERE instance_name = 'User counter 1'
	and [object_name] like '%User Settable%'
	and counter_name = 'Query'
 
	EXEC sp_user_counter1 @newValue
GO
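
For example, a quick usage sketch (my own example, not from the original post): bump the counter by 5, then read the new value back.

EXEC sp_increment_counter1 5;
 
SELECT cntr_value
FROM sys.dm_os_performance_counters
WHERE instance_name = 'User counter 1'
	AND [object_name] LIKE '%User Settable%'
	AND counter_name = 'Query';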

I like this stored procedure because it’s very lightweight:

  • The CPU/IO/memory overhead is negligible. Update Feb 8, 2011: Whoa! In a performance test, this sproc was seen to take a lot of CPU!
  • It doesn’t take or hold any significant locks.

Trick 2

If you have a lot of spare time, you can use the sp_user_counterX procedures combined with WAITFOR to make some pretty pictures of your favorite skyline (see the sketch below)!

Like the Toronto Skyline

Based on this photo from BriYYZ (at Flickr):

Toronto skyline
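
Here’s a minimal sketch of the idea (my own example; the heights and the one-second delay are invented, and they assume Performance Monitor is sampling once per second):

DECLARE @heights TABLE (step INT IDENTITY(1,1), height INT);
INSERT @heights (height) VALUES (10), (25), (25), (60), (35), (90), (40), (15);
 
DECLARE @step INT, @h INT;
SET @step = 1;
WHILE EXISTS (SELECT 1 FROM @heights WHERE step = @step)
BEGIN
	-- Each counter value draws one "building" in the perfmon chart.
	SELECT @h = height FROM @heights WHERE step = @step;
	EXEC sp_user_counter1 @h;
	WAITFOR DELAY '00:00:01';
	SET @step = @step + 1;
END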

July 12, 2010

Learning Objectives vs. SQL Server

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , — Michael J. Swart @ 9:00 pm

T-SQL Tuesday 008

Takeaway: Choose learning activities that increase understanding and not just knowledge.

So it’s T-SQL Tuesday time again. This month it’s hosted by Robert Davis and the topic is Learning. It’s a topic of particular interest to me: I work in a field that writes software for education, and now I’m writing about education on software. Here are the ways I learn. (BTW, the methods are ordered from shallowest to deepest in terms of understanding.)

Knowledge: (Ability to Remember)

  • Attending Conference Sessions
  • Watching webinars
  • Reading (BOL/Books/Blogs/Magazines)

Comprehension: (Ability to Understand)

All of the above… plus:

  • Getting help from StackOverflow and #sqlhelp
  • Conference Sessions (Q&A time). I mentioned conference sessions above, but here, I’m calling out the Q&A time.
  • User Group Discussions

Application: (Ability to Use)

All of the above… plus:

  • Troubleshooting: The best knowledge and understanding comes from troubleshooting, especially in crisis situations. This hard-fought knowledge doesn’t get forgotten.
  • Exploring a new feature installed on a machine.
  • Talking to consultants or mentors: Don’t underestimate this. I often felt that the SQL Server clinic (drop-in consulting provided by Microsoft’s CSS team) at the PASS Summit was underused.
  • Hands-on Labs.
  • Receiving code reviews.

Analysis: (Ability to Evaluate/Create)

With enough experience, you reach Nirvana*. You’ll know you’ve arrived if you can:

  • Answer questions from StackOverflow and #sqlhelp
  • Conduct code reviews
  • Give Talks/Sessions/Workshops on a topic

* Nirvana comes from Seattle. Coincidence?

But why should you care?

It’s useful to know which methods are most effective. One lesson we can take from the above is that it’s more effective to learn by doing than to attend training. Which kind of makes sense. Being thrown in the deep end is often more effective swimming instruction than reading about swimming.

And as SQL Server professionals we’ve already got a leg up on academia! Our motivations for learning are often more immediate and pressing than “gotta study for the final”.

It’s my hope that by recognizing what kind of learning objectives you can achieve with each approach, you can make better choices about learning. For example:

  • Starting in support is an awesome way to shorten the learning curve.
  • The Q&A part of sessions sticks in my head more than any other (especially if I asked the question).
  • Have Management Studio open (or BIDS or whatever) when reading blog articles. Believe me, when reading Brad Shulz‘s blog, his examples sink in a lot better this way.
  • When way out of your depth, consider hiring a consultant: it’s a fixed cost, you get your problem solved, and you can learn something too.

June 29, 2010

Did You Just Ask Me What I Did This Weekend?

Filed under: SQLServerPedia Syndication — Tags: , , , — Michael J. Swart @ 4:30 pm

Before I answer that question, let me first respond to a different question:

“Michael, could you show us an image that tells us how much you’ve written, while simultaneously reminding us what state-of-the-art graphics looked like in the early nineties?”

Why certainly!

100th Blog Post

I’ll wait ’til the applause dies down

No seriously. Thank you people. We’ll be here all night if you don’t stop clapping.

Thanks again …  (love you too)

<ahem>

So what did I do this weekend?

Road Trip to SQL Saturday 42!

I hopped in my car and drove down to Columbus, Ohio to attend SQL Saturday #42. It was a 7-hour drive, not counting border crossings, and totally worth it. I already talked about my expectations and motivation in a previous post, and I was not disappointed. (Google Maps route for the trip.)

Walking in the Door

The minute I walked in I was welcomed by name by Dave Schutz (@DaveSchutz). I actually met Dave Schutz last year, not through Twitter or blogs or any other social network. I met Dave on a bus from the airport in Seattle last November on the way to the PASS Summit. A very cool kick-off to that week.

Dave actually helped organize SQL Saturday in Columbus Ohio. From what I understand he was a very busy guy. I wish I could have chatted with him a bit more.

Waiting for the Talks to Begin

While waiting for the talks to begin I talked with a lot of people and I eavesdropped on others. I found it interesting that (without fail) conversations went from awkward small talk to awesomeness the second they turned to anything remotely technical. A sociologist might have something to say about that.

Sessions

Matt Hester started off with the keynote. He’s an evangelist for Microsoft and he talked about Microsoft’s relationship with the cloud. But it didn’t feel like a commercial at all. A lot of it felt like a Discovery Channel show. (Did you know Microsoft designs data centers without roofs?)

Dave Rodabaugh was up next. I attended two sessions with this guy. He talked about Business Intelligence and how to do it right. He’s an extremely competent speaker and it’s obvious that he knows his stuff. One interesting fact I learned was that Gartner claims that half of Business Intelligence projects fail. BI is hard to do right.

What I liked best about Dave’s speaking style is that he doesn’t sit on the fence about anything. He gives hard rules to follow and his answers don’t contain “it depends”. But he also tells us that he breaks his own rules. In fact I got the impression that he’s broken all of them at one point or another. But he gives this warning about breaking those rules: “You can create your own recipes after you’ve learned to cook”.

Louis Davidson (@DrSql) then talked about Database Design Patterns. He blogged about that talk from his own point of view. Louis did a great job with this talk and I got to ask him some technical questions afterwards.

Jeremiah Peschka (@peschkaj) talked about Free Performance Boosts. He also blogged about that talk from his own point of view. In his article he wonders why this talk was so well attended. I think it’s a combination of two things. The title promises a sort of get-rich-quick deal. And the content of the talk actually delivered on that. There was absolutely no filler in that talk. It was all useful. A couple of the tips I already knew, but it was still nice to have that knowledge affirmed.

Skipping a Session: I skipped the last session to continue conversations. (protip: don’t feel guilty about this!)

Suggestions for SQL Saturday in General

  • There’s the problem of people signing up and not attending. It causes problems for planners and wait-listed attendees (i.e. me). I’m not sure how many people decline to sign up because a SQL Saturday is “full”, but one is too many. Solutions? I’m not sure… overbooking maybe?
  • Lunch-time sessions. I know that having a vendor session in the cafeteria is good for vendors, but it really cuts down on networking time. (I need to find a better word than “networking”; it seems too much like marketing.) I put this down to the price of admission.
  • If a speaker doesn’t show up, don’t panic. It’s not the end of the world to have three sessions in a time slot instead of four; from an attendee’s point of view, it’s not too much different. It was fortunate that people were willing and able to fill in.

Columbus, Ohio in General

The people were so so so friendly. I wanted to bring them all back with me to Canada. I get this impression from my small sample size consisting of SQL Saturday attendees, zoo employees and Ohio restaurant staff. Speaking of which: A huge thanks to the morning staff at Bob Evans on Olentangy River Road. They were extra friendly and helpful.

One Regret

I wish I had brought my SQL Server Deep Dives book in order to have two more authors sign it (@DrSql and @SqlRunr).

June 3, 2010

Keeping Track Of Root Nodes in a Hierarchy

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , — Michael J. Swart @ 2:53 pm

Takeaway: I explain an interesting problem I encountered: how to maintain data in a table that stores hierarchy info. The problem might not be relevant to you, dear reader, but the technique might be.

The Problem

So say that you’re keeping track of a hierarchy in a particular table, and to make queries faster, you’ve decided to store a key to the root-level node as a column in each row.

Constraints: For whatever reason, the data in this column is untrustworthy and it’s our job to fix it! It’s a huge table, most of the data is okay, and we’d like to fix everything without blocking too much of the table.

Table Description

  • You’re keeping track of hierarchies in a table using a foreign key to itself.
  • Parent nodes are indicated using a ParentObjectId column (see the table below).
  • Root nodes are indicated when ParentObjectId is NULL.
  • BaseObjectId indicates the root node of each hierarchy (i.e. it equals ObjectId when ParentObjectId is NULL).
  • You’re using SQL 2005 and you can’t use 2008’s new hierarchyid data type (or you don’t want to).

The Table Diagram would look something like this:

Hierarchy Example

And the code to create this table would look like:

CREATE TABLE HierarchyExample (
	ObjectId int NOT NULL IDENTITY(1,1),
	ParentObjectId int NULL,
	BaseObjectId int NULL,
	OtherDataGoesHere nvarchar(max) NOT NULL,
	CONSTRAINT PK_HierarchyExample
		PRIMARY KEY (ObjectId),
	CONSTRAINT FK_HierarchyExample
		FOREIGN KEY (ParentObjectId)
		REFERENCES HierarchyExample (ObjectId)
)

Example: You might find this kind of data useful if you were really interested in finding the head or root node of the hierarchy. Like with vampires:

Vampire Hierarchy

A Solution

The query:

DECLARE @rc INT;
DECLARE @batch INT;
SELECT @rc = 1, @batch = 1000;
 
WHILE @rc > 0
BEGIN
      -- Copy each parent's BaseObjectId down to any child row that
      -- disagrees with it, one batch at a time, until a pass finds
      -- nothing left to fix.
      UPDATE TOP (@batch) HE
      SET BaseObjectId = HE_Parent.BaseObjectId
      FROM HierarchyExample HE
      JOIN HierarchyExample AS HE_Parent
		ON HE.ParentObjectId = HE_Parent.ObjectId
      WHERE HE.BaseObjectId <> HE_Parent.BaseObjectId;
 
      SET @rc = @@ROWCOUNT;
END

This script updates BaseObjectId a thousand rows at a time, which keeps each statement relatively quick so it doesn’t hold on to locks for too long (at least not exclusive locks). It also handles hierarchies with multiple levels: once a pass corrects a parent, a later pass corrects that parent’s children, so the correct values ripple down the hierarchy.

Index for Efficiency: The best indexes for this table are the obvious nonclustered covering ones (listed below, with DDL after the list), and they really turn this query into something fast (using a merge join no less!):

  • HierarchyExample(ParentObjectId, BaseObjectId, ObjectId)
  • HierarchyExample(ObjectId) Include (BaseObjectId)
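
Expressed as DDL, those would look something like this (the index names are my own):

CREATE NONCLUSTERED INDEX IX_HierarchyExample_Parent
	ON HierarchyExample (ParentObjectId, BaseObjectId, ObjectId);
 
CREATE NONCLUSTERED INDEX IX_HierarchyExample_Base
	ON HierarchyExample (ObjectId) INCLUDE (BaseObjectId);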

The original solution I tried used recursive CTEs, but the performance was less than ideal.
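
For the curious, that recursive-CTE approach might have looked something like this sketch (my reconstruction, not the code from back then):

WITH Resolved AS (
	-- Anchor: root nodes are their own base.
	SELECT ObjectId, ObjectId AS CorrectBaseObjectId
	FROM HierarchyExample
	WHERE ParentObjectId IS NULL
	UNION ALL
	-- Recurse: children inherit the root found for their parent.
	SELECT Child.ObjectId, Resolved.CorrectBaseObjectId
	FROM HierarchyExample AS Child
	JOIN Resolved
		ON Child.ParentObjectId = Resolved.ObjectId
)
UPDATE HE
SET BaseObjectId = Resolved.CorrectBaseObjectId
FROM HierarchyExample AS HE
JOIN Resolved
	ON HE.ObjectId = Resolved.ObjectId
WHERE HE.BaseObjectId <> Resolved.CorrectBaseObjectId
	OR HE.BaseObjectId IS NULL;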

What now?

This is the kind of thing I like working on and I thought I’d share. So what about you? What kind of interesting T-SQL challenges have you come across recently?  Leave something in the comments or blog about it and post back here.

April 22, 2010

Sometimes the grass is greener…

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , — Michael J. Swart @ 5:02 pm

Update July 20, 2011: Read the comments; SQL 2012 will support these functions!
Today I really wished I was working in an Oracle environment. I came across a problem where I really could have used Oracle’s LAG or LEAD functions.

Greener on the other side

The LAG() and LEAD() Functions

They’re functions that return a value based on the previous or next record in an ordered set of records. While there are ways to mimic the functionality (see the sketch below), the performance tends to suffer because either a cursor is used or a set of data is joined to itself (leading to something Jeff Moden calls Triangular Joins).
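
To make the problem concrete, here’s a minimal sketch of mimicking LAG() in SQL 2005-era T-SQL (the Readings table and its columns are invented for illustration):

SELECT cur.ReadingDate,
	cur.Amount,
	prev.Amount AS PreviousAmount
FROM Readings AS cur
OUTER APPLY (
	-- Probe for the single most recent earlier row; doing this
	-- once per outer row is where the performance suffers.
	SELECT TOP (1) p.Amount
	FROM Readings AS p
	WHERE p.ReadingDate < cur.ReadingDate
	ORDER BY p.ReadingDate DESC
) AS prev;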

When I searched for this problem online, I was not surprised to find Itzik Ben-Gan’s name. He and Sujata Mehta present their case for why Microsoft should support these functions. Download and read it!

Afterwards I encourage readers to visit Itzik’s enhancement request at the Microsoft Connect site and give it another up-vote. This item deserves to be up-voted. Not just for the simplified syntax (i.e. faster development) but for the potential for optimization that it enables (i.e. faster performance).

The current status of the request is that Microsoft is “looking into it for a future release” (which seems to be a too-common reply at Connect lately).

April 14, 2010

Forcefully Rename a SQL Server Database

If you’ve ever used Object Explorer (in SSMS) to rename a database, you may have come across this error message from Management Studio:

Unable to Rename

which says:

Unable to rename [database]. (ObjectExplorer)
Additional information:
--> Rename failed for Database '[database]'.
	(Microsoft.SqlServer.Smo)
	--> An exception occurred while executing a Transact-SQL statement or batch.
		(Microsoft.SqlServer.ConnectionInfo)
		--> The database could not be exclusively locked to perform the operation.
			(Microsoft SQL Server, Error: 5030)

Management Studio is telling you that there are connections that it doesn’t want to mess with.

Or when using sp_renamedb from a query window (e.g. EXEC sp_renamedb 'old_name', 'new_name'), you might see this similar error:

Msg 5030, Level 16, State 2, Line 1
The database could not be exclusively locked to perform the operation.

If you really really want to rename the database and don’t care at all about in-flight transactions, then use this script/template I recently came up with:

A Forceful Script:

ALTER DATABASE [old_name]
	SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
ALTER DATABASE [old_name]
	MODIFY NAME = [new_name]
GO
ALTER DATABASE [new_name]
	SET MULTI_USER
GO

Caveat:

Remember that this script failed kindergarten for not playing well with others. So use it only on QA and dev boxes. And if you’re going to run it on a production box, make sure you do so only during scheduled downtime.

One More Interesting Thing:

A friend of mine stored this script in OneNote and found that when it came time to use it, the script failed because OneNote had replaced some of the whitespace with non-breaking spaces!

This is a OneNote issue, and it applies to all scripts and code, not just T-SQL scripts.

April 13, 2010

T-SQL Tuesday #005: Focus on Users when Reporting

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: — Michael J. Swart @ 12:48 pm

Takeaway: Getting feedback from customers (end users) is crucial for a successful reporting initiative. Measure success with a user-adoption metric.

So it’s the second Tuesday of the month again and time for another T-SQL Tuesday. This April Aaron Nelson is hosting T-SQL Tuesday #005 and the topic of the month is Reports.

So this month I want to contrast the approaches we take as data storers and data reporters:

Client Focus

Client focus is more critical for reports and reporting than it is for data collection. Put another way: when writing data collection apps, you might be able to get by without much user feedback, but when developing reporting solutions, you cannot get by without consulting users.

Consider these two typical scenarios of data flow between a user and data store.

Data In

In the data collection stage, data is consumed (and stored) by an application which is in the domain of the database developer (i.e. you). When the developer and the data consumer are the same person… everyone wins!

Data Out

In contrast, reporting solutions see end users consuming data. This is about as far removed as you can get from the database developer (i.e. you again). So if there is a disconnect in communication between you two, nobody wins.

I once saw a chain of three extra people acting as a communication layer between an end user and a developer. I suspect it happened because the data collection system had been designed and implemented with some success that way.

Bridge the Gap Between Dev and User

Ideally the client and the analyst can sit down for a one-on-one talk. When doing so, we need to keep the following in mind:

  1. People who want data aren’t just curious
    They want to solve a problem. And in the rare case they’re not trying to solve a problem, they’re trying to find out whether there’s a problem to solve. So find out what problems users are trying to solve. What are their goals? “Analyze This” or “Improve That” are probably a bit too generic. Find out what decisions get made with this data.
  2. Summary Reports are Almost Never The End Of The Story
    See point 1, People are trying to solve a problem. And if they can’t dig further into data, they’re going to run up against an inevitable roadblock. In the past I’ve seen reporting solutions ditched for the combo solution of in-house-SQL-Guy + copy-and-paste + Excel.
    (As an aside, Excel as a solution isn’t too bad. It can serve the needs of the end user. It’s just that I think reporting solutions have the potential to do better)
  3. Reporting Requirements are Not Report Requirements
    See points 1 and 2. I would rather have requirements that start with “I need to analyze the efficiency of widget x” than “I need to view a histogram with data from this table”.

I end with a quote from the last paragraph of The Data Warehouse Toolkit by Ralph Kimball (a book I’ve plugged before):

In Closing … Sweeping away all the details and techniques, the gold coin for the data warehouse professional is to listen to the business. Consistently listening to the user brings us back to what we are supposed to do.

April 6, 2010

Transaction Phenomena – Part 4: Serializable vs. Snapshot

SERIES: Concurrent Transaction Phenomena

In Books Online (BOL), Microsoft describes the different transaction isolation levels in terms of phenomena that can occur during concurrent transactions. Specifically, they mention three kinds of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantom Reads. You may have heard of these before, but correct me if I’m wrong, I just can’t find a good definition anywhere in BOL.

And that’s too bad, because these phenomena don’t just help describe the different transaction isolation levels. The phenomena actually define these levels.

These terms actually come from the ISO/ANSI standard for SQL-92. The terms themselves however do little to illuminate exactly the meaning of each. (What’s a phantom read?) When used by the ANSI/ISO standard, they mean something very specific and they’re key to understanding the different isolation levels.

In the next few days, I’d like to illustrate each phenomenon:

Part 4: Serializable vs. Snapshot

So I’ve finished talking about the types of transaction phenomena defined in the ISO/ANSI standard. There are two isolation levels supported by SQL Server that never experience any of them: SERIALIZABLE and SNAPSHOT. Both avoid dirty, non-repeatable and phantom reads, but they do so using different methods, and understanding both is the key to deciding whether either is right for your application.

SERIALIZABLE

Serializable is the most isolated transaction level. Basically when a transaction reads or writes data from the database, that’s what it’s going to be until the end of the transaction:

From ISO/ANSI: [Execution of concurrent SERIALIZABLE transactions is guaranteed to be serializable, which is] defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.

So that’s it! SERIALIZABLE transactions see database data as if there were no other transactions running at the same time. So no dirty, phantom or non-repeatable reads (but maybe some blocking).

It’s interesting that the standard defines SERIALIZABLE as the default level. Microsoft doesn’t subscribe to that notion and makes READ COMMITTED the default level.

The SERIALIZABLE level prevents phantom reads by using range locks, which I explain at the end of this article.

SNAPSHOT

SNAPSHOT transactions avoid phantom reads, dirty reads and non-repeatable reads, but they do it in quite a different way than SERIALIZABLE transactions do.

While SERIALIZABLE uses locks, SNAPSHOT instead reads a version of committed data as it stood when the transaction began. Since no locks are taken, changes made by concurrent transactions are allowed rather than blocked.

So say you’re using SNAPSHOT transactions and you finally decide to make a change to some data. As far as you know, that data hasn’t changed since the first time you looked at it. But if that data has been changed elsewhere, you’ll get this error message:

Msg 3960, Level 16, State 4, Line 1
Snapshot isolation transaction aborted due to update conflict. You
cannot use snapshot isolation to access table 'dbo.test' directly or
indirectly in database 'snapshottest' to update, delete, or insert
the row that has been modified or deleted by another transaction.
Retry the transaction or change the isolation level for the
update/delete statement.

What this update conflict error message is trying to convey is exactly the same concept as Wikipedia’s Edit Conflict error message. Except that Wikipedia explains it better. I suggest looking there.
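
If you want to see the conflict for yourself, here’s a minimal two-session sketch (it borrows the dbo.test table and snapshottest database named in the message above; the id and val columns are my own, and the database must already have ALLOW_SNAPSHOT_ISOLATION ON):

-- Session 1:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
SELECT val FROM dbo.test WHERE id = 1;
 
-- Session 2 (a separate connection) commits a change in the meantime:
-- UPDATE dbo.test SET val = val + 1 WHERE id = 1;
 
-- Back in session 1, this statement raises error 3960:
UPDATE dbo.test SET val = val + 1 WHERE id = 1;
COMMIT;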

ANSI-SQL’s SNAPSHOT Definition

There isn’t one. The SNAPSHOT isolation level I’m talking about is a Microsoft-only thing. It’s useful, but definitely not part of the SQL standard.

It’s not too hard to see why. The SNAPSHOT isolation level permits the database server to serve data that is out of date. And that’s a big deal! The data isn’t uncommitted; it’s old (consistent, but stale).

Some people place a greater value on consistency than on timeliness and accuracy. I think it’s nice to have the choice.

Bonus Appendix: Range Locks.

(I was tempted to break this appendix out into its own blog post but ultimately decided not to.)

So SERIALIZABLE transactions take range locks in order to prevent Phantom Reads. It’s interesting to see what range of values is actually locked. The locked range is always bigger than the range specified in the query. I’ll show an example.

Say we have a table storing integers and insert 6 even numbers:

CREATE TABLE MyTest
(
	id INT PRIMARY KEY
);
INSERT MyTest VALUES (2), (4), (6), (8), (10), (12);

illustration
Now let’s read a range:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
SELECT id FROM MyTest WHERE id BETWEEN 3 AND 8;
-- do concurrent stuff here.
COMMIT

illustration
Books Online says: “The number of RangeS-S locks held is n+1, where n is the number of rows that satisfy the query.” We can verify this by looking at sys.dm_tran_locks (see the query sketch below). I’ve shown the locks that are taken above. A range lock covers the range of possible values from the given key value down to the nearest key value below it.
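
For instance, a quick way to peek at those locks from another session (a sketch; filter further by session or database as needed):

SELECT resource_type,
	resource_description,
	request_mode,   -- expect RangeS-S for the range locks
	request_status
FROM sys.dm_tran_locks
WHERE resource_type = 'KEY';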

You can see that the “locked range” of [2..10] is actually larger than the query range [3..8]. Attempts to insert rows into this range will wait.

What happens if we select a range containing all rows?

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
SELECT id FROM MyTest WHERE id BETWEEN 1 AND 13;
-- do concurrent stuff here
COMMIT

illustration
You can see that everything is selected. That lock at “infinity” has a resource_description value of (ffffffffffff).

Last interesting bit. Ghost records can participate in these ranges!

DELETE MyTest;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
SELECT id FROM MyTest WHERE id BETWEEN 1 AND 10;
-- do concurrent stuff here
COMMIT

illustration
Concurrent transactions are able to insert values above 12, but will wait for values below that!

March 25, 2010

Transaction Phenomena – Part 2: The Non-Repeatable Read

SERIES: Concurrent Transaction Phenomena

In Books Online (BOL), Microsoft describes the different transaction isolation levels in terms of phenomena that can occur during concurrent transactions. Specifically, they mention three kinds of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantom Reads. You may have heard of these before, but correct me if I’m wrong, I just can’t find a good definition anywhere in BOL.

And that’s too bad, because these phenomena don’t just help describe the different transaction isolation levels. The phenomena actually define these levels.

These terms actually come from the ISO/ANSI standard for SQL-92. The terms themselves however do little to illuminate exactly the meaning of each. (What’s a phantom read?) When used by the ANSI/ISO standard, they mean something very specific and they’re key to understanding the different isolation levels.

In the next few days, I’d like to illustrate each phenomenon:

Part 2: The Non-Repeatable Read

The non-repeatable read occurs when a transaction re-reads a single record and finds that it has been changed or deleted.

From ISO/ANSI: Process P1 reads a row. Process P2 then modifies or deletes that row and commits the change. If P1 re-reads the row, it receives the modified value or discovers the row has been deleted.

Changing Data
I like to think of Non-Repeatable Reads as being about reading data that has changed.

Changing Data

(I apologize in advance to my friends at work whose reaction I know will be “Boo”.)

But non-repeatable reads are not always a bad thing. In fact they’re often harmless or even required. When data has changed, you have to decide whether you want consistent data or current data. If you need to rely on consistent data, then a non-repeatable read is detrimental. Here is an example of a non-repeatable read that causes problems:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
 
	IF EXISTS(
		SELECT 1
		FROM Theatre.AvailableSeats
		WHERE seat = 'B23')
	BEGIN
 
		-- imagine a concurrent transaction
		-- sells seat B23 here and removes the record
		-- from table Theatre.AvailableSeats
 
		-- then no rows are returned here:
		SELECT price
		FROM Theatre.AvailableSeats
		WHERE seat = 'B23'
 
		-- and zero rows are affected here:
		DELETE FROM Theatre.AvailableSeats
		WHERE seat = 'B23'
 
	END
COMMIT

Isolation Levels that Experience Non-Repeatable Reads

Just like dirty reads, this phenomenon is possible when using the READ UNCOMMITTED isolation level, but it is also possible when using the READ COMMITTED isolation level. And in SQL Server READ COMMITTED is the default isolation level for transactions.

My experience
To avoid this phenomenon, you could use isolation level REPEATABLE READ. But often I find it’s possible to rewrite a transaction (see the sketch below) and stick with the default READ COMMITTED.
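
For example, here’s a sketch of rewriting the seat-selling transaction from above so that a single atomic DELETE does the check, the read and the removal (my example, not from the post; the MONEY type for price is an assumption):

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
 
	DECLARE @sold TABLE (price MONEY);  -- price type assumed
 
	-- One atomic statement: if a concurrent transaction already sold
	-- seat B23, zero rows are deleted and @sold stays empty.
	DELETE FROM Theatre.AvailableSeats
	OUTPUT deleted.price INTO @sold
	WHERE seat = 'B23';
 
COMMIT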

