Michael J. Swart

April 22, 2010

Sometimes the grass is greener…

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , — Michael J. Swart @ 5:02 pm

Update July 20, 2011: Read the comments, SQL 2012 Will support these functions!
Today I really wished I was working in an Oracle environment. I came across a problem where I really could have used Oracle’s LAG or LEAD functions.

Greener on the other side

Greener on the other side

The LAG() and LEAD() Functions

They’re functions which return a value based on the previous or next record in an ordered set of  records. While there are ways to mimic the functionality, the performance seems to suffer because either a cursor is used or a set of data is joined to itself (leading to something Jeff Moden calls Triangular Joins).

When I searched for this problem online, I was not surprised to find Itzik Ben Gan’s name. He and Sujata Mehta present their case about why Microsoft should support these features. Download and read it!

Afterwards I encourage readers to visit Itzik’s enhancement request at the Microsoft Connect site and give it another up-vote. This item deserves to be up-voted. Not just for the simplified syntax (i.e. faster development) but for the potential for optimization that it enables (i.e. faster performance).

The current state of the issue request is that Microsoft is “looking into it for a future release” (which seems to be a too-common reply at Connect lately).

April 14, 2010

Forcefully Rename a SQL Server Database

If you’ve ever used Object Explorer (in SSMS)  in order to rename a database, you may have come across this error message from Management Studio:

Unable to Rename

which says:

Unable to rename [database]. (ObjectExplorer)
Additional information:
--> Rename failed for Databse '[databse]'.
	(Microsoft.SqlServer.Smo)
	--> An exception occured while executing a Transact-SQL statement or batch.
		(Microsoft.SqlServer.ConnectionInfo)
		--> The database could not be exclusively locked to perform the operation.
			(Microsoft SQL Server, Error: 5030)

Management Studio is telling you that there are connections that it doesn’t want to mess with.

Or when using sp_renamedb from a query window, you might see this similar error:

Msg 5030, Level 16, State 2, Line 1
The database could not be exclusively locked to perform the operation.

If you really really want to rename the database and don’t care at all about in-flight transactions, then use this script/template I recently came up with:

A Forceful Script:

ALTER DATABASE [old_name]
 
	SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
ALTER DATABASE [old_name]
 
	MODIFY NAME = [new_name]
GO
ALTER DATABASE [new_name]
	SET MULTI_USER
GO

Caveat:

Remember that this script failed kindegarten for not playing well with others. So use this script only on QA and Dev boxes. And if you’re going to do something on a production box, make sure you do this only during scheduled downtime.

One More Interesting Thing:

A friend of mine stored this script in One Note and found that when it came to use it, the script failed because One Note had replaced some of the white space with non breaking spaces!

This issue is a One Note issue and applies to all scripts and code, not just T-SQL scripts.

April 13, 2010

T-SQL Tuesday #005: Focus on Users when Reporting

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: — Michael J. Swart @ 12:48 pm

Takeaway: Getting feedback from customers (end-users) is crucial for a successful reporting initiative. Success based on a user-adoption metric.

So it’s the second Tuesday of the month again and time for another T-SQL Tuesday. This April Aaron Nelson is hosting T-SQL Tuesday #005 and the topic of the month is Reports.

So this month I want to contrast the approaches we take as data storers and data reporters:

Client Focus

Client focus is more critical for reports and reporting than it is for data collection. Put another way: when writing data collection apps, you might be able to get by without much user feedback but when developing reporting solutions, you cannot get by without consulting users.

Consider these two typical scenarios of data flow between a user and data store.

Data In

Data In

In the data collection stage, data is consumed (and stored) by an application which is in the domain of the database developer (i.e. you). When the developer and the data consumer are the same person… everyone wins!

Data Out

Data Out

In contrast, reporting solutions see end users consuming data. This is about as far removed as you can get from the database developer (i.e. you again). So if there is a disconnect in communication between you two, nobody wins.

I remember seeing a chain of three extra people acting as a communication layer between an end user and a developer. I think it’s because the data collection system was designed and implemented with some success this way.

Bridge the Gap Between Dev and User

Ideally client and analyst can sit down for a 1 on 1 talk. When doing so we need to keep the following in mind:

  1. People who want data aren’t just curious
    They want to solve a problem. And in the rare case they’re not trying to solve a problem, they’re trying to find out whether there’s a problem to solve. So find out what problems users are trying to solve. What are their goals? “Analyze This” or “Improve That” are probably a bit too generic. Find out what decisions get made with this data.
  2. Summary Reports are Almost Never The End Of The Story
    See point 1, People are trying to solve a problem. And if they can’t dig further into data, they’re going to run up against an inevitable roadblock. In the past I’ve seen reporting solutions ditched for the combo solution of in-house-SQL-Guy + copy-and-paste + Excel.
    (As an aside, Excel as a solution isn’t too bad. It can serve the needs of the end user. It’s just that I think reporting solutions have the potential to do better)
  3. Reporting Requirements are Not Report Requirements
    See point 1 and 2. I would rather have requirements that start like “I need to analyze the efficiency of widget x” rather than “I need to view a histogram with data from this table”

I end with a quote from the last paragraph of The Data Warehouse Toolkit by Richard Kimball (a book I’ve plugged before):

In Closing … Sweeping away all the details and techniques, the gold coin for the data warehouse professional is to listen to the business. Consistently listening to the user brings us back to what we are supposed to do.

April 6, 2010

Transaction Phenomena – Part 4: Serializable vs. Snapshot

SERIES: Concurrent Transaction Phenomena

In Books Online (BOL), Microsoft  describes different kinds of transaction isolation levels in terms of phenomena that can occur during concurrent transactions. Specifically, they mention three kinds of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantom Reads. You may have heard of these before, but correct me if I’m wrong, I just can’t find a good definition anywhere on BOL.

And that’s too bad, because these phenomena don’t just help describe the different transaction isolation levels. The phenomena actually define these levels.

These terms actually come from the ISO/ANSI standard for SQL-92. The terms themselves however do little to illuminate exactly the meaning of each. (What’s a phantom read?) When used by the ANSI/ISO standard, they mean something very specific and they’re key to understanding the different isolation levels.

In the next few days, I’d like to illustrate each phenomenon:

Part 4: Serializable vs. Snapshot

So I’ve finished talking about the types of transaction phenomena defined in the ISO/ANSI standard. There are two isolation levels that SQL Server supports which never experience any of these (no dirty, non-repeatable or phantom reads). They are SERIALIZABLE and SNAPSHOT. They are both made available in order to avoid dirty, non-repeatable or phantom reads, but they do so using different methods. Understanding both is the key to being able to decide whether these are right for your application.

SERIALIZABLE

Serializable is the most isolated transaction level. Basically when a transaction reads or writes data from the database, that’s what it’s going to be until the end of the transaction:

From ISO/ANSI: [Execution of concurrent SERIALIZABLE transctions are guaranteed to be serializable which is] defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.

So that’s it! SERIALIZABLE transactions see database data as if there were no other transactions running at the same time. So no dirty, phantom or non-repeatable reads (but maybe some blocking).

It’s interesting that the standard defines SERIALIZABLE as the default level. Microsoft doesn’t subscribe to that notion and makes READ COMMITTED the default level.

The SERIALIZABLE level prevents phantom reads by using range locks. Which I explain at the end of this article.

SNAPSHOT

SNAPSHOT transactions avoid phantom reads, dirty reads and non-repeatable reads, but they do  it in quite a different way than SERIALIZABLE transactions do.

While SERIALIZABLE uses locks, instead SNAPSHOT uses a copy of committed data. Since no locks are taken, when subsequent changes are made by concurrent transactions, those changes are allowed and not blocked.

So say you’re using SNAPSHOT transactions and you finally decide to make a change to some data. As far as you know, that data hasn’t changed from the first time you looked at it. But if that data has been changed elsewhere then you’ll get this error message:

Msg 3960, Level 16, State 4, Line 1
Snapshot isolation transaction aborted due to update conflict. You
cannot use snapshot isolation to access table 'dbo.test' directly or
indirectly in database 'snapshottest' to update, delete, or insert
the row that has been modified or deleted by another transaction.
Retry the transaction or change the isolation level for the
update/delete statement.

What this update conflict error message is trying to convey is exactly the same concept as Wikipedia’s Edit Conflict error message. Except that Wikipedia explains it better. I suggest looking there.

ANSI-SQL’s SNAPSHOT Definition

There isn’t one. The SNAPSHOT isolation level I’m talking about is a Microsoft thing only. They’re useful, but definitely not part of the SQL standard.

It’s not too hard to see why. The SNAPSHOT isolation level permits the database server to serve data that is out of date. And that’s a big deal! It’s not just uncommitted. It’s old and incorrect (consistent, but incorrect).

Some people place a greater value on consistency rather than timely and accurate. I think it’s nice to have the choice.

Bonus Appendix: Range Locks.

(I was tempted to break out this appendix into its own blog post but ulitmately decided not to.)

So SERIALIZABLE transactions take range locks in order to prevent Phantom Reads. It’s interesting to see what range of values is actually locked. The locked range is always bigger than the range specified in the query. I’ll show an example.

Say we have a table storing integers and insert 6 even numbers:

CREATE TABLE MyTest
(
	id INT PRIMARY KEY
);
INSERT MyTest VALUES (2), (4), (6), (8), (10), (12);

illustration
Now lets read a range:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
SELECT id FROM MyTest WHERE id BETWEEN 3 AND 8;
-- do concurrent stuff here.
COMMIT

illustration
Books OnLine says: “The number of RangeS-S locks held is n+1, where n is the number of rows that satisfy the query.” We can verify this by looking at sys.dm_tran_locks. I’ve shown the locks that are taken above. Range locks apply to the range of possible values from the given key value, to the nearest key value below it.

You can see that the “locked range” of [2..10] is actually larger than the query range [3..8]. Attempts to insert rows into this range will wait.

What happens if we select a range containing all rows?

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
SELECT id FROM MyTest WHERE id BETWEEN 1 AND 13;
-- do concurrent stuff here
COMMIT

illustration
You can see that everything is selected. That lock at “infinity” has a resource_description value of (ffffffffffff).

Last interesting bit. Ghost records can participate in these ranges!

DELETE MyTest;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
SELECT id FROM MyTest WHERE id BETWEEN 1 AND 10;
-- do concurrent stuff here
COMMIT

illustration
Concurrent transactions are able to insert values above 12, but will wait for values below that!

April 5, 2010

New Titles

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 10:15 pm

We’re deep in the heart of spring time, a time to celebrate renewal. And if you’re living in Southwestern Ontario, the weather is making it really easy to do. Also coming out in April is a number of new titles (in more than one sense of the word). Here are some that I’d like to share with everyone.

1. The Data Warehouse Toolkit by Richard Kimball

The Data Warehouse ToolkitOkay, so this isn’t a new book, but it’s new to me. What follows here is a review of that book. What I write will seem uncritical, but I’m not going to make up some bad stuff for the sake of balance. It’s that good.

So about a month ago I asked a friend/B.I. Expert Todd McDermid for a good head start on Business Intelligence and he recommended this book.

And after some more digging, I find that he’s not the only B.I. expert to recommend this book. In SQL Server MVP Deep Dives Erin Welker also writes about Business Intelligence:  “From here my best recommendation is to start the way I started, with a book I cannot recommend enough: The Data Warehouse Toolkit by Ralph Kimball”

So the book has been out for a long time and time has shown that the methods established in the book are proven and accepted. I’ve read a number of chapters so far and I’m impressed. I’m averaging a couple AHA moments a chapter. I now have a pretty good idea why the reporting projects I’ve seen in the past have suffered from poor user adoption.

As I’m about to start on some new B.I. project,  I realize that there’s so much I still don’t know. But the book is a good start.

2. Brad Schulz’s New Title, MVP

Last October I had the fortune to be able to interview someone I thought was underexposed, Brad Schulz. His commitment to the SQL Server community was recognized late last week when he was awarded Microsoft’s Most Valuable Professional.

Brad maintains a blog and his latest post was a very gracious thank you. Congratulations Brad! To everyone not Brad, double check that his blog’s feed is in your RSS reader.

3. Brent Ozar’s New Title, MCM

Brent is a prolific blogger and respected DBA. He’s been sharing his experiences with the grueling Microsoft Certified Master training course.

How do I know the course is grueling? Certainly not from personal experience. I just read it on someone’s blog. Now where did I see that… Oh that’s right. Somehow Brent found time to blog about his own MCM training experiences despite the heavy course workload.

And it doesn’t seem to have cramped his writing style either. Whereas I try to generate interest in my blog through drawing cartoons or by generating nifty images (see I’m doing it again), Brent generates interest the old fashioned way. By writing SQL articles which somehow turn into human interest stories. He’s a human being and by design or not, his emotions (and bacon) show in his posts more often than not.

He took the final lab this weekend and describes the experience.  If any exam can be called an experience I understand this one can. Congratulations Brent.

Powered by WordPress