"sql server" | Michael J. Swart - Part 2 Michael J. Swart

October 26, 2011

Where Are Your Popular Joins?

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Tags: "sql server", cached queries, joins, query plans — Michael J. Swart @ 8:00 pm

So you know a lot about your databases right? You’re familiar with their schemas and tables and the queries that run on them. Personally I use sys.dm_exec_query_stats to understand what the most popular queries are.

But I recently started wondering about popular table joins.

I was wondering: “What tables in my database are most commonly joined together?” I already have a pretty good idea based on the data model. But I wanted to find out if the popular queries are in sync with my understanding. Unfortunately there’s no system view called sys.dm_exec_join_stats. The whole reason that I was curious is that I wanted to find a set of common table joins whose queries might be improved with a indexed view.

So I wrote something that gives me a bit of an idea. It’s a query that looks at cached query plans and counts nested loop joins (multiplied by execution count).

USE tempdb;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN TRANSACTION
 
;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select 
	cp.usecounts as numberOfJoins,
	seeknodes.query('.') as plansnippet
into #my_joins
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle)
	as qp
cross apply query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan//SeekKeys/Prefix[@ScanType="EQ"]') 
	as seeks(seeknodes)
where seeknodes.exist('./RangeColumns/ColumnReference[1]/@Database') = 1
	and seeknodes.exist('./RangeExpressions/ScalarOperator/Identifier/ColumnReference[1]/@Database') = 1;
 
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as p1)
select sum(numberOfJoins) as [Number Of Joins],
	myValues.lookupTable + '(' + myValues.lookupColumn + ')' as lookupColumn,
	myValues.expressionTable + '(' + myValues.expressionColumn + ')' as expressionColumn
from #my_joins
cross apply plansnippet.nodes('./p1:Prefix/p1:RangeColumns/p1:ColumnReference[1]')
	as rangeColumns(rangeColumnNodes)
cross apply plansnippet.nodes('./p1:Prefix/p1:RangeExpressions/p1:ScalarOperator/p1:Identifier/p1:ColumnReference[1]')
	as rangeExpressions(rangeExpressionNodes)
cross apply (
	select
		rangeColumnNodes.value('@Database', 'sysname') as lookupDatabase, 
		rangeColumnNodes.value('@Schema', 'sysname') as lookupSchema,
		rangeColumnNodes.value('@Table', 'sysname') as lookupTable,
		rangeColumnNodes.value('@Column', 'sysname') as lookupColumn,
		rangeExpressionNodes.value('@Database', 'sysname') as expressionDatabase, 
		rangeExpressionNodes.value('@Schema', 'sysname') as expressionSchema,
		rangeExpressionNodes.value('@Table', 'sysname') as expressionTable,
		rangeExpressionNodes.value('@Column', 'sysname') as expressionColumn	
	) as myValues
where myValues.expressionTable != myValues.lookupTable
group by myValues.lookupTable, myValues.lookupColumn, myValues.expressionTable, myValues.expressionColumn
order by SUM(numberOfJoins) desc;
 
rollback;

USE tempdb; SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; BEGIN TRANSACTION ;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan') select cp.usecounts as numberOfJoins, seeknodes.query('.') as plansnippet into #my_joins from sys.dm_exec_cached_plans cp cross apply sys.dm_exec_query_plan(cp.plan_handle) as qp cross apply query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan//SeekKeys/Prefix[@ScanType="EQ"]') as seeks(seeknodes) where seeknodes.exist('./RangeColumns/ColumnReference[1]/@Database') = 1 and seeknodes.exist('./RangeExpressions/ScalarOperator/Identifier/ColumnReference[1]/@Database') = 1; WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as p1) select sum(numberOfJoins) as [Number Of Joins], myValues.lookupTable + '(' + myValues.lookupColumn + ')' as lookupColumn, myValues.expressionTable + '(' + myValues.expressionColumn + ')' as expressionColumn from #my_joins cross apply plansnippet.nodes('./p1:Prefix/p1:RangeColumns/p1:ColumnReference[1]') as rangeColumns(rangeColumnNodes) cross apply plansnippet.nodes('./p1:Prefix/p1:RangeExpressions/p1:ScalarOperator/p1:Identifier/p1:ColumnReference[1]') as rangeExpressions(rangeExpressionNodes) cross apply ( select rangeColumnNodes.value('@Database', 'sysname') as lookupDatabase, rangeColumnNodes.value('@Schema', 'sysname') as lookupSchema, rangeColumnNodes.value('@Table', 'sysname') as lookupTable, rangeColumnNodes.value('@Column', 'sysname') as lookupColumn, rangeExpressionNodes.value('@Database', 'sysname') as expressionDatabase, rangeExpressionNodes.value('@Schema', 'sysname') as expressionSchema, rangeExpressionNodes.value('@Table', 'sysname') as expressionTable, rangeExpressionNodes.value('@Column', 'sysname') as expressionColumn ) as myValues where myValues.expressionTable != myValues.lookupTable group by myValues.lookupTable, myValues.lookupColumn, myValues.expressionTable, myValues.expressionColumn order by SUM(numberOfJoins) desc; rollback;

Some caveats:

Parsing xml takes a lot of time and a lot of CPU (the subtree cost is huge and execution time is measured in seconds or minutes)
It’s only useful on a system that is used a lot (as opposed to a dev database).
It only reports statistics about queries that are found in cached plans. So the stats are only relevant since the last server restart
It only counts loop joins (not hash or merge joins)
If you want, you can adjust the query to include schema and database names in the results

I hope you find it useful. This query gives hints for further investigation into potential indexed views. It worked well for me and so I thought it was worth sharing. I ran this query against a server I know well and I was surprised at some of the results. Good luck.

-- Comments (4)

September 8, 2011

Mythbusting: Concurrent Update/Insert Solutions

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", concurrency, high concurrency, upsert — Michael J. Swart @ 8:00 am

Update: This is a post on the topic of UPSERT. See my more recent post SQL Server UPSERT Patterns and Antipatterns

I’ve recently come across a large number of methods that people use to avoid concurrency problems when programming an Insert/Update query. Rather than argue with some of them. I set out to test out how valid each method by using experiment! Here’s the myth:

Does Method X really perform Insert/Update queries concurrently, accurately and without errors?

… where X is one of a variety of approaches I’m going to take. And just like the Discovery Channel show Mythbusters, I’m going to call each method/procedure/myth either busted, confirmed or plausible based on the effectiveness of each method.

Actually, feel free to follow along at home (on development servers). Nothing here is really dangerous.

Here’s what my stored procedure should do. The stored procedure should look in a particular table for a given id. If a record is found, a counter field on that record is incremented by one. If the given id is not found, a new record is inserted into that table. This is the common UPSERT scenario.

I want to be able to do this in a busy environment and so the stored procedure has to co-operate and play nicely with other concurrent processes.

The Setup

The set up has two parts. The first part is the table and stored procedure. The stored procedure will change for each method, but here’s the setup script that creates the test database and test table:

/* Setup */
if DB_ID('UpsertTestDatabase') IS NULL
    create database UpsertTestDatabase
go
use UpsertTestDatabase
go
 
if OBJECT_ID('mytable') IS NOT NULL
    drop table mytable;
go
 
create table mytable
(
    id int,
    name nchar(100),
    counter int,
    primary key (id),
    unique (name)
);
go

The second thing I need for my setup is an application that can call a stored procedure many times concurrently and asynchronously. That’s not too hard. Here’s the c-sharp program I came up with: Program.cs. It compiles into a command line program that calls a stored procedure 10,000 times asynchronously as often as it can. It calls the stored procedure 10 times with a single number before moving onto the next number This should generate 1 insert and 9 updates for each record.

Method 1: Vanilla

The straight-forward control stored procedure, it simply looks like this:

/* First shot */
if OBJECT_ID('s_incrementMytable') IS NOT NULL
drop procedure s_incrementMytable;
go
 
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
 
begin transaction
if exists (select 1 from mytable where id = @id)
	update mytable set counter = counter + 1 where id = @id;
else
	insert mytable (id, name, counter) values (@id, @name, 1);
commit
go

It works fine in isolation, but when run concurrently using the application, I get primary key violations on 0.42 percent of all stored procedure calls! Not too bad. The good news is that this was my control scenario and now I’m confident that there is a valid concurrency concern here. And that my test application is working well.

Method 2: Decreased Isolation Level

Just use NOLOCKS on everything and all your concurrency problems are solved right?

if OBJECT_ID('s_incrementMytable') IS NOT NULL
	drop procedure s_incrementMytable;
go
 
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
 
set transaction isolation level read uncommitted
begin transaction
if exists (select 1 from mytable where id = @id)
	update mytable set counter = counter + 1 where id = @id;
else 
	insert mytable (id, name, counter) values (@id, @name, 1);
commit
go

I find out that there are still errors that are no different than method 1. These primary key errors occur on 0.37 percent of my stored procedure calls. NOLOCK = NOHELP in this case.

Method 3: Increased Isolation Level

So let’s try to increase the isolation level. The hope is that the more pessimistic the database is, the more locks will be taken and held as they’re needed preventing these primary key violations:

if OBJECT_ID('s_incrementMytable') IS NOT NULL
	drop procedure s_incrementMytable;
go
 
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
 
set transaction isolation level serializable
begin transaction
if exists (select 1 from mytable where id = @id)
	update mytable set counter = counter + 1 where id = @id;
else 
	insert mytable (id, name, counter) values (@id, @name, 1);
commit
go

Bad news! Something went wrong and while there are no primary key violations, 82% of my queries failed as a deadlock victim. A bit of digging tells me that several processes have gained shared locks and are also trying to convert them into exclusive locks… Deadlocks everywhere

Method 4: Increased Isolation + Fine Tuning Locks

Hmm… What does stackoverflow have to say about high concurrency upsert? A bit of research on Stackoverflow.com lead me to an excellent post by Sam Saffron called Insert or Update Pattern For SQL Server. He describes what I’m trying to do perfectly. The idea is that when the stored procedure first reads from the table, it should grab and hold a lock that is incompatible with other locks of the same type for the duration of the transaction. That way, no shared locks need to be converted to exclusive locks. So I do that with a locking hint.:

if OBJECT_ID('s_incrementMytable') IS NOT NULL
	drop procedure s_incrementMytable;
go
 
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
 
set transaction isolation level serializable
begin transaction
if exists (select 1 from mytable with (updlock) where id = @id)
	update mytable set counter = counter + 1 where id = @id;
else 
	insert mytable (id, name, counter) values (@id, @name, 1);
commit
go

Zero errors! Excellent! The world makes sense. It always pays to understand a thing and develop a plan rather than trial and error.

Method 5: Read Committed Snapshot Isolation

I heard somewhere recently that I could turn on Read Committed Snapshot Isolation. It’s an isolation level where readers don’t block writers and writers don’t block readers by using row versioning (I like to think of it as Oracle mode). I heard I could turn this setting on quickly and most concurrency problems would go away. Well it’s worth a shot:

ALTER DATABASE UpsertTestDatabase
SET ALLOW_SNAPSHOT_ISOLATION ON
 
ALTER DATABASE UpsertTestDatabase
SET READ_COMMITTED_SNAPSHOT ON
go
 
if OBJECT_ID('s_incrementMytable') IS NOT NULL
	drop procedure s_incrementMytable;
go
 
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
 
begin transaction
if exists (select 1 from mytable where id = @id)
	update mytable set counter = counter + 1 where id = @id;
else 
	insert mytable (id, name, counter) values (@id, @name, 1);
commit
go

Ouch! Primary key violations all over the place. Even more than the control! 23% of the stored procedure calls failed with a primary key violation. And by the way, if I try this with Snapshot Isolation, I not only get PK violations, I get errors reporting “Snapshot isolation transaction aborted due to update conflict”. However, combining method 4 with snapshot isolation once again gives no errors. Kudos to Method 4!

Other Methods

Here are some other things to try (but I haven’t):

Avoiding concurrency issues by using Service Broker. If it’s feasible, just queue up these messages and apply them one at a time. No fuss.
Rewrite the query above as: UPDATE…; IF @@ROWCOUNT = 0 INSERT…; You could try this, but you’ll find this is almost identical with Method 1.

So How Are We Going To Call This One?

So here are the results we have:

Concurrency Method	Status	Notes
Method 1: Vanilla	Busted	This was our control. The status quo is not going to cut it here.
Method 2: Decreased Isolation Level	Busted	NOLOCK = NOHELP in this case
Method 3: Increased Isolation Level	Busted	Deadlocks! Strict locking that is used with the SERIALIZABLE isolation level doesn’t seem to be enough!
Method 4: Increased Isolation + Fine Tuning Locks	Confirmed	By holding the proper lock for the duration of the transaction, I've got the holy grail. (Yay for StackOverflow, Sam Saffron and others).
Method 5: Read Committed Snapshot Isolation	Busted	While RCSI helps with most concurrency issues, it doesn't help in this particular case.
Other Methods: Service Broker	Plausible	Avoid the issue and apply changes using a queue. While this would work, the architectural changes are pretty daunting
Update! (Sept. 9, 2011) Other Methods: MERGE statement	Busted	See comments section
Update! (Feb. 23, 2012) Other Methods: MERGE statement + Increased Isolation	Confirmed	With a huge number of comments suggesting this method (my preferred method), I thought I’d include it here to avoid any further confusion

Don’t try to apply these conclusions blindly to other situations. In another set of circumstances who knows what the results would be. But test for yourself. Like a good friend of mine likes to say: “Just try it!”

So that’s the show. If you have a myth you want busted. Drop me a line, or leave me a message in the comments section. Cheers ’til next time.

-- Comments (50)

September 2, 2011

Me and My Big Mouth (Literally)

Filed under: Miscelleaneous SQL,Tongue In Cheek — Tags: "sql server", mjswart, speaking, sqlsat93, toronto — Michael J. Swart @ 9:20 am

SAAAAAAAAAN guy!.

I First Learned About SQL as a Speaker

One of my jobs in University was to tutor first year computer science students. My friends would tease me and call me a “computer tutor” in a really nasally voice. But it was a good job and a good experience.

I did a large variety of things in that job. I did the usual things like running tutorials, marking papers, and helping students with their assignments. But I also gave campus tours and I gave workshops to other students.These workshops were very very brief introductions to various computer science topics.

I was assigned to give one of those workshops on the topic of something called SQL. It was the first time I had ever encountered anything database related and I was supposed to teach it! I had never even seen the word SQL before and it was years before I got used to pronouncing it sequel instead of ess cue ell. I was nervous then, but I don’t think I needed to be. I learned enough about that subject to teach it well and the preparation paid off. That experience made me comfortable around SQL and when I encountered this “database language” again, I found it easy to pick up where I left off.

During that job, I didn’t learn a lot about computers that I didn’t already know, but I did learn a lot about speaking and teaching. I learned how tricky it was to pace yourself. If you talk too quick, the subject matter goes over everyone’s head. Speak too slow and it sounds like you’re condescending and talking down to people.

I still like talking about SQL Server. At work, I often give lunch-and-learns. These are lunch-hour talks put on by coworkers for coworkers to talk about standards or to teach something that needs explaining. In the past few years, I’ve learned a lot about web development and I hope my colleagues have learned a bit about databases.

In general, I think public speaking is a good skill to have. Some are naturals (or seem to be) and others (like me) need the practice. So having said that …

I’ll Be Giving My First Talk at SQL Saturday #93 in Toronto

And so I’m super excited about giving a talk at SQL Saturday #93 in Toronto (September 17, 2011). SQL Saturday #93 is a free one-day workshop for SQL Server. (Register here!). The talk I’m going to give is called Obliterate Excessive Blocking and Deadlocking As a DB Developer, I think avoiding blocking is one of the most valuable skills to have. I’ll advertise this talk a bit more in a blog post next week. In the meantime…

Name that Caption!

That creepy picture of me up there is screaming for a better caption. Let me know your ideas in the comments, or put it in a tweet (@MJSwart). Let me know by the end of Wednesday (Sept. 7th). I’ll pick my favourite and let you know on Thursday!

Update September 15, 2011: So John Sansom is the lucky winner of the caption contest. John, I’ll buy you a drink next time we’re in the same city.

-- Comments (6)

August 23, 2011

ACID Properties By Example (And Counterexample) Part Four: Durable

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", acid properties, isolation, sql — Michael J. Swart @ 9:00 am

ACID Properties By (Counter) Example

The last ACID property is D, Durability. Again, Haerder and Reuter describe Durability:

“Once a transaction has been completed and has committed its results to the database, the system must guarantee that these results survive any subsequent malfunctions.”

What does this mean exactly? That’s a tall order for a database system! I mean any malfunction whatsoever? I’m pretty sure our database systems are designed to survive a power failure but I don’t expect that they could survive something as severe as the heat death of the universe.

Actually databases don’t have to go that far. When designing a database system, only two kinds of malfunctions are considered: media failure and system failure.

Media Failures

For media failure (e.g. a faulty hard drive) databases are recovered by using backups and transaction logs. And this leads directly to three bits of super-common DBA advice:

Take backups regularly.
Keep your transaction logs and your main database files on different hard drives.
When dealing with a disk failures, step one is backing up the tail of the log

System Failures

System failures (e.g. system crashes, power outages etc…) have to be handled too.

SQL Server does it this way. When SQL Server is processing transactions, it will first write changes to a transaction log and then write the associated changes to the database file. Always always in that order (There’s a bit more too it, but that’s the main part). It’s called the Write-Ahead Transaction Log.

But when there’s a system malfunction, a few things need to be cleaned up the next time the server restarts (to maintain atomicity and consistency). There may be transactions that were interrupted and not yet committed. And some transactions may not have their changes written to disk, or sometimes not written completely to disk. How do you recover from stuff like that?

Well the database recovers from a failure like that during a startup process called (unsurprisingly) “recovery”. It can look at these half-performed transactions and it can roll them back using the info in the logs. Or alternatively it can roll-forward and replay committed transactions that haven’t made it to disk if the conditions are right and there’s enough info in the transaction log to do so. (Further Information at MCM Prep Video: Log File Internals and Maintenance)

So What Does This Mean To You?

If an ACID database system like SQL Server reports that your transaction has committed successfully then because it’s durable, your transaction is truly persisted: You don’t have to worry about buffer flushes or power failures “losing” your work.

Example

So what is interestingly durable? Durability in database systems usually means that something is redundant so that if one thing is lost, the transaction is not lost. So I give a list here of things that are too redundant:

The Hydra‘s heads (Greek Mythology)
Enchanted Brooms from the Sorcerer’s Apprentice.
Autofac (An interesting short story by Philip K. Dick which I finished reading last night).

Counter-Example

I have two examples and they both come from the career of Richard Harris (best known to my family as the first Dumbledore). Did you know he was a one-hit wonder? He had a hit single in the seventies called MacArthur Park. If you’ve never heard the song, skip this article and experience the utter madness that is MacArthur Park. You won’t regret it.

Back to the example. The singer of MacArthur Park would like to have his cake. Unfortunately, it’s been left out in the rain (malfunction). But that’s okay right? He could always get out the recipe (transaction log) and make a new one right? Wrong! He’ll never have that recipe again (durability fail). Had he persisted that recipe, the poor sucker would still have his cake.

Bonus Richard Harris Counterexample

You may remember he played Emperor Marcus Aurelius in the movie Gladiator. (Spoiler alert!) In that movie, he plans to make Maximus his heir instead of his son Commodus. He first tells his plans to Maximus (who is reluctant to rule Rome) and then he tells Commodus who did not take the news well at all. In fact he murdered his father after hearing it! The Emperor’s plans never make it to the public and so Commodus becomes Emperor.

You see, his plans to make Maximus his heir was not durable! Had the Emperor told a bunch of other people first, then his intended heir Maximus would have ruled Rome as he wanted (Not to mention it would have removed the motive for his murder).

That’s The Series

So that’s it. I had fun with it. It gave me a chance to “geek out”. And even though blog post series are a nice way of treating a topic in depth, I still found myself struggling to keep each article to blog-post length. There’s just so much to learn here. I guarantee I learned more writing this series than a reader would reading it 😉

Tell me what you think!

-- Comments (10)

August 10, 2011

ACID Properties By Example (And Counterexample) Part Three: Isolation

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", acid properties, isolation, sql — Michael J. Swart @ 12:00 pm

ACID Properties By (Counter) Example

So the third ACID property of database transactions is I for Isolation. This is the only ACID property that deals with behaviour of a transaction with respect to other concurrent transactions (all the other properties describe the behaviour of single transactions). Haerder and Reuter describe it as:

Isolation: Events within a transaction must be hidden from other transactions running concurrently.

It’s not super-rigorous, but I think of it like this: No looking at works-in-progress

(Actually, I don’t always believe in that advice, but it helps the cartoon)

So there are different kinds of database isolation. Even with the the guideline: no looking at other transactions in progress. And now these levels of isolation are well defined. I wrote a series on those earlier, the different levels are READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ and SERIALIZABLE. By the way READ UNCOMMITTED is the only isolation level here that is not really isolated, more on that later.

Isolation in SQL Server

SQL Server supports all of these isolation levels. It enforces this isolation using various locks on data (fascinating stuff actually), processes will wait to maintain isolation. In contrast, Oracle supports only SERIALIZABLE and a kind of READ COMMITTED that is closer in behaviour to SQL Server’s SNAPSHOT isolation. No matter how it’s implemented, READ COMMITTED is the default isolation level in both SQL Server and Oracle.

Unisolated Transactions:

So it is possible for other transactions to see the effects of a transaction in-flight (i.e. as it’s happening, before it’s committed). This is done with NOLOCK hints or with the READ UNCOMMITTED isolation level. In fact, I learned recently that when using NOLOCK hints, you not only can see the effects of an in-flight transaction, but you can see the effects of an in-flight statement. This is an Isolation failure and it boils down to this: SQL Server transactions are atomic, but when using NOLOCK, it might not seem that way. So take care.

Example

Today’s example and counterexample both come from the newspapers headlines of Chicago.

For the example – a fictional example – I explain a situation that’s all about not making assumptions. It’s all about being cautious and not committing to a decision while the jury’s still out. This immediately brought to mind a scene from the movie Chicago [spoiler alert!] :

The movie (and play) is about a court case. The main character Roxie is on trial for murder. It’s a sensational trial and the papers are eager to publish the results of the trial. The papers are so eager in fact that the papers have printed out two editions of their newspapers. One headline read “She’s Innocent” the other headline read “She’s Guilty”. But those two stacks of papers are just sitting there in the van. The man in the newspaper van waits for a signal from the courthouse. Once he got the proper signal, he cracked open the innocent edition and gave them to a paper boy to hand out.

It’s about not acting on information while the jury is still out. The jury is isolated from the world and no one can act on what the jury has to say until they’ve committed to a verdict.

Counter-Example

Our counter-example comes from non-fiction. In reality, the assumptions we make tend to be correct. Our assumptions are only interesting when they turn out to be incorrect. This counter-example comes from the most incorrect newspaper headline I can think of:

“Dewey Defeats Truman”

Click through for Wikipedia’s page on cool piece of newspaper history (Chicago newspaper history). It’s a great example of what can go wrong when we act on tentative (uncommitted) information. The Chicago Tribune published the wrong presidential candidate as the winner.

But the really really cautious reporters would report neither candidate as the winner. They’d be waiting at the Electoral College convention. They’d be keen on seeing how that turns out.

-- Comments (6)

August 3, 2011

ACID Properties By Example (And Counterexample) Part Two: Consistent

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", acid properties, atomic, sql — Michael J. Swart @ 12:00 pm

ACID Properties By (Counter) Example

Dr. Jim Gray developed many of the properties that transactions have in well-designed databases. Later, Haerder and Reuter took these properties and used them to coin the acronym ACID. At that time, they defined consistency this way:

Consistency
A transaction reaching its normal end, thereby committing its results, preserves the consistency of the database. In other words, each successful transaction by definition commits only legal results.

Essentially, consistency means that database systems have to enforce business rules defined for their databases.
But it’s interesting. The word consistency (applied to database systems) aren’t always used the same way! For example, in Brewer’s CAP theorem the C, standing for consistency is defined as “All clients have the same view of the data.” (Really computer scientists?? That’s the word you decide to overload with different meanings?). So if you ever hear someone say eventually consistent. They’re using the consistency term from CAP, not the consistency term from ACID.

I guess “C” means something different for everyone.

Consistency in SQL Server

In my own words, consistency means that any defined checks and constraints are satisfied after any transaction:

Columns only store values of a particular type (int columns store only ints, etc…)
Primary keys and unique keys are unique
Check constraints are satisfied
Foreign key constraints are satisfied

Constraints Are Enforced One Record At A Time
Some things you might notice about these constraints. They can all be checked and validated by looking at a single row. Check constraints enforce rules based only on the columns of a single row. One exception is where these constraints might perform a singleton lookup in an index to look for the existence of a row (for enforcing foreign keys and primary keys).
Multi-line constraints are not supported directly because it would be impractical to efficiently enforce consistency. For example, it’s not possible to create a constraint on an EMPLOYEE table that would enforce the rule that the sum of employee salaries must not exceed a specific amount.

Consistency Enforced After Each Statement
Another interesting thing about SQL Server is that while ACID only requires the DBMS to enforce consistency after a complete transaction, SQL Server will go further and enforce consistency after every single statement inside a transaction. It might be nice to insert rows into several tables in any order you wish. But if these rows reference each other with foreign keys, you still have to be careful about the order you do the inserting, transaction or no transaction.

Handling Inconsistencies
When SQL Server finds inconsistencies. It handles it in one of a few ways.

If a foreign key is defined properly, a change to one row can cascade to other other rows.
If a value of a particular datatype is inserted into a column which is defined to hold a different datatype, SQL Server may sometimes implicitly convert the value to the target datatype.
Most often, SQL Server gives up and throws an error, rolling back all effects of that statement.

Inconsistent Data Any Way
It also turns out that it’s very easy to work around these constraints! (Besides the all-too-common method of not defining constraints in the first place). Primary keys, Unique constraints and datatype validation are always enforced, no getting around them. But you can get around foreign keys and check constraints by

using WITH NOCHECK when creating a foreign key or a check constraint. You’re basically saying, enforce any new or changing data, but don’t bother looking at any existing data. These constraints will then be marked as not trusted
using the BULK INSERT statement (or other similar bulk operations) without CHECK_CONSTRAINTS. In this case foreign keys and check constraints are ignored and marked as not trusted.

Example

I’m taking the following example not from I.T., but from the world of medical labs.

When processing medical tests (at least in my part of the world), there’s a whole set of rules that medical professionals have to follow. Doctors and their staff have to fill in a requisition properly. The specimen collection centre has do verify that information, take samples and pass everything on to a lab. The lab that performs the tests, ensures that everything is valid before performing the test and sending back results to the doctor.

Just like a database transaction, the hope is that everything goes smoothly. All patient information is entered properly. Patients and lab techs have followed all appropriate instructions.

Fixing Inconsistent Data: It sometimes happens that information is entered incorrectly or missing (like insurance info, or the date and time of the test). In these cases, often the lab might call back for corrections before continuing with the test. This is similar to the case when SQL Server recognizes that a statement will not leave the database in a consistent state. In some cases, SQL Server can try to do something about it. For example it can do an implicit conversion of a datatype, or it can cascade a delete/update.

Giving Up And Rolling Back: But sometimes a medical test can’t be saved. For example, sometimes a sample arrives clotted when it should have arrived unclotted (or vice versa). In these cases, meaningful results aren’t possible and the whole test has to be rejected to be performed again correctly. SQL Server will do this whenever it’s necessary to maintain consistent data. It will raise an error and the entire statement or transaction is undone (to be corrected and performed again).

Counterexample

Well, this counterexample comes from the world of cheesy Science Fiction. Normally we want our databases to store only consistent and legal data. Any illegal data should be rejected right away. What we don’t want is for our databases to get hung up on some crazy inconsistent data.

But if you’re Captain Kirk and you need to deal with a rogue computer or robot that’s acting up. What do you do? Simple, confuse it with inconsistent information! Those robots won’t know what hit them.

This bit of dialog comes straight from an episode of Star Trek called “I, Mudd” (I’m not even making this up, Google it!)

Kirk: Norman, Everything Harry tells you is a lie, remember that, everything Harry tells you is a lie.
Harry Mudd: Listen to this carefully Norman: I am lying.
[Norman the android starts beeping, his light starts flashing and his ears start smoking]
Norman: You say you are lying but if everything you say is a lie then you are telling the truth but you cannot tell the truth because everything you say is a lie but you lie, you tell the truth but you cannot for you lie … Illogical! Illogical!
[more of the same, more smoke and kaboom]

Honest-to-God smoke from the ears! It’s so classic it gets parodied a lot. (Here’s one of my favourites, a comic from Cyanide and Happiness).

I’m grateful that our databases don’t choke on inconsistent data. They just throw an error and tell clients “Here, you deal with it!”.

-- Comments (11)

July 27, 2011

ACID Properties By Example (And Counterexample) Part One: Atomic

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", acid properties, atomic, sql — Michael J. Swart @ 12:00 pm

ACID Properties By (Counter) Example

In the 1800s scientists first used the word atom to describe the smallest bits of an element. They picked that word because it meant un-cuttable or indivisible. The idea was that that’s as far as you can go; you can’t break down these bits further. Fast forward to the early 1900s and they found out that atoms actually can change (through radiation or splitting). But it was too late to change the language. Splittable or not, atoms are atoms.

This process of splitting atoms is so interesting that somehow the word atomic has come to refer this process of dividing atoms (e.g. atomic bomb, atomic energy).

Atomic Transactions in SQL Server

But when we talk about the word atomic as one of the ACID properties of transactions, the word regains its original meaning: indivisible. SQL Server transactions are always atomic. They’re all-or-nothing. To twist Meatloaf’s words, this means that two out of three is bad (It’s got to be three out of three or nothing). It’s often forgotten, but this applies to single statement transactions too; all of a statement (whether an update, delete or insert) will happen or not happen.

To guarantee atomicity, SQL Server uses a Write Ahead Transaction Log. The log always gets written to first before the associated data changes. That way, if and when things go wrong, SQL Server will know how to rollback to a state where every transaction happened or didn’t happen. There’s a lot more to it than that, but as a developer all I care about is that I can trust that my transactions don’t get split.

Example

Here’s an example from outside the I.T. industry. It’s a story about an all or nothing transaction. About two years ago, Samoa switched from driving on the right side of the road to the left (The NYT has a great article on it).
You can imagine the great effort that must go into a switch like this. And it has to happen all or nothing. The switch has to happen everywhere, all at once with no exceptions. Unlike other big projects that can usually be broken down into smaller phases, this one can’t.
Translated into SQL, this might be equivalent to:

BEGIN TRANSACTION
 
UPDATE ROADS
SET TrafficDirection = 'Left'
WHERE Country = 'Samoa';
 
UPDATE TRAFFIC_LIGHTS
SET TrafficDirectionMode = 'Left'
WHERE Country = 'Samoa';
 
UPDATE INTERSECTIONS
SET TrafficDirectionConfigurationMode = 'Left'
WHERE Country = 'Samoa'
 
COMMIT

It’s not an I.T. example, but you get the idea. If this “transaction” were not atomic there would be trouble!

Counter Example

An example of failed atomicity (outside I.T.). One word: Standards.
Say you want to create a universal standard for something (say the Metric system) the main purpose is to create it to be the single standard to replace all others. If you fail in your goal, you’ve added to the problem!
Some more successful universal standards:

http (over gopher etc…) for almost all web traffic
Blueray (over hd-dvd) for hi-def movie formats

But consider the Metric system. It’s mostly successful because of its large adoption. But because there are a few stragglers, it’s not as successful as it could be. Translated into SQL:

UPDATE EVERYTHING
SET Units = 'METRIC',
    Value = fn_Convert(Value, Units, 'METRIC')
-- no where clause!

This “statement” didn’t update everything. The “statement” wasn’t atomic and this continues to cause problems. One problem that comes to mind is the failed Mars Climate Orbiter mission.

Let’s be grateful for the all-or-nothing transactions we have in our databases!

-- Comments (14)

July 20, 2011

ACID Properties By Example (And Counterexample) Part Zero

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", acid properties, sql — Michael J. Swart @ 12:00 pm

ACID Properties By (Counter) Example

So I’m introducing a small series about ACID properties as it applies to databases. (For other acid properties, talk to a chemist or Timothy Leary).

When talking about databases, ACID is an acronym that stands for Atomic, Consistent, Isolation and Durable. These are important properties of a database system’s architecture. Specifically these properties refer to how database transactions are designed.

In fact this stuff is important in any transaction processing system (TPS). These systems (not just database systems) use a server-client architecture and they first became popular in the 1960s. These systems are successful because they allow multiple clients to modify and share data concurrently all while enforcing data integrity! Not too shabby.

So most servers (including database servers) were built with this architecture in mind. It’s interesting that NoSQL databases don’t attempt to provide ACID transactions. Each of these NoSQL databases ignore one or more of these properties and attempt to offer something else in its place (but that’s a story for another day).

With SQL Server, these properties are enforced by default. But as it happens, you can relax these ACID properties in SQL Server if you want. We’ll see that it turns out to be easy (maybe too easy?) to write SQL that ignores some of these properties. The hope is that after reading this series, you’ll

be aware of the properties
understand why database transactions behave the way they do,
and be aware of any consequences if you’re tempted to give up any of these properties.

How This Series Is Organized

So I started this series as a single blog post, but it was getting a bit long for a single article. I wanted to come up with some examples (and counterexamples) other than the too common example of a money transfer between two bank accounts.

What you’ll see in this series is

a description of each ACID property.
A bit about how each property is handled in SQL Server,
An example from real life (but not necessarily an I.T. example!)
A counterexample from real life (but again, not necessarily an I.T. example!)

Stay tuned!

-- Comments (3)

July 6, 2011

Make Your Life Easier With Fun Denali Tricks

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", snippets, sql server denali — Michael J. Swart @ 6:37 pm

I’ve given SQL Server Denali a spin and I have to say that I really like it. I want to demonstrate how I use some of the new features to make my life easier. How much easier?

The Scenario

Using the Adventureworks database, I want to perform the following:

SELECT top 30 ModifiedDate, rowguid, EmailAddress 
into myNewTable
from Person.EmailAddress

And I want to distribute a script to other databases so I can’t depend on the Person.EmailAddress table. So the two things I want to do in my script are:

script the creation of the table and
load the data.

SQL Server Denali actually makes this easy and I don’t even need to use SQL Server’s existing scripting features. Here’s how:

Using dm_exec_describe_first_resultset

First I use the new dynamic management function dm_exec_describe_first_resultset. Aaron Bertrand explains this feature well at his post SQL Server v.Next (Denali) : Metadata enhancements. You can use it to describe the columns of a query. This is perfect for what I need here and I base my script on the results of this query (modified a bit from Aaron’s script):

DECLARE @sql NVARCHAR(MAX);
SET @sql = N'SELECT top 30 ModifiedDate, rowguid, EmailAddress 
from Person.EmailAddress
 ';
 
SELECT 
    CASE column_ordinal 
        WHEN 1 THEN '' ELSE ',' END 
        + name + ' ' + system_type_name + CASE is_nullable 
        WHEN 0 THEN ' not null' ELSE '' END
FROM 
    sys.dm_exec_describe_first_result_set
    (
        @sql, NULL, 0
    ) AS f
ORDER BY
    column_ordinal;

But it’s really inconvenient to remember that syntax isn’t it? That’s why I like to use Code Snippets!

Code Snippets

Code snippets are something new in Denali. You may be familiar with SQL Snippets as offered by Mladen Prajdić using his SSMS Tools Pack. Having used both, I would say that Mladen’s SQL Snippets are much much easier to manage than Denali’s Code Snippets (at least as of CTP1). Denali does have code snippets that surround selected text though. I demonstrate with this snippet:

DescribeResultSet.snippet is a code snippet you can download and import into SSMS. It’s an alternative to remembering the syntax above. Here’s the way it works, it’s pretty simple. Select the query you’re curious about so that it’s highlighted. Then use the snippet to incorporate your query into a metadata query as shown in the code sample above.

Like I said, Snippets are pretty powerful, but SQL Server’s Code Snippets have still got a few quirks that will hopefully be ironed out by RTM:

For example according to the docs, the surrounds with shortcut key combo is supposed to be Ctrl+K Ctrl+S, but with CTP1 it’s not hooked up yet.
Writing snippets is a pain in the Denali, If you plan to write many snippets, stick with Mladen Prajdić’s SSMS Tools Pack

Okay, that’s great, now what about getting the data into the table?

Block Editing

Well that requires an Insert statement. That’s easy enough to write, but the VALUES clause is harder to write. It can be pretty finicky creating the literals. But SQL Server Denali runs on Visual Studio 2010! And that means we can use all the nifty tricks that VS2010 offers. One of those is enhanced Box Selection features. You select a rectangular box of text by hold down the alt key while dragging a mouse over a selection. Once you do that and start typing, the things you type will be inserted on each line of text.

This makes formatting blocks of data easy as pie. Thanks Denali!

Seeing The Whole Thing In Action

Great! That’s how to script data into a new table. And it’s so much simpler using Denali.

If you care to see it in action, I’ve got a screen capture showing what I mean. It’s got no audio (yet) or annotations.

Cheers!

-- Comments (2)

June 29, 2011

Poking Around Inside Management Studio

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", management studio, profiler, ssms, trace — Michael J. Swart @ 12:00 pm

When SQL Server Management Studio (SSMS) talks to the database engine it uses the same system objects, tables and views that are available for any other database client to use. (The same can’t be said for system stored procedures which use crazy system-only functions. I mean check out the definitions of any system view or procedure with sp_helptext)

Eavesdropping On SSMS

So we can actually watch SSMS talking to the database using Profiler. We can take a look “behind the curtains” so to speak.

Here are the profiler settings that I always use:

Default trace template
Application Name LIKE ‘Microsoft SQL Server Management Studio%’
Hostname = <my computer’s name>

This lets us see the queries that SSMS is sending to SQL Server. It’s a trick I use often to give me an idea of what system views and objects might be important when managing SQL Server. (And managing SQL Server includes automating management tasks). So now I want to show you some examples.

SSMS Things I Wonder About

Here’s a few things that I wonder about. They’re not necessarily important things on their own, but they’ll help me show how to use profiler to look inside SSMS.

Thing 1: What’s that red down-arrow over a database user. It reminds me of a database that’s been brought “offline”, but I’ve never heard of an “offline” user.

Thing 2: When scripting views, how does SSMS retrieve definitions?

Thing 3: I have a database that is restored, but without recovery. SSMS shows it with a green up-arrow and with appended text (Restoring …). How does it know which databases are in that state?

No Longer Wondering

So here’s what I found.

Question

What’s that red down-arrow over a database user?

Profiler Info

When I refreshed the Users node in Object Explorer, I saw this in profiler.

What this means

Unsurprisingly, the red arrow means user has no db access and it doesn’t mean disabled user. In other terms, SSMS is looking for a list of users and whether or not they have db access. DB Access here is determined by whether users have been granted (database_permissions.state is ‘G’ or ‘W’) permission to connect (database_pemissions.type=’CO’).

Books Online

I couldn't find links for the Users node in Object Explorer, but here’s the documentation for sys.database_permissions.

Question

When scripting views, how does SSMS retrieve definitions?

Profiler Info

When I scripted a view to clipboard, this is what I saw in profiler.

What this means

So in this case, SSMS is looking to sys.sql_modules for the definition of the stored procedure. That’s good and it’s what I expected. I was a little afraid that it would use sys.syscomments (which was so 2000). But I was surprised that it also uses a view called sys.system_sql_modules in case the procedure was a system one. I was unaware of that view.

Books Online

Point for Microsoft, they document how to script objects well.

Question

How does SSMS know which databases are in a Restoring state?

Profiler Info

When I refreshed the Databases node in Object Explorer, I saw this in profiler.

What this means

If you squint at the SQL – a highly effective method for understanding SQL – you'll see that we can determine the restore status of a database by looking at the state column of the table master.sys.databases. In this case, when state=1 the database is in the Restoring state. There was actually nothing too surprising here. In this case, SSMS's queries match my expectations.

Books Online

Kudos to Microsoft again. Here is their docs for sys.databases (with more information on that state table).

This Goes For Any Application

To satisfy curiosity, I’ve profiled lots of other database applications on my development machine and you can too.
If you’re a real keener, and you want a self-guided deep dive into SQL database internals. Try profiling Danny Gould’s Internals Viewer. It’s an eye opener.

-- Comments (5)

« Newer Posts — Older Posts »