Do you remember Watson? It's the A.I. program that IBM developed that answers natural language questions. It kicked butt last year on Jeopardy. Well, IBM recently noticed that there was a large store of SQL Server knowledge to be had by looking through SQL Server questions on stackoverflow and by looking through #sqlhelp tagged questions on twitter. Well it wasn't too long before they had Watson answering SQL Help questions!
I love artificial intelligence. What could go wrong!
IBM planned to deliver Watson's software and hardware to Chicago (but somehow they got lost at Toronto's airport). With one thing and another, I was lucky enough to get involved and I'm proud to say that they let me preview their Watson vs. SQL Server solution that they've (embarassingly) called The SQL Server Help Oracle.
I've given it a spin and I have to say it's in fine form! It gives answers that are just like a DBA's answers. Without further ado... Here it is! Give it a spin! Ask away.
Introducing the SQL Server help oracle:
Your Question:Your questionThe SQL Server help oracle says:Your answer
I’ve recently come across a large number of methods that people use to avoid concurrency problems when programming an Insert/Update query. Rather than argue with some of them. I set out to test out how valid each method by using experiment! Here’s the myth:
Does Method X really perform Insert/Update queries concurrently, accurately and without errors?
… where X is one of a variety of approaches I’m going to take. And just like the Discovery Channel show Mythbusters, I’m going to call each method/procedure/myth either busted, confirmed or plausible based on the effectiveness of each method.
Actually, feel free to follow along at home (on development servers). Nothing here is really dangerous.
Here’s what my stored procedure should do. The stored procedure should look in a particular table for a given id. If a record is found, a counter field on that record is incremented by one. If the given id is not found, a new record is inserted into that table. This is the common UPSERT scenario.
I want to be able to do this in a busy environment and so the stored procedure has to co-operate and play nicely with other concurrent processes.
The Setup
The set up has two parts. The first part is the table and stored procedure. The stored procedure will change for each method, but here’s the setup script that creates the test database and test table:
/* Setup */ifDB_ID('UpsertTestDatabase')ISNULLcreatedatabase UpsertTestDatabase
go
use UpsertTestDatabase
go
ifOBJECT_ID('mytable')ISNOTNULLdroptable mytable;
go
createtable mytable
(
id int,
name nchar(100),
counter int,
primarykey(id),
unique(name));
go
/* Setup */
if DB_ID('UpsertTestDatabase') IS NULL
create database UpsertTestDatabase
go
use UpsertTestDatabase
go
if OBJECT_ID('mytable') IS NOT NULL
drop table mytable;
go
create table mytable
(
id int,
name nchar(100),
counter int,
primary key (id),
unique (name)
);
go
The second thing I need for my setup is an application that can call a stored procedure many times concurrently and asynchronously. That’s not too hard. Here’s the c-sharp program I came up with: Program.cs. It compiles into a command line program that calls a stored procedure 10,000 times asynchronously as often as it can. It calls the stored procedure 10 times with a single number before moving onto the next number This should generate 1 insert and 9 updates for each record.
Method 1: Vanilla
The straight-forward control stored procedure, it simply looks like this:
/* First shot */ifOBJECT_ID('s_incrementMytable')ISNOTNULLdropprocedure s_incrementMytable;
go
createprocedure s_incrementMytable(@id int)asdeclare @name nchar(100)=cast(@id asnchar(100))begintransactionifexists(select1from mytable where id = @id)update mytable set counter = counter +1where id = @id;
elseinsert mytable (id, name, counter)values(@id, @name, 1);
commit
go
/* First shot */
if OBJECT_ID('s_incrementMytable') IS NOT NULL
drop procedure s_incrementMytable;
go
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
begin transaction
if exists (select 1 from mytable where id = @id)
update mytable set counter = counter + 1 where id = @id;
else
insert mytable (id, name, counter) values (@id, @name, 1);
commit
go
It works fine in isolation, but when run concurrently using the application, I get primary key violations on 0.42 percent of all stored procedure calls! Not too bad. The good news is that this was my control scenario and now I’m confident that there is a valid concurrency concern here. And that my test application is working well.
Method 2: Decreased Isolation Level
Just use NOLOCKS on everything and all your concurrency problems are solved right?
ifOBJECT_ID('s_incrementMytable')ISNOTNULLdropprocedure s_incrementMytable;
go
createprocedure s_incrementMytable(@id int)asdeclare @name nchar(100)=cast(@id asnchar(100))settransactionisolationlevelreaduncommittedbegintransactionifexists(select1from mytable where id = @id)update mytable set counter = counter +1where id = @id;
elseinsert mytable (id, name, counter)values(@id, @name, 1);
commit
go
if OBJECT_ID('s_incrementMytable') IS NOT NULL
drop procedure s_incrementMytable;
go
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
set transaction isolation level read uncommitted
begin transaction
if exists (select 1 from mytable where id = @id)
update mytable set counter = counter + 1 where id = @id;
else
insert mytable (id, name, counter) values (@id, @name, 1);
commit
go
I find out that there are still errors that are no different than method 1. These primary key errors occur on 0.37 percent of my stored procedure calls. NOLOCK = NOHELP in this case.
Method 3: Increased Isolation Level
So let’s try to increase the isolation level. The hope is that the more pessimistic the database is, the more locks will be taken and held as they’re needed preventing these primary key violations:
ifOBJECT_ID('s_incrementMytable')ISNOTNULLdropprocedure s_incrementMytable;
go
createprocedure s_incrementMytable(@id int)asdeclare @name nchar(100)=cast(@id asnchar(100))settransactionisolationlevel serializable
begintransactionifexists(select1from mytable where id = @id)update mytable set counter = counter +1where id = @id;
elseinsert mytable (id, name, counter)values(@id, @name, 1);
commit
go
if OBJECT_ID('s_incrementMytable') IS NOT NULL
drop procedure s_incrementMytable;
go
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
set transaction isolation level serializable
begin transaction
if exists (select 1 from mytable where id = @id)
update mytable set counter = counter + 1 where id = @id;
else
insert mytable (id, name, counter) values (@id, @name, 1);
commit
go
Bad news! Something went wrong and while there are no primary key violations, 82% of my queries failed as a deadlock victim. A bit of digging tells me that several processes have gained shared locks and are also trying to convert them into exclusive locks… Deadlocks everywhere
Method 4: Increased Isolation + Fine Tuning Locks
Hmm… What does stackoverflow have to say about high concurrency upsert? A bit of research on Stackoverflow.com lead me to an excellent post by Sam Saffron called Insert or Update Pattern For SQL Server. He describes what I’m trying to do perfectly. The idea is that when the stored procedure first reads from the table, it should grab and hold a lock that is incompatible with other locks of the same type for the duration of the transaction. That way, no shared locks need to be converted to exclusive locks. So I do that with a locking hint.:
ifOBJECT_ID('s_incrementMytable')ISNOTNULLdropprocedure s_incrementMytable;
go
createprocedure s_incrementMytable(@id int)asdeclare @name nchar(100)=cast(@id asnchar(100))settransactionisolationlevel serializable
begintransactionifexists(select1from mytable with(updlock)where id = @id)update mytable set counter = counter +1where id = @id;
elseinsert mytable (id, name, counter)values(@id, @name, 1);
commit
go
if OBJECT_ID('s_incrementMytable') IS NOT NULL
drop procedure s_incrementMytable;
go
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
set transaction isolation level serializable
begin transaction
if exists (select 1 from mytable with (updlock) where id = @id)
update mytable set counter = counter + 1 where id = @id;
else
insert mytable (id, name, counter) values (@id, @name, 1);
commit
go
Zero errors! Excellent! The world makes sense. It always pays to understand a thing and develop a plan rather than trial and error.
Method 5: Read Committed Snapshot Isolation
I heard somewhere recently that I could turn on Read Committed Snapshot Isolation. It’s an isolation level where readers don’t block writers and writers don’t block readers by using row versioning (I like to think of it as Oracle mode). I heard I could turn this setting on quickly and most concurrency problems would go away. Well it’s worth a shot:
ALTERDATABASE UpsertTestDatabase
SET ALLOW_SNAPSHOT_ISOLATION ONALTERDATABASE UpsertTestDatabase
SET READ_COMMITTED_SNAPSHOT ON
go
ifOBJECT_ID('s_incrementMytable')ISNOTNULLdropprocedure s_incrementMytable;
go
createprocedure s_incrementMytable(@id int)asdeclare @name nchar(100)=cast(@id asnchar(100))begintransactionifexists(select1from mytable where id = @id)update mytable set counter = counter +1where id = @id;
elseinsert mytable (id, name, counter)values(@id, @name, 1);
commit
go
ALTER DATABASE UpsertTestDatabase
SET ALLOW_SNAPSHOT_ISOLATION ON
ALTER DATABASE UpsertTestDatabase
SET READ_COMMITTED_SNAPSHOT ON
go
if OBJECT_ID('s_incrementMytable') IS NOT NULL
drop procedure s_incrementMytable;
go
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
begin transaction
if exists (select 1 from mytable where id = @id)
update mytable set counter = counter + 1 where id = @id;
else
insert mytable (id, name, counter) values (@id, @name, 1);
commit
go
Ouch! Primary key violations all over the place. Even more than the control! 23% of the stored procedure calls failed with a primary key violation. And by the way, if I try this with Snapshot Isolation, I not only get PK violations, I get errors reporting “Snapshot isolation transaction aborted due to update conflict”. However, combining method 4 with snapshot isolation once again gives no errors. Kudos to Method 4!
Other Methods
Here are some other things to try (but I haven’t):
Avoiding concurrency issues by using Service Broker. If it’s feasible, just queue up these messages and apply them one at a time. No fuss.
Rewrite the query above as: UPDATE…; IF @@ROWCOUNT = 0 INSERT…; You could try this, but you’ll find this is almost identical with Method 1.
So How Are We Going To Call This One?
So here are the results we have:
Concurrency Method
Status
Notes
Method 1: Vanilla
Busted
This was our control. The status quo is not going to cut it here.
Method 2: Decreased Isolation Level
Busted
NOLOCK = NOHELP in this case
Method 3: Increased Isolation Level
Busted
Deadlocks! Strict locking that is used with the SERIALIZABLE isolation level doesn’t seem to be enough!
Method 4: Increased Isolation + Fine Tuning Locks
Confirmed
By holding the proper lock for the duration of the transaction, I've got the holy grail. (Yay for StackOverflow, Sam Saffron and others).
Method 5: Read Committed Snapshot Isolation
Busted
While RCSI helps with most concurrency issues, it doesn't help in this particular case.
Other Methods: Service Broker
Plausible
Avoid the issue and apply changes using a queue. While this would work, the architectural changes are pretty daunting
Update! (Sept. 9, 2011) Other Methods: MERGE statement
With a huge number of comments suggesting this method (my preferred method), I thought I’d include it here to avoid any further confusion
Don’t try to apply these conclusions blindly to other situations. In another set of circumstances who knows what the results would be. But test for yourself. Like a good friend of mine likes to say: “Just try it!”
So that’s the show. If you have a myth you want busted. Drop me a line, or leave me a message in the comments section. Cheers ’til next time.
One of my jobs in University was to tutor first year computer science students. My friends would tease me and call me a “computer tutor” in a really nasally voice. But it was a good job and a good experience.
I did a large variety of things in that job. I did the usual things like running tutorials, marking papers, and helping students with their assignments. But I also gave campus tours and I gave workshops to other students.These workshops were very very brief introductions to various computer science topics.
I was assigned to give one of those workshops on the topic of something called SQL. It was the first time I had ever encountered anything database related and I was supposed to teach it! I had never even seen the word SQL before and it was years before I got used to pronouncing it sequel instead of ess cue ell. I was nervous then, but I don’t think I needed to be. I learned enough about that subject to teach it well and the preparation paid off. That experience made me comfortable around SQL and when I encountered this “database language” again, I found it easy to pick up where I left off.
During that job, I didn’t learn a lot about computers that I didn’t already know, but I did learn a lot about speaking and teaching. I learned how tricky it was to pace yourself. If you talk too quick, the subject matter goes over everyone’s head. Speak too slow and it sounds like you’re condescending and talking down to people.
I still like talking about SQL Server. At work, I often give lunch-and-learns. These are lunch-hour talks put on by coworkers for coworkers to talk about standards or to teach something that needs explaining. In the past few years, I’ve learned a lot about web development and I hope my colleagues have learned a bit about databases.
In general, I think public speaking is a good skill to have. Some are naturals (or seem to be) and others (like me) need the practice. So having said that …
I’ll Be Giving My First Talk at SQL Saturday #93 in Toronto
And so I’m super excited about giving a talk at SQL Saturday #93 in Toronto (September 17, 2011). SQL Saturday #93 is a free one-day workshop for SQL Server. (Register here!). The talk I’m going to give is called Obliterate Excessive Blocking and Deadlocking As a DB Developer, I think avoiding blocking is one of the most valuable skills to have. I’ll advertise this talk a bit more in a blog post next week. In the meantime…
Name that Caption!
That creepy picture of me up there is screaming for a better caption. Let me know your ideas in the comments, or put it in a tweet (@MJSwart). Let me know by the end of Wednesday (Sept. 7th). I’ll pick my favourite and let you know on Thursday!
Update September 15, 2011: So John Sansom is the lucky winner of the caption contest. John, I’ll buy you a drink next time we’re in the same city.