Mythbusting: Concurrent Update/Insert Solutions

September 8, 2011

Mythbusting: Concurrent Update/Insert Solutions

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", concurrency, high concurrency, upsert — Michael J. Swart @ 8:00 am

Update: This is a post on the topic of UPSERT. See my more recent post SQL Server UPSERT Patterns and Antipatterns

I’ve recently come across a large number of methods that people use to avoid concurrency problems when programming an Insert/Update query. Rather than argue with some of them. I set out to test out how valid each method by using experiment! Here’s the myth:

Does Method X really perform Insert/Update queries concurrently, accurately and without errors?

… where X is one of a variety of approaches I’m going to take. And just like the Discovery Channel show Mythbusters, I’m going to call each method/procedure/myth either busted, confirmed or plausible based on the effectiveness of each method.

Actually, feel free to follow along at home (on development servers). Nothing here is really dangerous.

Here’s what my stored procedure should do. The stored procedure should look in a particular table for a given id. If a record is found, a counter field on that record is incremented by one. If the given id is not found, a new record is inserted into that table. This is the common UPSERT scenario.

I want to be able to do this in a busy environment and so the stored procedure has to co-operate and play nicely with other concurrent processes.

The Setup

The set up has two parts. The first part is the table and stored procedure. The stored procedure will change for each method, but here’s the setup script that creates the test database and test table:

/* Setup */
if DB_ID('UpsertTestDatabase') IS NULL
    create database UpsertTestDatabase
go
use UpsertTestDatabase
go
 
if OBJECT_ID('mytable') IS NOT NULL
    drop table mytable;
go
 
create table mytable
(
    id int,
    name nchar(100),
    counter int,
    primary key (id),
    unique (name)
);
go

The second thing I need for my setup is an application that can call a stored procedure many times concurrently and asynchronously. That’s not too hard. Here’s the c-sharp program I came up with: Program.cs. It compiles into a command line program that calls a stored procedure 10,000 times asynchronously as often as it can. It calls the stored procedure 10 times with a single number before moving onto the next number This should generate 1 insert and 9 updates for each record.

Method 1: Vanilla

The straight-forward control stored procedure, it simply looks like this:

/* First shot */
if OBJECT_ID('s_incrementMytable') IS NOT NULL
drop procedure s_incrementMytable;
go
 
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
 
begin transaction
if exists (select 1 from mytable where id = @id)
	update mytable set counter = counter + 1 where id = @id;
else
	insert mytable (id, name, counter) values (@id, @name, 1);
commit
go

It works fine in isolation, but when run concurrently using the application, I get primary key violations on 0.42 percent of all stored procedure calls! Not too bad. The good news is that this was my control scenario and now I’m confident that there is a valid concurrency concern here. And that my test application is working well.

Method 2: Decreased Isolation Level

Just use NOLOCKS on everything and all your concurrency problems are solved right?

if OBJECT_ID('s_incrementMytable') IS NOT NULL
	drop procedure s_incrementMytable;
go
 
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
 
set transaction isolation level read uncommitted
begin transaction
if exists (select 1 from mytable where id = @id)
	update mytable set counter = counter + 1 where id = @id;
else 
	insert mytable (id, name, counter) values (@id, @name, 1);
commit
go

I find out that there are still errors that are no different than method 1. These primary key errors occur on 0.37 percent of my stored procedure calls. NOLOCK = NOHELP in this case.

Method 3: Increased Isolation Level

So let’s try to increase the isolation level. The hope is that the more pessimistic the database is, the more locks will be taken and held as they’re needed preventing these primary key violations:

if OBJECT_ID('s_incrementMytable') IS NOT NULL
	drop procedure s_incrementMytable;
go
 
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
 
set transaction isolation level serializable
begin transaction
if exists (select 1 from mytable where id = @id)
	update mytable set counter = counter + 1 where id = @id;
else 
	insert mytable (id, name, counter) values (@id, @name, 1);
commit
go

Bad news! Something went wrong and while there are no primary key violations, 82% of my queries failed as a deadlock victim. A bit of digging tells me that several processes have gained shared locks and are also trying to convert them into exclusive locks… Deadlocks everywhere

Method 4: Increased Isolation + Fine Tuning Locks

Hmm… What does stackoverflow have to say about high concurrency upsert? A bit of research on Stackoverflow.com lead me to an excellent post by Sam Saffron called Insert or Update Pattern For SQL Server. He describes what I’m trying to do perfectly. The idea is that when the stored procedure first reads from the table, it should grab and hold a lock that is incompatible with other locks of the same type for the duration of the transaction. That way, no shared locks need to be converted to exclusive locks. So I do that with a locking hint.:

if OBJECT_ID('s_incrementMytable') IS NOT NULL
	drop procedure s_incrementMytable;
go
 
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
 
set transaction isolation level serializable
begin transaction
if exists (select 1 from mytable with (updlock) where id = @id)
	update mytable set counter = counter + 1 where id = @id;
else 
	insert mytable (id, name, counter) values (@id, @name, 1);
commit
go

Zero errors! Excellent! The world makes sense. It always pays to understand a thing and develop a plan rather than trial and error.

Method 5: Read Committed Snapshot Isolation

I heard somewhere recently that I could turn on Read Committed Snapshot Isolation. It’s an isolation level where readers don’t block writers and writers don’t block readers by using row versioning (I like to think of it as Oracle mode). I heard I could turn this setting on quickly and most concurrency problems would go away. Well it’s worth a shot:

ALTER DATABASE UpsertTestDatabase
SET ALLOW_SNAPSHOT_ISOLATION ON
 
ALTER DATABASE UpsertTestDatabase
SET READ_COMMITTED_SNAPSHOT ON
go
 
if OBJECT_ID('s_incrementMytable') IS NOT NULL
	drop procedure s_incrementMytable;
go
 
create procedure s_incrementMytable(@id int)
as
declare @name nchar(100) = cast(@id as nchar(100))
 
begin transaction
if exists (select 1 from mytable where id = @id)
	update mytable set counter = counter + 1 where id = @id;
else 
	insert mytable (id, name, counter) values (@id, @name, 1);
commit
go

Ouch! Primary key violations all over the place. Even more than the control! 23% of the stored procedure calls failed with a primary key violation. And by the way, if I try this with Snapshot Isolation, I not only get PK violations, I get errors reporting “Snapshot isolation transaction aborted due to update conflict”. However, combining method 4 with snapshot isolation once again gives no errors. Kudos to Method 4!

Other Methods

Here are some other things to try (but I haven’t):

Avoiding concurrency issues by using Service Broker. If it’s feasible, just queue up these messages and apply them one at a time. No fuss.
Rewrite the query above as: UPDATE…; IF @@ROWCOUNT = 0 INSERT…; You could try this, but you’ll find this is almost identical with Method 1.

So How Are We Going To Call This One?

So here are the results we have:

Concurrency Method	Status	Notes
Method 1: Vanilla	Busted	This was our control. The status quo is not going to cut it here.
Method 2: Decreased Isolation Level	Busted	NOLOCK = NOHELP in this case
Method 3: Increased Isolation Level	Busted	Deadlocks! Strict locking that is used with the SERIALIZABLE isolation level doesn’t seem to be enough!
Method 4: Increased Isolation + Fine Tuning Locks	Confirmed	By holding the proper lock for the duration of the transaction, I've got the holy grail. (Yay for StackOverflow, Sam Saffron and others).
Method 5: Read Committed Snapshot Isolation	Busted	While RCSI helps with most concurrency issues, it doesn't help in this particular case.
Other Methods: Service Broker	Plausible	Avoid the issue and apply changes using a queue. While this would work, the architectural changes are pretty daunting
Update! (Sept. 9, 2011) Other Methods: MERGE statement	Busted	See comments section
Update! (Feb. 23, 2012) Other Methods: MERGE statement + Increased Isolation	Confirmed	With a huge number of comments suggesting this method (my preferred method), I thought I’d include it here to avoid any further confusion

Don’t try to apply these conclusions blindly to other situations. In another set of circumstances who knows what the results would be. But test for yourself. Like a good friend of mine likes to say: “Just try it!”

So that’s the show. If you have a myth you want busted. Drop me a line, or leave me a message in the comments section. Cheers ’til next time.

-- Comments (50)

50 Comments »

Mladen Prajdic reminded me about the MERGE statement which looks like it’s just meant for Update/Inserts. And he’s absolutely right. He actually treated this subject on his blog here.

We find that the behaviour of the Merge statement insert/update is the same as Method 1.

CREATE PROCEDURE s_incrementMytable(@id INT)
AS
DECLARE @name NCHAR(100) = CAST(@id AS NCHAR(100))
 
BEGIN TRANSACTION
	MERGE mytable AS myTarget
	USING (SELECT @id AS id) AS mySource
		ON mySource.Id = myTarget.id
	WHEN MATCHED THEN UPDATE
		SET counter = myTarget.counter + 1
	WHEN NOT MATCHED THEN 
		INSERT (id, name, counter) 
		VALUES (@id, @name, 1);
COMMIT
GO

Running this through my test suite gives me 0.43 percent primary key violation errors.

Personally, when developing for SQL Server 2008 and later, I use the MERGE statement combined with method 3 or 4 above.

Comment by Michael J. Swart — September 8, 2011 @ 11:15 am

Oracle has SELECT FOR UPDATE. It’s a shame that Microsoft hides the same functionality in a locking hint. Thanks for pointing out this far less discoverable feature of SQL Server!

Comment by Mark Freeman — September 8, 2011 @ 11:40 am
I think you’re right Mark. I’m always hesitant to use locking hints. The SELECT FOR UPDATE syntax seems more natural.

Comment by Michael J. Swart — September 8, 2011 @ 11:48 am
What about the TRY CATCH JFDI pattern?
BEGIN TRY BEGIN TRY INSERT .. END TRY BEGIN CATCH IF ERROR_NUMBER() = 2627 UPDATE ... ELSE ... END CATCH END CATCH
BEGIN TRY BEGIN TRY INSERT .. END TRY BEGIN CATCH IF ERROR_NUMBER() = 2627 UPDATE ... ELSE ... END CATCH END CATCH
Also mentioned here on StackOverflow: http://stackoverflow.com/q/4338984/27535

Cheers
gbn

Comment by gbn — September 8, 2011 @ 1:21 pm
Hey gbn! I recognize you from your prolific stackoverflow participation!
Yep, confirmed! Zero errors with that method which I’ll call “Increased Isolation + <ahem> JFDI”.
It seems a bit clumsy, but you can’t argue with results. Thanks!

Comment by Michael J. Swart — September 8, 2011 @ 1:56 pm
[…] Mythbusting: Concurrent Update/Insert Solutions – Busting myths this week with an excellent look at a common programming pattern from Michael J. Swart (Blog|Twitter). Includes source code for you to follow along and experiment with. Can you find alternative successful methods? […]

Pingback by Something for the Weekend – SQL Server Links 09/09/11 — September 9, 2011 @ 7:43 am
Great post Michael! Thank you, posts like yours go a long way in helping prove out theory with simple scripts all in a centralized location. To that end you might want to add a test case for SERIALIZABLE and MERGE. It was an item that recently came up in the SQLTEAM forums and this post could be a perfect reference in answering those questions. As it was still only in your imagination at the time the post referenced instead was http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx

Comment by Robert Matthew Cook — September 9, 2011 @ 10:10 pm
Hey Robert! Thanks for the great compliment.

I like the scientific method, that “the only test for truth is experiment.” And I love Mythbusters (the reasons why are summed up perfectly here http://xkcd.com/397/).

MERGE + SERIALIZABLE works I believe. I can’t call it confirmed until I can get to my other computer. When I do, I’ll add another comment here and update the post (again). In the meantime, the really curious can try it out themselves at home without the usual worry about the dangers of explosives 🙂

Comment by Michael J. Swart — September 9, 2011 @ 10:29 pm

Hi Michael, I wrote a variation to get rid of the initial select using the OUTPUT clause and reduce table workload. Do you think it would work without the serializable level?

CREATE PROCEDURE s_incrementMytable ( @id int ) 
AS
 
BEGIN
 
	DECLARE @name nchar ( 100 ) = CAST(@id AS nchar ( 100 ));
 
	DECLARE @updated TABLE ( i int ) ;
 
	SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
 
	UPDATE mytable
	SET counter = counter + 1
	OUTPUT DELETED.id
	INTO @updated
	WHERE id = @id;
 
	IF NOT EXISTS ( SELECT i FROM @updated ) 
		INSERT INTO mytable (
			id
		  , name
		  , counter ) 
		VALUES (
			   @id
			 , @name
			 , 1 ) 
		;
END;
 
GO

Comment by Sebastiano Lanza — September 13, 2011 @ 8:35 am

Hi Sebastiano, You know you can find this out for yourself if you are able to compile the source code. 🙂

Actually, the way you’ve written the stored procedure, it fails 2.5% of the time with PK violations because I think you forgot to add BEGIN TRANSACTION and COMMIT. If you remember those, then the procedure works just fine with zero errors.

And without the serializable isolation level or the begin transaction, it does fail the same way Method 1: Vanilla fails: with PK violations.

Hope that helps!

Comment by Michael J. Swart — September 13, 2011 @ 8:49 am

One more variety that I’ve used since…

IF OBJECT_ID('s_incrementMytable') IS NOT NULL 
    DROP PROCEDURE s_incrementMytable;
GO
 
CREATE PROCEDURE s_incrementMytable
(
	@id INT
)
AS 
 
DECLARE @name NCHAR(100) = CAST(@id AS NCHAR(100))
 
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
 
BEGIN TRANSACTION;
 
UPDATE
    dbo.mytable WITH (UPDLOCK, HOLDLOCK)
SET 
    counter = counter + 1
WHERE
    id = @id;
 
IF(@@ROWCOUNT = 0)
BEGIN
	INSERT
		dbo.mytable
		(id
		, name
		, counter)
	VALUES
		(@id
		, @name
		, 1);
END
 
COMMIT
GO

Comment by Mark Storey-Smith — September 14, 2011 @ 8:34 am

Hi Mark,
That works well (Confirmed). I notice that the READ COMITTED transaction level is used instead of the SERIALIZABLE level and so the HOLDLOCK is needed. That’s really good style 🙂

Comment by Michael J. Swart — September 14, 2011 @ 8:44 am

Zero errors with merge. 🙂

CREATE PROCEDURE s_incrementMytable ( @id INT )
AS 
    DECLARE @name NCHAR(100) = CAST(@id AS NCHAR(100))
 
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
 
    BEGIN TRANSACTION
 
    MERGE mytable WITH ( UPDLOCK ) AS myTarget
        USING (SELECT @id AS id) AS mySource
        ON mySource.id = myTarget.id
        WHEN MATCHED 
            THEN UPDATE
               SET [counter] += 1
        WHEN NOT MATCHED 
            THEN
			INSERT  ( id, name, [counter] )
            VALUES( @id , @name , 1 );
 
    COMMIT
GO

Comment by sHWED — September 15, 2011 @ 7:55 am

Yep sHWED,
MERGE combined with method 3 (increased isolation) gives no errors and is my preferred way of doing this (See comments #1, #7, #8).

Comment by Michael J. Swart — September 15, 2011 @ 8:34 am
Just tested it out on a quad-core machine using HOLDLOCK. (READ COMMITED ISOLATION)

Barrier is honored, everything works as expected.

MERGE on MSDN:

http://msdn.microsoft.com/en-us/library/bb510625.aspx

“Specifies one or more table hints that are *** applied on the target table *** for each *** of the insert, update, or delete actions that are performed **** by the MERGE statement”

Test Code:

ALTER PROCEDURE s_incrementMytable(@id INT)
AS
DECLARE @name NCHAR(100) = CAST(@id AS NCHAR(100))

BEGIN TRANSACTION
MERGE mytable WITH ( HOLDLOCK ) AS myTarget
USING (SELECT @id AS id) AS mySource
ON mySource.Id = myTarget.id
WHEN MATCHED THEN UPDATE
SET counter = myTarget.counter + 1
WHEN NOT MATCHED THEN
INSERT (id, name, counter)
VALUES (@id, @name, 1);

MERGE mytable2 WITH ( HOLDLOCK ) AS myTarget2
USING (SELECT @id AS id) AS mySource
ON mySource.Id = myTarget2.id
WHEN MATCHED THEN UPDATE
SET counter = myTarget2.counter + 1
WHEN NOT MATCHED THEN
INSERT (id, name, counter)
VALUES (@id, @name, 1);

COMMIT
GO

p.s

You have a small bug in the test code… it’s:

catch (Exception ex)
{
//counter++;
Interlocked.Increment(ref counter);
Console.WriteLine(ex.Message);
}

Comment by itai — January 18, 2012 @ 7:07 pm
Thanks itai,

You’re one of the few who are keen enough to try-and-see. It’s good to see. Thanks for that.

And thanks for the heads up on the multi-threaded method of incrementing a counter. I write mostly about SQL Server, so it’s nice to know that someone’s looking at my C# stuff. I’ll update the code.

Michael

Comment by Michael J. Swart — January 30, 2012 @ 8:58 am
[…] writes: Concurrency is kind of hard to get right in this […]

Pingback by Collecting Bugs « Desire2Learn Valence — February 27, 2012 @ 11:27 am
[…] Concurrency. If this is important to you, remember to use appropriate locks (usually UPDLOCK) on the target table. […]

Pingback by MERGE Statement Generator | Michael J. Swart — May 30, 2012 @ 12:01 pm
Hi Michael,

I came across your work while researching how to convert a MySQL statement to MSSQL. Thank you for it as it will help with my work.

I have included a (untested) version of the statement to match your post. I expect that I will end up using a version of method 4.

INSERT INTO mytable (id,name,counter) VALUES ($id, $name, 1)
ON DUPLICATE KEY UPDATE name = $name, counter = counter + 1;

Unless you can bust the myth that there is no equivalent way to do this in MSSQL.

Regards, Mike.

Comment by Mike — February 11, 2013 @ 10:23 am
Hey Mike,

Thanks for the feedback. I’m not too familiar with MySQL syntax, but this stackoverflow question seems to match what your issue is:
Possible to simulate the mySQL functionality ON DUPLICATE KEY UPDATE with SQL Server

“ON DUPLICATE KEY UPDATE” seems like a nice feature, but it doesn’t actually exist in SQL Server (like you guessed).

I also expect that a version of method 4 is called for. If I were in your shoes I think I would probably do the same.

Good luck,
Michael

Comment by Michael J. Swart — February 11, 2013 @ 10:29 am
[…] A multi-threaded executable which sends queries asynchronously. A framework I first developed in a post I wrote on concurrency. […]

Pingback by Follow up on Ad Hoc TVP contention | Michael J. Swart — February 28, 2013 @ 12:54 pm
How about server load and speed stats?
It may be that 100% success comes with a price in performance? (there may even be a case where a 0.4% fail rate is acceptable, if it means more processing is done overall, at least would be good to know)

Comment by Merlin — February 4, 2015 @ 10:41 am
Hi Merlin,

Performance was not the question I was testing, it was “Does Method X really perform Insert/Update queries concurrently, accurately and without errors?” I like to joke with others about performance vs accuracy. I tell them I can get the database to perform blazing fast as long as you don’t care about the accuracy 🙂

If you have retry logic in the application that can tolerate a certain amount of failures then you can consider performance. My own practice is to only rely on retry logic as a last resort.

Try the tests out yourself and let me know how they go. I’d be very surprised if the accurate concurrency methods turned out to perform significantly worse than the others.

Comment by Michael J. Swart — February 4, 2015 @ 11:14 am
It’s worth pointing out that MERGE has a ton of other issues, besides also needing the lock hint:
http://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/

Comment by SNB — April 23, 2015 @ 11:00 am
Hi SNB,

Some of those issues are fixed since that article some of those issues are not issues.
And one of those issues (basic merge upsert causing deadlocks) recognizes that the MERGE statement needs that lock hint.

I’m not trying to defend the MERGE statement. I’ve written before that “I now feel distrustful and wary of the MERGE statement” based on performance issues I’ve seen. But I wouldn’t say MERGE has a ton of issues. I would say that MERGE statement has a small number of important issues.

Michael

Comment by Michael J. Swart — April 23, 2015 @ 11:43 am
Fair enough, “ton” was the wrong word 🙂
I was just a little shocked to see several of them closed without fixes.

Comment by SNB — April 27, 2015 @ 6:43 am
[…] In fact one of my favorite blog posts is about getting concurrency right. It’s called Mythbusting: Concurrent Update/Insert Solutions. The lesson here is just try […]

Pingback by Don’t Abandon Your Transactions | Michael J. Swart — October 6, 2015 @ 12:08 pm
[…] In fact one of my favorite blog posts is about getting concurrency right. It’s called Mythbusting: Concurrent Update/Insert Solutions. The lesson here is just try […]

Pingback by Don’t Abandon Your Transactions - SQL Server - SQL Server - Toad World — October 6, 2015 @ 12:31 pm
[…] Mythbusting: Concurrent Update/Insert Solutions Does Method X really perform Insert/Update queries concurrently, accurately and without errors? https://michaeljswart.com/2011/09/mythbusting-concurrent-updateinsert-solutions/ […]

Pingback by SQL Server – Locking – Primary Key Violation | Daniel Adeniji's - Learning in the Open — October 13, 2015 @ 9:47 pm
Michael:

Thanks so much for doing such a fine job on this.

The compartmentalization alone, SQL – Stored Procedures and C# modules, makes this so easily approachable.

For the C# Application, Program.cs, I added “using System.Threading;” to cover your slight modification of replacing “counter++”; with a thread safe “Interlocked.Increment(ref counter);”.

And, for the Stored Procedures, I added a manager\driver that will call the individual SPs based on the specific SP that we are testing at the time.

I really liked Dan Guzman’s idea about adding “SET XACT_ABORT ON” to SPs that have their own transaction handling management. The specific post is http://weblogs.sqlteam.com/dang/archive/2007/10/28/Conditional-INSERTUPDATE-Race-Condition.aspx

Again, thanks for leaving some grains behind in your plentiful harvest.

Comment by Daniel Adeniji — October 14, 2015 @ 3:07 am
Thanks Daniel,
I’m really glad that you decided to follow along actively.
It’s funny you should mention Dan Guzman’s advice about XACT_ABORT. I just recently published a post last week, with that same message: https://michaeljswart.com/2015/10/dont-abandon-your-transactions/

Comment by Michael J. Swart — October 14, 2015 @ 8:47 am
[…] To fix, we adjusted the text of the query to be concurrent without using TRY/CATCH for flow control. For the extra curious, we used method 4 from Mythbusting: Concurrent Update/Insert Solutions. […]

Pingback by Case study: Troubleshooting Doomed Transactions | Michael J. Swart — May 26, 2017 @ 8:53 am
[…] some interesting examples such as here but nothing that quite fits my need. I could try the insert first as here but I think I’m […]

Pingback by Upsert with concurrency checking SQL Server – program faq — February 15, 2018 @ 6:24 pm
Another tool to add to the quiver is sp_getapplock. It’s mentioned in several blog posts, but I first came across it when reading Kalen Delaney’s “SQL Server Concurrency”. It can be useful when trying to reduce deadlock frequency without having to get a full table lock.

I don’t believe sp_getapplock should be used instead of a proper locking strategy, but rather it should be used in conjunction with a sound underlying locking and transaction strategy to reduce the frequency of conflicts. My reasoning is that sp_getapplock is advisory, so the code should assume that other access to the underlying tables won’t be using sp_getapplock. Thus, the code should be developed to work properly (albeit possibly with deadlocks) in the face of concurrent execution of code that doesn’t obey the applock.

In my scenario, I have a number of clients making single-record updates in a table, but I have other clients making batch updates in the same table. The batch updates update the first n records that match variable criteria (it differs from client to client). Because the batch update criteria vary from client to client, it’s possible for them to start updating different pages and then collide on each other’s pages. This deadlocking was reducing throughput. Luckily all access is via stored procedures, so I used sp_getapplock to ensure that only one batch update could execute at a time. The result was the elimination of deadlocks, and I didn’t have to block the single-record updates.

Comment by Toby Ovod-Everett — March 22, 2018 @ 2:07 pm
Hi Toby,

I’m glad that works for you.

The use of sp_getapplock will certainly do the trick. It accomplishes concurrency by restricting activity to one query at a time.
Your milage varies, but we once considered using sp_getapplock. But quickly abandoned that idea once we measured performance.

Comment by Michael J. Swart — March 22, 2018 @ 2:16 pm
Hi Mike, It is a nice and excellent example…Awesome…

Comment by Prasad — September 21, 2018 @ 2:53 am
Why is the id column set to equal the name? The upsert transaction in the vanilla case is:

begin transaction
if exists (select 1 from mytable where id = @id)
update mytable set counter = counter + 1 where id = @id;
else
insert mytable (id, name, counter) values (@id, @name, 1);
commit

Why do you insert into the id (primary key identity) column? That will cause the “if” condition to be subject to mutatation errors.

If the ‘name’ doesn’t exist then insert it and have the procedure return the newly generated id.

Something like this should work:

begin transaction
begin try

Comment by Steve Collins — September 30, 2018 @ 6:48 pm
Hi Steve,
For the purposes of the demo, there’s no reason that name and id have to be the same.
Without an identity column, there’s no identity to return.
I’m not sure what other point you’re trying to make.

Comment by Michael J. Swart — September 30, 2018 @ 7:35 pm
Mike, thanks for the reply! I’ve learned a lot about isolation levels here.

>>>”quotes”<<>>”The stored procedure should look in a particular table for a given id. If a record is found, a counter field on that record is incremented by one. If the given id is not found, a new record is inserted into that table. This is the common UPSERT scenario.”<<<

The id to be looked for is the numeric primary key of the table? When looking for a particular numeric id the application (or whatever is calling the sp) generally is passing back an id previously provided by the db, no? Suppose the sp returns a result set (numeric pk 'id', unique nvarchar alternate key 'email'). Select 1 record for upserting. Why not do the look up on the alternate key, then let SQL Server assign a numeric primary key value when inserting new records?

Comment by Steve Collins — October 1, 2018 @ 9:32 am
Hi Steve,
I see what you’re saying. In the example I provided, the id is provided by the application only. You’re suggesting that the whole example be rewritten so that the id is generated by the db instead. Which would be possible if there were an alternate key (you’re suggesting a new column, like email).

You’re pointing out that the example differs from what you usually see as a more general practice of having ids generated by the database.

In real life, it is valuable to revisit the data model to see if a simpler, cleaner refactoring is possible.
But for the purpose of demonstrating UPSERT concurrency problems, I came up with this fictitious example which goes against the general practice you point out.

In my experience, I’ve found this kind of refactoring is sometimes not possible, or not enough of a solution to avoid the concurrency problems mentioned in this article.

I mention later on that readers shouldn’t try to “apply these conclusions blindly to other situations. In another set of circumstances who knows what the results would be.”

Comment by Michael J. Swart — October 1, 2018 @ 2:20 pm
Following along with the samples (and the newer post, where email is pk), here’s my attempt to refactor using a unique alternate key (email) and a numeric pk (e_id) assigned by the database.

if object_id(’emails’) is not null
drop table emails;
go
create table emails(
e_id int identity(1,1) primary key not null,
email nvarchar(512) unique not null,
address_count int not null);
go

insert emails(email, address_count) values
(‘fred@sampleco.com’, 0),
(‘jing@americanexample.org’, 0),
(‘walter@eurpeanexample.org’, 0),
(‘jen@example.gov’, 0),
(‘xiang@storefront.com’, 0),
(‘frederico@embassy.com’, 0),
(‘wendy@embassy.com’, 0),
(‘jose@ambassidor.com’, 0);
go

This creates 8 email addresses each with email_count=0.

The procedure to upsert and increment email_counter:

if object_id(‘s_incrementMytable’) is not null
drop proc s_incrementMytable;
go
create proc s_incrementMytable
@email nvarchar(512),
@test_id int output,
@test_msg varchar(2048) output
as
set nocount on;
begin tran
begin try
declare
@temp_e_id table(e_id int not null);

update emails
set
address_count+=1
output
inserted.e_id into @temp_e_id
where
email=@email;
if @@rowcount=1
begin
select @test_id=e_id from @temp_e_id;
select @test_msg=’Updated’;
end
else
begin
insert emails(email, address_count) values(@email, 1);
select @test_id=scope_identity();
select @test_msg=’Inserted’;
end
commit;
end try
begin catch
select @test_id=0;
select @test_msg=error_message();
rollback tran;
end catch
go

The upsert is accomplished by attempting to update the table and inserting if @@rowcount=0. This tries to mirror the vanilla case.

Test: (the first time the sp is executed for a new email (‘joe@testsp.com’))

declare
@new_e_id int,
@out_test_msg varchar(2048);

exec s_incrementMytable ‘joe@testsp.com’, @test_id=@new_e_id output, @test_msg=@out_test_msg output;
print (‘e_id=’+cast(@new_e_id as varchar(6)));
print (@out_test_msg);

Result:
e_id=9
Inserted

Test: (execute sp again for same email address)
Result:
e_id=9
Updated

Test: select * from emails where email=’joe@testsp.com’;
Result:
e_id email address_count
12 joe@testsp.com 2

Comment by Steve Collins — October 2, 2018 @ 8:50 am
Hi Steve,

I appreciate your efforts to refactor here.
But when I create a program to exercise your example concurrently and run it, I get several errors like this:
Violation of UNIQUE KEY constraint 'UQ__emails__AB6E616423C5ECCC'. Cannot insert duplicate key in object 'dbo.emails'. The duplicate key value is (user978@domain.com).

I’m not sure if you thought your example would be free of concurrency problems.
If you thought it would be free of concurrency problems, then unfortunately, this refactoring is not an improvement.
If you knew it would have concurrency problems, I’m not sure what idea you’re trying to express here.

Comment by Michael J. Swart — October 2, 2018 @ 9:59 am
The upsert attempts to mirror your vanilla case which also has concurrency issues. The idea was to create a roughly equivalent baseline example which might be somewhat ‘real world’. Sorry if that wasn’t clear. Fwiw, I would use a common table expression and a merge statement. Also, in the ‘real world’ the cs program would cast a return value from the sp to determine if the operation was a success (and maybe retry, if it wasn’t).

Comment by Steve Collins — October 2, 2018 @ 10:56 am
Have an issue with the below stored proc when there is parallel execution. Getting duplcates. How to avoid the duplicates?

DECLARE @ReservedIds table(RowId int identity(1,1),seatid int, seatClass varchar(20))
——-Fill Multiple Ids into @ReservedIds with a count———-
—————–

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
IF EXISTS(SELECT * FROM [dbo].[AvailableSeatIds] WITH (UPDLOCK)
WHERE SeatId IN(select seatid from @ReservedIds)
AND [ReservationStatusId]=1 AND [IsChosen]=0)

Update ST Set ST.[IsChosen]=1,
ST.[ReservationStatusId] = @ReservedStatusId,
ST.[DateReserved] =@CurrentDate
FROM [dbo].[AvailableSeatIds] ST
WHERE ST.Id in(select seatid from @ReservedIds)
AND ST.HallId=@ActualHallId
AND ST.[IsChosen]=0;

COMMIT TRANSACTION;

Comment by G Yan — January 25, 2019 @ 11:59 am
Hi G Yan,
For help on specific problems like these, I think http://dba.stackexchange.com will get you the best answers fastest.

Comment by Michael J. Swart — January 25, 2019 @ 12:14 pm
I do not see any difference between Method #3 and Method #4 code. Please suggest which is correct Method #4. Thanks

Comment by Nasir Abbas — September 20, 2020 @ 2:06 am
Hi Michael,

An old but fantastic article that cuts through the hearsay with actual applied testing for a very common line-of-business use case. In my circumstance, I am literally implementing a counter table and want it bullet proof.

A quick heads up though that I suspect your test Program.cs has a race condition issue – it will report its results _before_ the execution of the 100,000 iterations has completed (the loop exits whilst a significant number of asynchronous database requests are still active), leaving the potential that a violation that occurs at the end of batch would go undetected. I found this to be the case whilst expanding upon your test code to ensure that, not only did my SQL counter implementation not raise any PK violations, but it never produced any duplicate output values.

Best regards,

Alexander

Comment by Alexander — December 11, 2020 @ 5:27 am
Thanks for this article Michael.

In response to Comment #46 :

Method #4 adds the text “with (updlock)” in the line after begin transaction.

Comment by Grant Johnson — March 10, 2022 @ 4:52 pm
[…] with MERGE (and as Michael Swart demonstrated here), you will still want to surround any of these methods with proper transactions, error handling and […]

Pingback by Can I use the MERGE statement in SQL Server 2005? – w3toppers.com — October 16, 2022 @ 1:40 pm
[…] : You may also find this article on concurrent update/insert solutions (aka upsert) interesting. It puts several common methods of […]

Pingback by Use transactions for select statements? – Row Coding — May 21, 2023 @ 4:20 pm