Michael J. Swart

July 20, 2017

SQL Server UPSERT Patterns and Antipatterns

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 11:34 am
Upsert Patterns and Antipatterns
Good ways and bad ways to update or insert rows

In 2015, Postgres added support for ON CONFLICT DO UPDATE to their INSERT statements. Their docs say that using this syntax guarantees an atomic INSERT or UPDATE outcome; one of those two outcomes is guaranteed, even under high concurrency.

Now, that is one amazing feature isn’t it? Postgres worked really hard to include that feature and kudos to them. I would trade ten features like stretch db for one feature like this. It makes a very common use case simple. And it takes away the awkwardness from the developer.

But if you work with SQL Server, the awkwardness remains and you have to take care of doing UPSERT correctly under high concurrency.

Easy things should be easy

I wrote a post in 2011 called Mythbusting: Concurrent Update/Insert Solutions. But since then, I learned new things, and people have suggested new UPSERT methods. I wanted to bring all those ideas together on one page. Something definitive that I can link to in the future.

The Setup

I’m going consider multiple definitions of a procedure called s_AccountDetails_Upsert. The procedure will attempt to perform an UPDATE or INSERT to this table:

CREATE TABLE dbo.AccountDetails
(
    Email NVARCHAR(400) NOT NULL CONSTRAINT PK_AccountDetails PRIMARY KEY,
    Created DATETIME NOT NULL DEFAULT GETUTCDATE(),
    Etc NVARCHAR(MAX)
);

Then I’m going to test each definition by using a tool like SQL Query Stress to run this code

declare @rand float = rand() + DATEPART( second, getdate() );
declare @Email nvarchar(330) = cast( cast( @rand * 100 as int) as nvarchar(100) ) + '@somewhere.com';
declare @Etc nvarchar(max) = cast( newid() as nvarchar(max) );
exec dbo.s_AccountDetails_Upsert @Email, @Etc;

Antipattern: Vanilla Solution

If you don’t care about concurrency, this is a simple implementation. Nothing fancy here:

CREATE PROCEDURE s_AccountDetails_Upsert ( @Email nvarchar(4000), @Etc nvarchar(max) )
AS 
  IF EXISTS ( SELECT * FROM dbo.AccountDetails WHERE Email = @Email )
 
    UPDATE dbo.AccountDetails
       SET Etc = @Etc
     WHERE Email = @Email;
 
  ELSE 
 
    INSERT dbo.AccountDetails ( Email, Etc )
    VALUES ( @Email, @Etc );

Unfortunately, under high concurrency, this procedure fails with primary key violations.

Antipattern: MERGE Statement

The MERGE statement still suffers from the same concurrency issues as the vanilla solution.

CREATE PROCEDURE s_AccountDetails_Upsert ( @Email nvarchar(4000), @Etc nvarchar(max) )
AS 
    MERGE dbo.AccountDetails AS myTarget
    USING (SELECT @Email Email, @Etc etc) AS mySource
        ON mySource.Email = myTarget.Email
    WHEN MATCHED THEN UPDATE
        SET etc = mySource.etc
    WHEN NOT MATCHED THEN 
        INSERT (Email, Etc) 
        VALUES (@Email, @Etc);

Primary Key violations are still generated under high concurrency. Even though it’s a single statement, it’s not isolated enough.

Antipattern: Inside a Transaction

If you try putting the vanilla solution inside a transaction, it makes the whole thing atomic, but still not isolated. You still get primary key violations:

CREATE PROCEDURE s_AccountDetails_Upsert ( @Email nvarchar(4000), @Etc nvarchar(max) )
AS 
  BEGIN TRAN
    IF EXISTS ( SELECT * FROM dbo.AccountDetails WHERE Email = @Email )
 
      UPDATE dbo.AccountDetails
         SET Etc = @Etc
       WHERE Email = @Email;
 
    ELSE 
 
      INSERT dbo.AccountDetails ( Email, Etc )
      VALUES ( @Email, @Etc );
 
  COMMIT

Antipattern: Inside a Serializable Transaction

So this should be isolated enough, but now it’s vulnerable to deadlocks:

CREATE PROCEDURE s_AccountDetails_Upsert ( @Email nvarchar(4000), @Etc nvarchar(max) )
AS 
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
  BEGIN TRAN
    IF EXISTS ( SELECT * FROM dbo.AccountDetails WHERE Email = @Email )
 
      UPDATE dbo.AccountDetails
         SET Etc = @Etc
       WHERE Email = @Email;
 
    ELSE 
 
      INSERT dbo.AccountDetails ( Email, Etc )
      VALUES ( @Email, @Etc );
 
  COMMIT

Antipattern: Using IGNORE_DUP_KEY

I want to mention one more bad solution before I move on to the good solutions. It works and it’s interesting, but it’s a bad idea.

ALTER TABLE dbo.AccountDetails 
  DROP CONSTRAINT PK_AccountDetails;
 
ALTER TABLE dbo.AccountDetails
  ADD CONSTRAINT PK_AccountDetails 
  PRIMARY KEY ( Email ) WITH ( IGNORE_DUP_KEY = ON )
GO
 
CREATE PROCEDURE dbo.s_AccountDetails_Upsert ( @Email nvarchar(4000), @Etc nvarchar(max) )
AS 
  -- This statement will not insert if there's already a duplicate row,
  -- But it won't raise an error either
  INSERT dbo.AccountDetails ( Email, Etc ) 
  VALUES ( @Email, @Etc );
 
  UPDATE dbo.AccountDetails 
     SET @Etc = Etc
   WHERE @Email = Email;

So what are some better solutions?

Pattern: Inside a Serializable Transaction With Lock Hints

Still my prefered solution.

CREATE PROCEDURE s_AccountDetails_Upsert ( @Email nvarchar(4000), @Etc nvarchar(max) )
AS 
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  BEGIN TRAN
 
    IF EXISTS ( SELECT * FROM dbo.AccountDetails WITH (UPDLOCK) WHERE Email = @Email )
 
      UPDATE dbo.AccountDetails
         SET Etc = @Etc
       WHERE Email = @Email;
 
    ELSE 
 
      INSERT dbo.AccountDetails ( Email, Etc )
      VALUES ( @Email, @Etc );
 
  COMMIT

This is a solid solution, but every implementation is different so every time you use this pattern, test for concurrency.

Pattern: MERGE Statement With Serializable Isolation

Nice, but be careful with MERGE statements.

CREATE PROCEDURE s_AccountDetails_Upsert ( @Email nvarchar(4000), @Etc nvarchar(max) )
AS 
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
 
    MERGE dbo.AccountDetails AS myTarget
    USING (SELECT @Email Email, @Etc etc) AS mySource
        ON mySource.Email = myTarget.Email
    WHEN MATCHED THEN UPDATE
        SET etc = mySource.etc
    WHEN NOT MATCHED THEN 
        INSERT (Email, Etc) 
        VALUES (@Email, @Etc);

For alternative syntax, skip the SET TRANSACTION ISOLATION LEVEL SERIALIZABLE and include a lock hint: MERGE dbo.AccountDetails WITH (HOLDLOCK) AS myTarget.

Pattern: Just Do It

Just try it and catch and swallow any exception

CREATE PROCEDURE s_AccountDetails_Upsert ( @Email nvarchar(4000), @Etc nvarchar(max) )
AS 
 
BEGIN TRY
 
  INSERT INTO dbo.AccountDetails (Email, Etc) VALUES (@Email, @Etc);  
 
END TRY
 
BEGIN CATCH
 
  -- ignore duplicate key errors, throw the rest.
  IF ERROR_NUMBER() IN (2601, 2627) 
    UPDATE dbo.AccountDetails
       SET Etc = @Etc
     WHERE Email = @Email;
 
END CATCH

I wrote a post about it in Ugly Pragmatism For The Win.
And I wrote about a rare problem with it in Case study: Troubleshooting Doomed Transactions.

I have no concurrency concerns here but because of this issue – not to mention performance concerns – it’s no longer my first preference.

Performance and Other Concerns

When describing each pattern so far, I haven’t paid attention to performance, just concurrency. There are many opportunities to improve the performance of these solutions. In particular, pay very close attention to your average use case. If your workload UPDATEs a row 99% of the time and INSERTs a row only 1% of the time, then optimal implementation will look different than if the UPDATE and INSERT frequency is reversed.

Check out these links (thanks Aaron Bertrand)

7 Comments »

  1. Nice work as always, sir. Look at you with the table of contents at the top right! Nifty! Try viewing the post in incognito mode in your browser though – there’s two checkboxes for “Notify me of followup comments via email”.

    Comment by Brent Ozar — July 20, 2017 @ 2:58 pm

  2. (fixed)
    What? You don’t want double the opportunity to be notified that someone on the internet said something!
    Michael

    Comment by Michael J. Swart — July 20, 2017 @ 3:00 pm

  3. Hi,
    I have tried to reproduce the issues, but cannot to do achieve that. Have tried only the vanilla solution.
    Could you please share the .sqlstress file?
    Which version and edition of SQl Server do you use?
    Thanks.
    Martin

    Comment by Martin — July 26, 2017 @ 5:42 am

  4. Hi Martin,

    I think there are a few things I could suggest

    • I updated the query in this post to run inside SQL Query stress, the way emails are generated now, there is a better chance of key collisions
    • I didn’t mention this, but remember to delete all the rows from AccountDetails when you begin each run.
    • When I use query stress, I use 50 threads for 1000 iterations
    • Personally, I’ve had better luck with a program I wrote which invokes tempdb.dbo.s_DoSomething. That’s described under The C# heading on this post: Generating Concurrent Activity. But I recognize that not everyone has a C# compiler on hand.

    Good luck Martin, let me know what works for you, I’ll update the post.

    Comment by Michael J. Swart — July 27, 2017 @ 9:28 am

  5. Great stuff! All SQL devs should read this.

    Comment by Alex Friedman — September 12, 2017 @ 11:20 am

  6. I tend to use MERGE with UPDLOCK, HOLDLOCK, ROWLOCK. (And HOLDLOCK = SERIALIZABLE). That packages everything that’s needed up into one statement.

    Another credible pattern would be using SERIALIZABLE, any other technique, and retry in case of deadlock.

    Comment by tobi — September 12, 2017 @ 1:58 pm

  7. Thanks Tobi,
    The retry by the application in case of deadlock is definitely a good pattern. I wish included it.
    I think I’m changing my point of view on it too. I used to think that retrying a failure was a last-resort solution and amounted to admitting defeat.
    But now I think that retries are a necessary part of the transaction landscape and they becomes almost necessary when considering more distributed solutions, or other high-availability solutions in general.

    Comment by Michael J. Swart — September 12, 2017 @ 2:28 pm

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress