Michael J. Swart

March 18, 2022

UPSERT Requires a Unique Index

Filed under: Miscelleaneous SQL,SQL Scripts,Technical Articles — Michael J. Swart @ 10:11 am

To avoid deadlocks when implementing the upsert pattern, make sure the index on the key column is unique. It’s not sufficient that all the values in that particular column happen to be unique. The index must be defined to be unique, otherwise concurrent queries can still produce deadlocks.

Say I have a table with an index on Id (which is not unique):

CREATE TABLE dbo.UpsertTest(
	Id INT NOT NULL,
	IdString VARCHAR(100) NOT NULL,
	INDEX IX_UpsertTest CLUSTERED (Id)
)

I implement my test UPSERT procedure the way I’m supposed to like this:

CREATE OR ALTER PROCEDURE dbo.s_DoSomething  
AS 
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION 
	DECLARE @Id BIGINT = DATEPART(SECOND, GETDATE());
	DECLARE @IdString VARCHAR(100) = CAST(@Id AS VARCHAR(100)); 
 
	IF EXISTS ( 
		SELECT * 
		FROM dbo.UpsertTest WITH (UPDLOCK) 
		WHERE Id = @Id 
	) 
	BEGIN 
		UPDATE dbo.UpsertTest 
		SET IdString = @IdString 
		WHERE Id = @Id; 
	END 
	ELSE 
	BEGIN 
		INSERT dbo.UpsertTest (Id, IdString) 
		VALUES (@Id, @IdString); 
	END; 
COMMIT

When I exercise this procedure concurrently with many threads it produces deadlocks! I can use extended events and the output from trace flag 1200 to find out what locks are taken and what order.

What Locks Are Taken?

It depends on the result of the IF statement. There are two main scenarios to look at. Either the row exists or it doesn’t.

Scenario A: The Row Does Not Exist (Insert)
These are the locks that are taken:

    For the IF EXISTS statement:

    • Acquire Range S-U lock on resource (ffffffffffff) which represents “infinity”

    For the Update statement:

    • Acquire RangeI-N lock on resource (ffffffffffff)
    • Acquire X lock on resource (66467284bfa8) which represents the newly inserted row

Insert Scenario

Scenario B: The Row Exists (Update)
The locks that are taken are:

    For the IF EXISTS statement:

    • Acquire Range S-U lock on resource (66467284bfa8)

    For the Update statement:

    • Acquire RangeX-X lock on resource (66467284bfa8)
    • Acquire RangeX-X lock on resource (ffffffffffff)

Update Scenario

Scenario C: The Row Does Not Exist, But Another Process Inserts First (Update)
There’s a bonus scenario that begins just like the Insert scenario, but the process is blocked waiting for resource (ffffffffffff). Once it finally acquires the lock, the next locks that are taken look the same as the other Update scenario. The locks that are taken are:

    For the IF EXISTS statement:

    • Wait for Range S-U lock on resource (ffffffffffff)
    • Acquire Range S-U lock on resource (ffffffffffff)
    • Acquire Range S-U lock on resource (66467284bfa8)

    For the Update statement:

    • Acquire RangeX-X lock on resource (66467284bfa8)
    • Acquire RangeX-X lock on resource (ffffffffffff)

Update After Wait Scenario

The Deadlock

And when I look at the deadlock graph, I can see that the two update scenarios (Scenario B and C) are fighting:
Scenario B:

  • Acquire RangeX-X lock on resource (66467284bfa8) during UPDATE
  • Blocked RangeX-X lock on resource (ffffffffffff) during UPDATE

Scenario C:

  • Acquire RangeS-U lock on resource (ffffffffffff) during IF EXISTS
  • Blocked RangeS-U lock on resource (66467284bfa8) during IF EXISTS

Why Isn’t This A Problem With Unique Indexes?

To find out, let’s take a look at one last scenario where the index is unique:
Scenario D: The Row Exists (Update on Unique Index)

    For the IF EXISTS statement:

    • Acquire U lock on resource (66467284bfa8)

    For the Update statement:

    • Acquire X lock on resource (66467284bfa8)

Visually, I can compare scenario B with Scenario D:
Update Two Scenarios

When the index is not unique, SQL Server has to take key-range locks on either side of the row to prevent phantom inserts, but it’s not necessary when the values are guaranteed to be unique! And that makes all the difference. When the index is unique, no lock is required on resource (ffffffffffff). There is no longer any potential for a deadlock.

Solution: Define Indexes As Unique When Possible

Even if the values in a column are unique in practice, you’ll help improve concurrency by defining the index as unique. This tip can be generalized to other deadlocks. Next time you’re troubleshooting a deadlock involving range locks, check to see whether the participating indexes are unique.

This quirk of requiring unique indexes for the UPSERT pattern is not unique to SQL Server, I notice that PostgreSQL requires a unique index when using their “ON CONFLICT … UPDATE” syntax. This is something they chose to do very deliberately.

Other Things I Tried

This post actually comes from a real problem I was presented. It took a while to reproduce and I tried a few things before I settled on making my index unique.

Lock More During IF EXISTS?
Notice that there is only one range lock taken during the IF EXISTS statement, but there are two range needed for the UPDATE statement. Why is only one needed for the EXISTS statement? If extra rows get inserted above the row that was read, that doesn’t change the answer to EXISTS. So it’s technically not a phantom read and so SQL Server doesn’t take that lock.

So what if I changed my IF EXISTS to

IF ( 
	SELECT COUNT(*)
	FROM dbo.UpsertTest WITH (UPDLOCK) 
	WHERE Id = @Id 
) > 0

That IF statement now takes two range locks which is good, but it still gets tripped up with Scenario C and continues to deadlock.

Update Less?
Change the update statement to only update one row using TOP (1)

UPDATE TOP (1) dbo.UpsertTest 
SET IdString = @IdString
WHERE Id = @Id;

During the update statement, this only requires one RangeX-X lock instead of two. And that technique actually works! I was unable to reproduce deadlocks with TOP (1). So it is indeed a candidate solution, but making the index unique is still my preferred method.

9 Comments »

  1. Hi Michael,

    I love the animations – really helps with the explanation.
    Reminds me of this article by Aaron Bertrand – https://sqlperformance.com/2020/09/locking/upsert-anti-pattern
    Wouldn’t serializable hint along with the UPDLOCK help? Quote: “(ensure a row that doesn’t exist continues to not exist)”

    Also, is the IF EXISTS necessary? Wouldn’t updating straight away and checking if any rows were affected achieve this without the extra transaction?

    Comment by Tom — March 18, 2022 @ 10:40 am

  2. Hey Tom,
    I talk a lot about all the different patterns at https://michaeljswart.com/go/upsert (Aaron also links to a bunch of my posts from his as well).

    I just tried it out. The UPDATE-then-INSERT-if-necessary pattern that Aaron mentions still deadlocks when the index isn’t unique. The unique index or top(1) on the update avoids those deadlocks.
    So it’s the same story as above. (I’m not going to animate that 🙂 )

    It’s actually really really valuable to learn how to check this stuff out on your own.
    I have the scripts above that can be used to set up the experiment. You could adjust the sproc with Aaron’s code. And use SQLQueryStress to exercise it concurrently.

    Comment by Michael J. Swart — March 18, 2022 @ 10:58 am

  3. To your other question: “Wouldn’t serializable hint along with the UPDLOCK help?”
    That behavior is equivalent to SET TRANSACTION ISOLATION LEVEL SERIALIZABLE which I used in my example

    Comment by Michael J. Swart — March 18, 2022 @ 11:18 am

  4. Thanks for the reply. That’s absolutely true, I had an intensive day and got lazy (ironically I usually advocate to always test yourself) – which is no excuse.
    I’ll definitely try to reproduce and observe. Thank you for the blog post!

    Comment by Tom — March 18, 2022 @ 11:19 am

  5. The repro code includes the TOP (1) workaround without comment, which might confuse people looking to reproduce.

    The point of running the EXISTS check under SERIALIZABLE isolation is to ensure any row that didn’t exist at the time of the check continues to not exist until the INSERT is performed. There’s no other way to lock a row that does not exist.

    There is no benefit in running the INSERT or UPDATE under SERIALIZABLE isolation. As you saw, it only increases the chances of blocking or deadlocking.

    In that sense, using (UPDLOCK, SERIALIZABLE) hints on the EXISTS test only is superior to running the entire transaction at SERIALIZABLE.

    Comment by Paul White — March 19, 2022 @ 4:37 am

  6. Thanks Paul.
    I removed the TOP (1) from the reproduction example. I don’t know how that got in there.

    Aaron mentions the table hint in his latest upset post. I see that there’s no benefit to having the entire transaction be serializable.
    I’ll see what kind of difference it makes to the test when I’m back at my desk. Even if it doesn’t make a difference in this case, I know of many other procedures that would likely benefit from more targeted serializable hints.

    Comment by Michael J. Swart — March 19, 2022 @ 8:29 am

  7. […] Michael J Swart recommends a unique index: […]

    Pingback by The Upsert Pattern and Unique Indexes in SQL Server – Curated SQL — March 21, 2022 @ 8:10 am

  8. You should find the deadlock goes away completely when you limit the scope of serializable as suggested.

    Comment by Paul White — March 23, 2022 @ 4:32 am

  9. I did indeed.
    I also found that Aaron’s “Update then Insert if required” still needed the top (1) on the update for non-unique indexes.

    I updated the examples I wrote about at http://michaeljswart.com/go/upsert

    Comment by Michael J. Swart — March 23, 2022 @ 8:10 am

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress