Michael J. Swart

February 11, 2016

Ugly Pragmatism For The Win

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 11:37 am

Update: This is an archived post on the topic of UPSERT. See my more recent and comprehensive post SQL Server UPSERT Patterns and Antipatterns

Yesterday I had my mind changed about the best way to handle concurrency. I describe several methods in Mythbusting: Concurrent Update/Insert Solutions. My preferred method is to increase the isolation level and fine-tune locks.

At least that was my preference. I recently changed my approach to use a method that gbn suggested in the comments. He describes his method as the “TRY CATCH JFDI pattern”. Normally I avoid solutions like that. There’s a rule of thumb that says developers should not rely on catching errors or exceptions for control flow. But I broke that rule of thumb yesterday.

By the way, I love gbn’s name for the pattern, “JFDI”. It reminds me of Shia LaBeouf’s motivational video.

Okay, I’ll tell you the story.

The Original Defect

So there’s this table. It’s defined something like:

CREATE TABLE dbo.AccountDetails
(
  Email NVARCHAR(400) NOT NULL
    CONSTRAINT PK_AccountDetails PRIMARY KEY (Email),
  Created DATETIME NOT NULL
    CONSTRAINT DF_AccountDetails_Created DEFAULT GETUTCDATE(),
  Etc NVARCHAR(MAX)
)

And there’s a procedure defined something like:

CREATE PROCEDURE dbo.s_GetAccountDetails_CreateIfMissing
(
  @Email NVARCHAR(400),
  @Etc NVARCHAR(MAX)
)
AS
 
  DECLARE @Created DATETIME;
  DECLARE @EtcDetails NVARCHAR(MAX);
 
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
  SET XACT_ABORT ON
  BEGIN TRAN
 
    SELECT
      @Created = Created,
      @EtcDetails = Etc
    FROM dbo.AccountDetails
    WHERE Email = @Email
 
    IF @Created IS NULL
    BEGIN
      SET @Created = GETUTCDATE();
      SET @EtcDetails = @Etc;
      INSERT INTO dbo.AccountDetails (Email, Created, Etc)
      VALUES (@Email, @Created, @EtcDetails);
    END
 
  COMMIT
 
  SELECT @Email as Email, @Created as Created, @EtcDetails as Etc

Applications executing this procedure were deadlocking with each other. If you’re keen, try to figure out why before reading ahead. It’s pretty close to the problem described in the Mythbusting post. Specifically this was method 3: increased isolation level.
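If you’d like a hint, here’s a sketch of the deadlock mechanics (the email address is hypothetical; run each session in its own connection, concurrently, with a key that doesn’t exist yet):

```sql
-- Session 1:
EXEC dbo.s_GetAccountDetails_CreateIfMissing
    @Email = N'newuser@example.com', @Etc = N'...';
-- Under SERIALIZABLE, the SELECT takes a shared key-range lock (RangeS-S)
-- on the missing key. Shared locks are compatible, so a second session
-- can take the same lock at the same time.

-- Session 2 (concurrently):
EXEC dbo.s_GetAccountDetails_CreateIfMissing
    @Email = N'newuser@example.com', @Etc = N'...';
-- Now each session's INSERT needs an exclusive lock on that range, which
-- the other session's shared range lock blocks. Neither can proceed, and
-- SQL Server picks one as the deadlock victim.
```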

Initial Fix

So I decided to fine-tune locks. I added an UPDLOCK hint:

CREATE PROCEDURE dbo.s_GetAccountDetails_CreateIfMissing
(
  @Email NVARCHAR(400),
  @Etc NVARCHAR(MAX)
)
AS
 
  DECLARE @Created DATETIME;
  DECLARE @EtcDetails NVARCHAR(MAX);
 
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
  SET XACT_ABORT ON
  BEGIN TRAN
 
    SELECT
      @Created = Created,
      @EtcDetails = Etc
    FROM dbo.AccountDetails WITH (UPDLOCK)
    WHERE Email = @Email
 
    IF @Created IS NULL
    BEGIN
      SET @Created = GETUTCDATE();
      SET @EtcDetails = @Etc;
      INSERT INTO dbo.AccountDetails (Email, Created, Etc)
      VALUES (@Email, @Created, @EtcDetails);
    END
 
  COMMIT
 
  SELECT @Email as Email, @Created as Created, @EtcDetails as Etc

Bail Early If Possible

Okay, so this solution works. It’s concurrent and it performs just fine. I realized, though, that I can improve this further by avoiding the transaction and locks. Basically, select the row and, if it exists, bail early:

CREATE PROCEDURE dbo.s_GetAccountDetails_CreateIfMissing
(
  @Email NVARCHAR(400),
  @Etc NVARCHAR(MAX)
)
AS
 
  DECLARE @Created DATETIME;
  DECLARE @EtcDetails NVARCHAR(MAX);
 
  SELECT
    @Created = Created,
    @EtcDetails = Etc
  FROM dbo.AccountDetails
  WHERE Email = @Email;

  IF (@Created IS NOT NULL)
  BEGIN
    SELECT @Email as Email, @Created as Created, @EtcDetails as Etc;
    RETURN;
  END

  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
  SET XACT_ABORT ON
  BEGIN TRAN
 
    SELECT
      @Created = Created,
      @EtcDetails = Etc
    FROM dbo.AccountDetails WITH (UPDLOCK)
    WHERE Email = @Email;
 
    IF @Created IS NULL
    BEGIN
      SET @Created = GETUTCDATE();
      SET @EtcDetails = @Etc;
      INSERT INTO dbo.AccountDetails (Email, Created, Etc)
      VALUES (@Email, @Created, @EtcDetails);
    END
 
  COMMIT
 
  SELECT @Email as Email, @Created as Created, @EtcDetails as Etc;

Take A Step Back

Okay, this is getting out of hand. The query shouldn’t have to be this complicated.
Luckily I work with a guy named Chris. He’s amazing at what he does. He questions everything without being a nitpicker (there’s a difference). He read through the Mythbusting post and followed all the links in the comments. He asked whether gbn’s JFDI pattern would be a better fit here. So I implemented it just to see what that looked like:

CREATE PROCEDURE dbo.s_GetAccountDetails_CreateIfMissing
(
  @Email NVARCHAR(400),
  @Etc NVARCHAR(MAX)
)
AS
  BEGIN TRY
    INSERT INTO dbo.AccountDetails (Email, Etc)
    SELECT @Email, @Etc
    WHERE NOT EXISTS 
      ( SELECT * FROM dbo.AccountDetails WHERE Email = @Email );
  END TRY
  BEGIN CATCH
    -- ignore duplicate key errors, throw the rest.
    IF ERROR_NUMBER() NOT IN (2601, 2627) THROW;
  END CATCH
 
  SELECT Email, Created, Etc
  FROM dbo.AccountDetails 
  WHERE Email = @Email;

Look at how much better that looks! No elevated transaction isolation levels. No query hints. The procedure itself is half as long as it used to be. The SQL is so much simpler, and for that reason I prefer this approach. I am happy in this case to use error handling for control flow.
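For what it’s worth, the caller doesn’t change at all (the values here are made up):

```sql
-- The same call works whether or not the row already exists.
EXEC dbo.s_GetAccountDetails_CreateIfMissing
    @Email = N'someone@example.com',
    @Etc   = N'some details';
-- The first call inserts and returns the new row; any later or concurrent
-- call returns the existing row, with Created unchanged.
```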

So I checked in the change and updated my pull request. Chris’s last comment before he approved the pull request was “Looks good. Ugly pragmatism FTW.”

13 Comments »

  1. Sometimes, though, JFDI leads to worse performance overall, depending on what % of calls fail. Raising exceptions has substantial overhead. I showed this in a couple of posts:

    http://sqlperformance.com/2012/08/t-sql-queries/error-handling

    https://www.mssqltips.com/sqlservertip/2632/checking-for-potential-constraint-violations-before-entering-sql-server-try-and-catch-logic/

    Comment by Aaron Bertrand — February 11, 2016 @ 11:49 am

  2. You’re right Aaron, and we did test it.

    It turns out that in our case, the percent of calls that failed was 0 (when rounded to the nearest percent).
    I think you illustrate the point that as much as possible, evaluate things on a case-by-case basis over following rules-of-thumb.

    It’s also why we added the not-strictly-necessary WHERE NOT EXISTS clause.

    Comment by Michael J. Swart — February 11, 2016 @ 11:57 am

  3. i read this and the mythbusting article with great interest. thanks. question, though: in the mythbusting article, the problem at hand had an UPDATE component to it which is not present in this example. was the UPDATE component eliminated for simplicity? the reason i ask is because in the example as provided i’m wondering why you would catch and then literally ignore duplicate key errors rather than simply define the PRIMARY KEY with IGNORE_DUP_KEY=ON? Using IGNORE_DUP_KEY you could still adopt the JFDI ethos and simplify even further by removing even the TRY…CATCH. i remember reading about overhead associated with (ab)use of IGNORE_DUP_KEY, but seem to remember that being associated with lots of duplicate keys in a single large insert and not so much with singletons (unfortunately, i can’t seem to find the source).

    thanks for the article and for a very enjoyable and informative blog all around.

    Comment by mmiike — February 11, 2016 @ 8:14 pm

  4. Thanks for the comment Mike,

    I have to be honest, I come across the IGNORE_DUP_KEY feature so infrequently that I forgot that it existed. It is true that in this case that IGNORE_DUP_KEY would have worked like you describe.
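    For anyone curious, a sketch of the variant Mike describes (untested here, but the mechanics are standard):

    ```sql
    -- Rebuild the primary key with IGNORE_DUP_KEY on:
    ALTER TABLE dbo.AccountDetails
      DROP CONSTRAINT PK_AccountDetails;
    ALTER TABLE dbo.AccountDetails
      ADD CONSTRAINT PK_AccountDetails
      PRIMARY KEY (Email) WITH (IGNORE_DUP_KEY = ON);

    -- A duplicate insert now raises warning 3604 ("Duplicate key was
    -- ignored.") instead of error 2627, so no TRY/CATCH is needed:
    INSERT INTO dbo.AccountDetails (Email, Etc)
    VALUES (@Email, @Etc);
    ```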

    I actually would (slightly) prefer the TRY/CATCH method over the IGNORE_DUP_KEY method because it keeps the behavior with the code where it seems to belong. But that’s a personal preference.

    Thanks for the feedback!

    Comment by Michael J. Swart — February 11, 2016 @ 8:25 pm

  5. […] Sometimes, Michael J. Swart says, it’s better to just do it: […]

    Pingback by JDFI – Curated SQL — February 12, 2016 @ 8:01 am

  6. Nice Blog, well done.

    Comment by Paul Nielsen — February 15, 2016 @ 11:34 pm

  7. Michael,

    Nice post. I’m curious if a merge (without try catch) would perform as well and also avoid locking. Thanks.

    Comment by Drew — February 16, 2016 @ 10:35 am

  8. Hi Drew,

    Check out the original post I wrote: Mythbusting: Concurrent Update/Insert Solutions. In it I test out the MERGE statement and its concurrency.

    Although the MERGE statement is a single statement, that isn’t enough to avoid concurrency issues.
    In terms of performance, I’ve never heard of MERGE performing better than alternatives. I’m sure examples exist, but it hasn’t been my experience.
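    For reference, the usual advice when you do want MERGE to behave under concurrency is to add a HOLDLOCK (i.e. SERIALIZABLE) hint, something like this sketch:

    ```sql
    -- Without the hint, two sessions can both pass the NOT MATCHED check
    -- and collide on the insert, just like the separate-statement version.
    MERGE dbo.AccountDetails WITH (HOLDLOCK) AS target
    USING (SELECT @Email AS Email, @Etc AS Etc) AS source
      ON target.Email = source.Email
    WHEN NOT MATCHED THEN
      INSERT (Email, Etc) VALUES (source.Email, source.Etc);
    ```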

    Comment by Michael J. Swart — February 16, 2016 @ 11:22 am

  9. @Drew, not to steal any of Michael’s polite, Canadian thunder, but it is a common misconception that because MERGE is a single statement that it must be a single operation. Behind the scenes, it actually performs the same independent operations that we have traditionally written as separate statements ourselves. There are other concerns you should at least be aware of, I wrote about them here:

    http://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/

    Comment by Aaron Bertrand — February 16, 2016 @ 11:22 am

  10. Hey Drew, listen to Aaron, he knows what he’s talking about. 🙂

    Comment by Michael J. Swart — February 16, 2016 @ 11:24 am

  11. Hi Michael. I think there is still a little room for improvement on this topic.

    1) Given that you have enough executions of this procedure at the same time such that you were getting deadlocks, and given that this procedure returns a single-row result set: you should be better off returning those 3 values as OUTPUT parameters instead of a result set. Output params should require fewer resources from both the DB and App layers.

    2) Taking a 2nd step back, looking at what the code does, it always does two SELECTs. While true that the data page is already in the Buffer Pool when that final SELECT is executed (and hence really fast), it is still another query being submitted which requires some amount of server resources to process. In the case of the row already existing (hence skipping the INSERT), that first SELECT is not being used to its fullest potential since the row it receives isn’t being captured. And, in the infrequent case of the NOT EXISTS returning true for two sessions at the exact same time (hence more than one will attempt the same INSERT and only one can win), then the process is actually doing three SELECTs.

    However, we can address both concerns at the same time by rethinking the process in the following manner:

    A) All 3 values to return should be OUTPUT parameters. You can just add OUTPUT to the current two input params as that will just make them IN-OUT params. Then add a third parameter for @Created, also marked as OUTPUT, but defaulted to NULL so that it doesn’t require being passed in (since you don’t want to set the column in that manner anyway).
    B) You never need to select the Email and Etc columns from the table since the values being passed in are always current and correct (at least in this model)
    C) Just select the “Created” column as that is the only unknown.
    D) If you get a row returned, then you now have all desired values and can simply exit (after only doing a single SELECT)
    E) If you do not get a row returned, then do the INSERT.
    F) In the infrequent case of two (or more) sessions running the INSERT with the same value for “email” at the exact same time, one will win and all others will fail. Any sessions that fail will go to the CATCH block which simply does the same SELECT for just the “Created” column (which is guaranteed to exist else the INSERT wouldn’t have failed). And then just exit, after having done only two SELECTs.

    All of that translates to the following adaptation of the “final” code presented above:

    CREATE PROCEDURE dbo.s_GetAccountDetails_CreateIfMissing
    (
    @Email NVARCHAR(400) OUTPUT,
    @Etc NVARCHAR(MAX) OUTPUT,
    @Created DATETIME = NULL OUTPUT
    )
    AS
    BEGIN TRY
    SELECT @Created = acc.Created
    FROM dbo.AccountDetails acc
    WHERE acc.Email = @Email;

    IF (@@ROWCOUNT = 0)
    BEGIN
    SET @Created = GETUTCDATE();
    INSERT INTO dbo.AccountDetails (Email, Etc, Created)
    VALUES (@Email, @Etc, @Created);
    END;
    END TRY
    BEGIN CATCH
    -- ignore duplicate key errors, throw the rest.
    IF (ERROR_NUMBER() NOT IN (2601, 2627))
    BEGIN
    ;THROW;
    END;

    SELECT @Created = acc.Created
    FROM dbo.AccountDetails acc
    WHERE acc.Email = @Email;
    END CATCH;
    GO

    ALSO, you can sleep easier at night knowing that you are, in fact, not “relying on catching errors or exceptions for control flow”. My suggestion here, as well as your final code (due to having the SELECT with WHERE NOT EXISTS) is catching exceptions simply to handle them appropriately. Pragmatism need not be ugly 🙂

    Take care,
    Solomon..

    P.S. this issue, including a reference to this particular post, came up on DBA.StackExchange via the following question:

    http://dba.stackexchange.com/questions/158092/is-a-merge-with-output-better-practice-than-a-conditional-insert-and-select/158335#158335

    Comment by Solomon Rutzky — December 23, 2016 @ 4:19 pm

  12. In most cases, UPDATE first is better, since it happens more often. INSERT can only happen once :p

    Comment by Davi — March 3, 2020 @ 4:09 pm

  13. […] I took a page out of the Ugly Pragmatism handbook. I disabled the foreign key, and set up a job to look for rogue rows […]

    Pingback by T-SQL Tuesday: The Last Ticket I Closed – Darling Data — February 14, 2024 @ 12:21 pm
