Only UPDATE Rows That Are Changing, But Do It Carefully

April 1, 2021

Only UPDATE Rows That Are Changing, But Do It Carefully

Filed under: Miscelleaneous SQL,Technical Articles,Tongue In Cheek — Michael J. Swart @ 12:20 pm

If you update a column to the exact same value as it had before, there’s still work being done.

Quite obediently, SQL Server takes out its eraser, erases the old value, and writes the same value in its place even though nothing changed!

But it feels like a real change. It has consequences for locking and an impact to the transaction log just as if it were a real change.

So that leads to performance optimizations that look like this:

Original Update Statement:

UPDATE Users
SET    DisplayName = @NewDisplayName
WHERE  Id = @Id;

Only Update When Necessary:

UPDATE Users
SET    DisplayName = @NewDisplayName
WHERE  Id = @Id
AND    DisplayName <> @NewDisplayName;

But Take Care!

Be careful of this kind of optimization. For example, you have to double check that DisplayName is not a nullable column (do you know why?). There are other things to worry about too, mostly side effects:

Side Effects

This simple update statement can have loads of side effects that can be hard to see. And the trouble with any side effect, is that other people can place dependencies on them! It happens all the time. Here is a list of just some of the side effects I can think of, I’m sure it’s not exhaustive.

Triggers: Ugh, I dislike triggers at the best of times, so check out any triggers that might exist on the table. In the original UPDATE statement, the row always appears in the INSERTED and DELETED tables, but in the improved version, the row does not necessarily. You have to see if that matters.

RowCount: What if the original update statement was part of a batch that looked like this:

UPDATE Users
SET    DisplayName = @NewDisplayName
WHERE  Id = @Id;
 
IF (@@ROWCOUNT > 0)
    RAISERROR ('Could not find User to update', 16, 1);

At least this side effect has the benefit of not being hidden. It’s located right beside the code that it depends on.

Rowversion: A rowversion value changes every time a row changes. Such a column would get updated in the original UPDATE statement, but not in the improved version. I can think of a number of reasonable of use cases that might depend on a rowversion column. ETLs for example that only care about changed data. So this might actually be an improvement for that ETL, but then again, maybe the number of “changed” rows was the important part and that number is now changing with the improvement. Speaking of ETLs:

Temporal Tables: Yep, the UPDATE statement is a “change” in the table that gets tracked in temporal history.

Change Data Capture, etc…: I haven’t bothered to set up Change Data Capture to check, but I assume that an UPDATE statement that updates a row to the same value is still considered a change. Right or wrong, the performance improvement changes that assumption.

People Depend On Side Effects

When I see people do this, I start to feel grouchy: Someone’s getting in the way of my performance improvement! But it happens. People depend on side effects like these all the time. I’m sure I do. XKCD pokes fun at this with Workflow where he notices that “Every change breaks someone’s workflow”. And now I’m imagining a case where some knucklehead is using the growth of the transaction log as a metric, like “Wow, business is really booming today, 5GB of transaction log growth and it’s not even noon!”

Although these are silly examples, there are of course more legit examples I could probably think of. And so in a well-functioning organization, we can’t unilaterally bust other peoples workflows (as much as we might like to).

-- Comments (2)

2 Comments »

Another side-effect worth mentioning can fire-up if application relies on updated value to stay the same in current transaction and using read committed isolation (which is often the case).

I.e. first it’s doing smth like:

>>insert into t1 (a, b) values (0, 1);

>>begin tran;
>>update t1 set b = 1 where a = 0 and b 1; — no update here, b is already = 1

…then meanwhile some other process is like:
>> update t1 set b = 2 where a = 0 and b 2;

…and then if the first process later in the same transaction expects 1 here:
>> select b from t1 where a = 0;
then it’s in trouble, if this “and b 1” is in place; otherwise it will hold the X lock that will not let another process to change b from 1 to 2.

Of course that can be easily avoided, but only if developer considered that situation, that is often not the case.

Comment by Denis — April 2, 2021 @ 8:10 am
Hi Denis,

That’s a really good point Denis. You’re right, holding an exclusive lock on a value that got updated might have some desirable consequences.

I guess there’s a balance. On one hand, holding an exclusive lock allows subsequent queries in a transaction to depend on those values.
On the other hand, holding fewer locks can really help increase performance and concurrency.

Personally, I always value the performance and concurrency I get and I have typically chosen less locking wherever I can.

Just like the other examples, it’s something developers should be mindful of.

Comment by Michael J. Swart — April 5, 2021 @ 10:50 am

RSS feed for comments on this post. TrackBack URL

Michael J. Swart

April 1, 2021