So there’s a lot of good documentation provided by Microsoft for SQL Server. Microsoft in general is the best model I know of for how technical documentation should be provided to the public (If you’re laughing, it’s time to take another look).
But to the unfamiliar, it can still be hard to judge good docs from bad (or good docs from better). If you knew better, then you probably wouldn’t need the docs, right? Wouldn’t it be great to have some third party point to a particular article and say “Read this; this is important.”
For example, consider this article: Optimizing MERGE Statement Performance. It’s well written, and it explains how performance gains are possible with the MERGE statement because the source and target datasets can each be processed only once. The alternative is to process the datasets multiple times, using separate INSERT, UPDATE, or DELETE statements. But then Microsoft goes on to say “… performance gains depend on having correct indexes, joins, and other considerations in place.” That is an understatement, dear readers, and I’d like to call your attention to it.
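To make the comparison concrete, here’s a minimal sketch of the pattern (the table and column names are hypothetical, not from any real schema): one MERGE pass over source and target stands in for an UPDATE plus an INSERT that would each touch the data separately.

```sql
-- Hypothetical upsert: one pass over source and target.
MERGE dbo.TargetTable AS t
USING dbo.SourceTable AS s
    ON t.Id = s.Id
WHEN MATCHED THEN
    -- Row exists in both: refresh the value.
    UPDATE SET t.Value = s.Value
WHEN NOT MATCHED BY TARGET THEN
    -- Row is new: insert it.
    INSERT (Id, Value) VALUES (s.Id, s.Value);
```

The equivalent two-statement version would join the same datasets twice, once for the UPDATE and once for the INSERT, which is exactly the repeated work MERGE promises to avoid.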
I’m not going to repeat Microsoft’s advice (it is like I said, an excellent article), but I am going to add to it by describing some of the dangers of not following their advice.
Why Is MERGE Dangerous?
Here’s why: if you don’t get the “indexing, joins and other considerations” right, you risk not only losing the performance benefits you hoped for, but suffering performance problems far worse than if you had written the straightforward INSERT, UPDATE, and/or DELETE statements in the first place. Because of their complexity, I believe MERGE statements are more vulnerable to bad query plans than other DML statements.
My Own Experience
I’ve been bitten by this problem twice in the past month, the most recent time being yesterday. So for the curious, you could say that this lesson counts as the latest thing about SQL Server I learned the hard way.
The first time was with a MERGE statement that implemented the standard logic of updating an existing set of table rows using a dataset sent from an application:
- Insert rows that are new
- Update rows that have changed
- Delete rows that are gone
But the logic of the whole thing required a fussy “WHEN NOT MATCHED BY SOURCE AND …” clause, and the resulting query plan was doing something like a full outer join. The query processed the entire target table even though it didn’t need to. If you’re really curious, for more details look at this similar question I found on Stack Overflow: Sql Server Delete and Merge performance.
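For readers who haven’t hit this, a hedged sketch of the shape of such a statement (names and the extra predicate are invented for illustration). The extra AND on the NOT MATCHED BY SOURCE branch is meant to limit the delete to a slice of the target, but the optimizer still has to consider every target row to decide which ones are “not matched”:

```sql
-- Hypothetical schema; @BatchId is an invented parameter
-- standing in for whatever scopes the delete.
MERGE dbo.TargetTable AS t
USING dbo.AppDataSet AS s
    ON t.Id = s.Id
WHEN MATCHED AND t.Value <> s.Value THEN
    UPDATE SET t.Value = s.Value
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Value, BatchId) VALUES (s.Id, s.Value, @BatchId)
WHEN NOT MATCHED BY SOURCE AND t.BatchId = @BatchId THEN
    -- Rows gone from the source, restricted to this batch.
    DELETE;
```

The danger is that “not matched by source” is defined relative to the whole target table, so the plan can end up scanning it even when the AND condition would exclude most rows.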
I was about to rewrite the whole thing using only UPDATE/INSERT/DELETE statements, but then my colleague found a way to make the merge statement work. Woo hoo.
And then just yesterday, I had another problem with a MERGE statement. Some of the facts:
- The source query was a CTE and produced about 4000 rows max. That CTE, when run on its own, never took more than 3 seconds on either the test or production databases.
- The MERGE’s search condition – the ON [merge_search_condition] clause – used columns that matched a unique non-clustered index. Microsoft recommends a clustered index here, but what can you do?
- Testing with a target table of several thousand rows went just fine (in hindsight, a test db with millions of rows would have saved me).
But testing is testing and production is production. The optimizer crossed a threshold somewhere and said “Whelp, that’s it. I guess a non-clustered index isn’t going to cut it here any more. Let’s scan the whole thing.” Now most of the time the optimizer comes up with a really good plan. But in this case, it just didn’t.
I plan to rewrite the query as separate INSERT and UPDATE statements and early tests are promising.
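The rewrite looks roughly like this sketch (again, hypothetical names; in practice the CTE’s definition gets repeated in each statement since a CTE is scoped to a single statement). Each statement is simple enough that the optimizer has an easier time picking a sane plan:

```sql
-- Update rows that exist in both and have changed.
UPDATE t
SET    t.Value = s.Value
FROM   dbo.TargetTable AS t
JOIN   dbo.SourceData AS s ON s.Id = t.Id
WHERE  t.Value <> s.Value;

-- Insert rows that are new to the target.
INSERT INTO dbo.TargetTable (Id, Value)
SELECT s.Id, s.Value
FROM   dbo.SourceData AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM dbo.TargetTable AS t
                   WHERE t.Id = s.Id);
```

Yes, the source is now processed twice, but a few thousand rows twice beats a scan of a multi-million-row target once.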
Where Do We Go Now
Well, as a veteran problem-avoider, I now feel distrustful and wary of the MERGE statement. And maybe I don’t mind the multiple INSERT/UPDATE statements so much any more. If my MERGE statement takes one second to run and my INSERT/UPDATE statements take two, then maybe two seconds is acceptable when the alternative takes so much extra effort to properly test. What do you think?