Michael J. Swart

August 28, 2012

Sharpshooting Query Plans From The Cache

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Tags: — Michael J. Swart @ 12:00 pm

Takeaway: I talk a bit about DBCC FREEPROCCACHE, using the plan_handle parameter to remove exactly one query plan from cache. I also explain why and when you might want to.

This is another post where I share a script of mine. It’s one of my favorite kinds of posts so I’ve created a new category for it: SQL Scripts

The Situation

  • Crisis mode: The database server is slow (all of a sudden) and the performance is unacceptable.
  • You used your troubleshooting checklist and you see CPU is suddenly pegged on the database server.
  • And sp_whoisactive (or other monitoring software) tells you that there’s a particular query that is running frequently. In this example, the query starts with “SELECT TOP (@ResultLimit) M.Name FROM MyTable M JOIN … etc“.
  • The query is called in exactly one place in your application (Oh man! A fix will take time to recompile, QA and deploy).
  • The query usually runs for a few minutes, using up almost all the CPU.
  • You’ve collected the query plan and the query text.
  • But, if you run the query in SQL Server Management Studio, the query completes quickly.

Some Analysis

  • The query plan looks different when run in Management Studio
  • The production database must have picked a plan based on unlucky compile-time parameters.
  • Confirmed! When running the query in Management Studio using the OPTIMIZE FOR query hint with similar parameters, you can reproduce the bad plan (a quick sketch follows this list).
  • To buy some time, it would be good to have SQL Server recompile that plan.
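Here’s a minimal sketch of that reproduction step. Only MyTable, M.Name and @ResultLimit come from the real query; the joined table, the join column, and the parameter values are made up for illustration:

-- Reproduce the bad plan in Management Studio (joined table, column and values are hypothetical)
declare @ResultLimit int = 50;                   -- the value you actually want to run with

select top (@ResultLimit) M.Name
from MyTable M
join MyOtherTable O on O.MyTableId = M.MyTableId
option (optimize for (@ResultLimit = 1000000));  -- the suspected unlucky compile-time value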

This is what I want to explore in this post. What is the best way to get SQL Server to recompile that plan?

Before SQL Server 2008

How do we get SQL Server to recompile this plan? Before SQL Server 2008, there were a couple things I could do:

  • DBCC FREEPROCCACHE: Holy cow! That’s a bit extreme. We ask SQL Server to recompile all of its plans. (Missed opportunity: Illustration featuring Darth Sidious saying “Wipe them out. All of them.”)
  • UPDATE STATISTICS [tablename]: Find a table that is used in the query and have SQL Server update statistics on it. All plans using the table will be recompiled afterwards (including our troubled plan!) We might not actually need updated statistics; it’s the side effect of recompiled plans that we want here. I talked a bit about that in Updating Statistics Helps, But Not For The Reasons You Think.
  • EXEC sp_recompile [tablename]: This technique is similar to UPDATE STATISTICS in that plans are dropped. The benefit is that we don’t have all the overhead and time spent creating the statistics. It’s still not ideal though: it takes a schema modification lock, and depending on how busy the server is, that can mean at least a couple of minutes of blocked processes.
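For reference, here’s roughly what each of those three options looks like; the table name is just a placeholder:

-- the pre-2008 options, from bluntest on down (MyTable is a placeholder)
dbcc freeproccache;           -- flushes every plan on the instance
update statistics MyTable;    -- recompiles all plans that reference MyTable (as a side effect)
exec sp_recompile 'MyTable';  -- marks plans that reference MyTable for recompile (takes a schema modification lock)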

Targeting a Single Query Plan

SQL Server 2008 and later lets you take a single query plan out of cache by passing a plan handle to DBCC FREEPROCCACHE. Compared to the other techniques, this is a sharpshooter technique.

I like this technique because it takes no important locks. Existing calls to this query will continue to execute and complete (using the old bad plan), but in the meantime, new calls to this query will use a recompiled plan based on different (and hopefully better) parameters. If so, you’ve just bought yourself time to find and fix this query properly.

But you can’t just use DBCC FREEPROCCACHE without some preparation. Finding the plan handle is not straightforward. So I’ve got a script that only asks you to identify the query by its text.
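For comparison, here’s roughly what the manual hunt for a plan handle looks like. The DMVs are standard, but the LIKE filter is just my example text and you’d substitute your own:

-- one way to hunt down a plan handle by hand (adjust the filter to your query's text)
select cp.plan_handle, cp.usecounts, st.text
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where st.text like '%TOP (@ResultLimit)%';

-- then, with the handle in hand:
-- dbcc freeproccache (0x0600...);   -- the handle value here is illustrative only

The script below folds that lookup and the DBCC FREEPROCCACHE call into one step.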

The Script

First find out how bad your query is. I’m arbitrarily defining more than 10 active queries as bad. Change the query text here. Use enough text to uniquely identify the problem query. In my case I used “TOP (@ResultLimit)”.

-- Part 1: 
-- Use this query to look for bad performing queries using "TOP (@ResultLimit)"
select 
       count(1) as [count of running queries, should be less than 10], 
       max(datediff(second, start_time, getdate())) as [longest running such query in seconds],
       min(datediff(second, start_time, getdate())) as [shortest running such query in seconds]
from sys.dm_exec_requests er
cross apply sys.dm_exec_sql_text(er.sql_handle) st
where st.text like '%TOP (@ResultLimit)%'
    and st.text not like '%80B82FDD-0297-4057-9AEB-D811F3B5C7DB%' -- this GUID appears only in this script, so the script filters out its own text
GO

Then run this part to knock out that one query (remembering again to adjust the query text).

-- Part 2:
-- If the above indicates poor performance (i.e. many active MyTable queries), 
-- use the following to clear bad plan from cache.
declare @count int = 0;
declare @planhandle varbinary(64)
 
select 
    @count = count(1), 
    @planhandle = max(plan_handle) -- any arbitrary plan
from sys.dm_exec_requests er
cross apply sys.dm_exec_sql_text(er.sql_handle) st
where st.text like '%TOP (@ResultLimit)%'
    and st.text not like '%80B82FDD-0297-4057-9AEB-D811F3B5C7DB%' -- this GUID appears only in this script, so the script filters out its own text
 
-- if enough of these queries are active, remove the one identified plan from cache
-- (new executions will then compile a fresh plan)
if (@count > 10)
    dbcc freeproccache (@planhandle);
GO

Hopefully You’ve Bought Yourself Time

The query is recompiled with a better plan and you can sit back (for now). Your query is still vulnerable to bad plans and you’ve got to fix that, but you’ve got time.

In practice I’ve used this script maybe three or four times successfully. Your mileage may vary.

How Risky Is This Script?

Well, what’s the worst thing that could happen?

To be honest, I can’t think of much that could go wrong other than it doesn’t solve your problem. In the worst case scenario:

  • Your analysis was wrong, and the recompiled plan still performs poorly.
  • Or perhaps the script somehow picks the wrong plan to recompile. That’s not bad news either; it’s just news. I certainly can’t say the same for the other techniques that were available before version 2008.
  • Maybe expectations were set incorrectly: you propose this change, it doesn’t fix anything, and now it may look like you’re guessing. (This isn’t a technical issue, but it’s something to keep in mind.)

By the way, don’t take my word for it. You’ll need to tell others that the risk is small, and you have to be able to back that claim up without my help. So make sure you understand what’s going on here.

 

August 17, 2012

Be Careful with the Merge Statement

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: — Michael J. Swart @ 8:38 am

So there’s a lot of good documentation provided by Microsoft for SQL Server. Microsoft in general is the best model I know of for how technical documentation should be provided to the public (If you’re laughing, it’s time to take another look).

But to the unfamiliar, it can still be hard to judge good docs from bad (or good docs from better). If you knew better, then you probably wouldn’t need the docs, right? Wouldn’t it be great to have some third party point to a particular article and say, “Read this; this is important”?

For example, consider this article: Optimizing MERGE Statement Performance. It’s well written, and it explains how performance gains are possible using the MERGE statement because the source and target datasets can be processed only once. The alternative to the MERGE statement is to process the datasets multiple times using more than one INSERT, UPDATE, or DELETE statement. But then Microsoft goes on to say “… performance gains depend on having correct indexes, joins, and other considerations in place.” That is an understatement, dear readers, and I’d like to call your attention to it.

I’m not going to repeat Microsoft’s advice (it is like I said, an excellent article), but I am going to add to it by describing some of the dangers of not following their advice.

Why Is MERGE Dangerous?

Here’s why: there is a risk that if you don’t get the “indexing, joins and other considerations” right, you not only lose the performance benefits you hoped for, but suffer performance problems much, much worse than if you had written the straightforward INSERT, UPDATE, and/or DELETE statements in the first place. Because of their complexity, I believe MERGE statements are more vulnerable to bad query plans than other DML statements.

My Own Experience

I’ve been bitten by this problem twice in the past month, the most recent time being yesterday. So for the curious, you could say this lesson counts as the latest thing about SQL Server I learned the hard way.

Experience #1

The first time was with a MERGE statement that implemented the standard logic of updating an existing set of table rows using a dataset sent from an application:

  • Insert rows that are new
  • Update rows that have changed
  • Delete rows that are gone

But the logic of the whole thing required a fussy “WHEN NOT MATCHED BY SOURCE AND …” clause, and the resulting query plan was doing something like a full outer join. The query processed the entire target table even though it didn’t need to. If you’re really, really curious, for more details look at this similar question I found on Stack Overflow: Sql Server Delete and Merge performance.
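To make that concrete, the statement had roughly this shape. The table, column, and parameter names are all made up, and the sketch assumes a target table (dbo.TargetTable) plus a table variable @SourceRows holding the application’s dataset; the part to notice is the extra condition on the NOT MATCHED BY SOURCE branch:

-- roughly the shape of that statement (all names are hypothetical)
merge dbo.TargetTable as T
using @SourceRows as S
    on T.BusinessKey = S.BusinessKey
when matched and T.SomeValue <> S.SomeValue then
    update set T.SomeValue = S.SomeValue
when not matched by target then
    insert (BusinessKey, SomeValue)
    values (S.BusinessKey, S.SomeValue)
when not matched by source and T.ParentId = @ParentId then  -- the fussy extra condition
    delete;

It’s that last branch that pushes the plan toward touching every row of the target table, which is where the full-outer-join-like behaviour came from.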

I was about to rewrite the whole thing using only UPDATE/INSERT/DELETE statements, but then my colleague found a way to make the merge statement work. Woo hoo.

Experience #2

And then just yesterday, I had another problem with a MERGE statement. Some of the facts:

  • The source query was a CTE and produced about 4000 rows max. That CTE, when run on its own, never took more than 3 seconds on either the test or production databases.
  • The MERGE’s search condition – the ON [merge_search_condition] clause – used columns that matched a unique non-clustered index. Microsoft recommends a clustered index here, but what can you do?
  • Testing with a target table of several thousand rows went just fine (in hindsight, a test db with millions of rows would have saved me).

But testing is testing and production is production. The optimizer crossed a threshold somewhere and said “Whelp, that’s it. I guess a non-clustered index isn’t going to cut it here any more. Let’s scan the whole thing.” Now most of the time the optimizer comes up with a really good plan. But in this case, it just didn’t.

I plan to rewrite the query as separate INSERT and UPDATE statements and early tests are promising.
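A rewrite along those lines might look something like the following sketch, reusing the made-up names from the earlier example in place of the real CTE and tables; your own version would need whatever transaction and locking strategy the original MERGE relied on:

-- the same upsert as two plain statements (all names are hypothetical)
begin transaction;

update T
set    T.SomeValue = S.SomeValue
from   dbo.TargetTable T
join   @SourceRows S on S.BusinessKey = T.BusinessKey
where  T.SomeValue <> S.SomeValue;

insert dbo.TargetTable (BusinessKey, SomeValue)
select S.BusinessKey, S.SomeValue
from   @SourceRows S
where  not exists (select 1 from dbo.TargetTable T
                   where T.BusinessKey = S.BusinessKey);

commit transaction;

It’s more typing, but each statement tends to get a simpler plan that’s easier to reason about and easier to index for.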

Where Do We Go Now

Well, as a veteran problem-avoider, I now feel distrustful and wary of the MERGE statement. And maybe I don’t mind the multiple INSERT/UPDATE statements so much any more. If my MERGE statement takes one second to run and my INSERT/UPDATE statements take two, then maybe two seconds is acceptable when the alternative takes so much extra effort to test properly. What do you think?

August 10, 2012

Avoiding Problems

Filed under: SQLServerPedia Syndication,Technical Articles,Tongue In Cheek — Michael J. Swart @ 8:00 am

Takeaway: None really; I just wanted to tell a few stories and draw a bit.

A long time ago, I had an interview and I was asked a question that (I’m guessing) was designed to evaluate how I approached problem solving. It went something like this:

Interviewer: “Say you’re a farmer who has a cow in a fenced field. But the fence is broken and you need a nail to fix it. The nail you need is inside the barn and if you retrieve it you leave the cow free to escape. What do you do?”
Me: “I think I would just go fetch the nail. I don’t think the cow will get out”
Interviewer: “But you can’t leave the cow alone”
Me: “I don’t know. I’ve been around cows and they’re pretty stupid. There’s a good chance they don’t know the fence has a hole in it. It will probably take me under two minutes to get the nail.”
Interviewer: “Ohh… You’re a problem-avoider.”
Me: !!!

“Problem Avoider”? That label took me by surprise. And even though it wasn’t meant as a criticism, maybe I did feel a little slighted by it. The implication, maybe, is that I don’t face problems that ought to be faced.

Since then, when I face a technical problem I often ask myself: “Am I a problem avoider?” “Am I refusing to deal with a particular issue, hoping it will go away?” I’d like to believe that the answers to those questions are yes and no respectively: I am a problem-avoider, but I do not have my head buried in the sand.

Unapologetic Problem Avoider

Sometimes taking a step back to try another approach makes the problem go away. I was asked something like this recently:

Friend: For this set of data, how do I get the min, max, and median values, as well as the values at the 25th and 75th percentiles?
Me: Oooh, I know this one…

I googled “NTILE” and was soon looking at the Books Online topic, wondering whether it was appropriate. After some thinking, I decided to maybe avoid the problem:

Me: How many rows are we talking about?
Friend: Umm a couple hundred… maybe up to a thousand max.
Me: You can do this in C#, right? Just load the entire dataset, sort it, and then do index lookups after a bit of arithmetic.
Friend: I was thinking of that, but I wanted to know your opinion.
Me: I think that’s the way to go. SQL Server already has to do the reads. The only thing we’re not saving is the network bandwidth, and that’s not much anyway.

Problem avoided. But I think that the important part here is that the problem is still being faced; it’s just being faced in the right place and by the right person.

Taking Responsibility

But there are times when you can’t avoid a problem and you have to take responsibility. I’m not going to preach against procrastination. Mostly because I think Nike already has the best advice.

But I want to talk about procrastination’s second cousin: Doing-fun-work-first. Man, I do that a lot. Probably too much. I still struggle, but lately, my approach has been

  • to prioritize things and then be as objectively honest about the priorities as possible (no kidding), or
  • to deal with crummy tasks first in order to “get them out of the way”.

If you’ve got any tips of your own, send them this way…
