Michael J. Swart

August 28, 2012

Sharpshooting Query Plans From The Cache

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Tags: — Michael J. Swart @ 12:00 pm

Takeaway: I talk a bit about DBCC FREEPROCCACHE, using the plan_handle parameter to remove exactly one query plan from cache. I also explain why and when you might want to.

This is another post where I share a script of mine. It’s one of my favorite kinds of posts so I’ve created a new category for it: SQL Scripts

The Situation

  • Crisis mode: The database server is slow (all of a sudden) and the performance is unacceptable.
  • You used your troubleshooting checklist and you see CPU is suddenly pegged on the database server.
  • And sp_whoisactive (or other monitoring software) tells you that there’s a particular query that is running frequently. In this example, the query starts with “SELECT TOP (@ResultLimit) M.Name FROM MyTable M JOIN … etc“.
  • The query is called in exactly one place in your application (Oh man! A fix will take time to recompile, QA and deploy).
  • The query usually runs for a few minutes and using up almost all the CPU.
  • You’ve collected the query plan and the query text.
  • But, if you run the query in SQL Server Management Studio, the query completes quickly.

Some Analysis

  • The query plan looks different when run in Management Studio
  • The production database must have picked a plan based on unlucky compile-time parameters.
  • Confirmed! When running the query in Management Studio using the OPTIMIZE FOR clause with similar parameters, you can reproduce the bad plan.
  • To buy some time, it would be good to have SQL Server recompile that plan.

This is what I want to explore in this post. What is the best way to get SQL Server to recompile that plan?

Before SQL Server 2008

How do we get SQL Server to recompile this plan? Before SQL Server 2008, there were a couple things I could do:

  • DBCC FREEPROCCACHE: Holy cow! That’s a bit extreme. We ask SQL Server to recompile all of its plans. (Missed opportunity: Illustration featuring Darth Sidious saying “Wipe them out. All of them.”)
  • UPDATE STATISTICS [tablename]: Find a table that is used in the query and have SQL Server update statistics on it. All plans using the table will be recompiled afterwards (including our troubled plan!) We might not actually need updated statistics; it’s the side effect of recompiled plans that we want here. I talked a bit about that in Updating Statistics Helps, But Not For The Reasons You Think.
  • EXEC sp_recompile [tablename]: This technique is similar to UPDATE STATISTICS in that plans are dropped. The benefit is that we don’t have all the overhead and time spent creating the statistics. It’s still not ideal though. It takes a schema modification lock and based on the server’s current busy state, that will mean at least a couple minutes of blocked processes.

Targeting a Single Query Plan

SQL Server version 2008 and later allows you to take out a single query plan from cache using the FREEPROCCACHE command with a plan handle. This is a sharp-shooter technique compared to the other techniques.

I like this technique because it takes no important locks. Existing calls to this query will continue to execute and complete (using the old bad plan). But in the meantime, new calls to this query will use a recompiled plan based on the different (and hopefully better) parameters. If so you’ve just bought yourself time to find and fix this query correctly.

But you can’t just use FREEPROCCACHE without some preparation. Finding the plan handle is not straightforward. So I’ve got a script that only asks that you identify a query based on its text.

The Script

First find out how bad your query is. I’m arbitrarily defining more than 10 active queries as bad. Change the query text here. Use enough text to uniquely identify the problem query. In my case I used “TOP (@ResultLimit)”.

-- Part 1: 
-- Use this query to look for bad performing queries using "TOP (@ResultLimit)"
       count(1) as [count of running queries, should be less than 10], 
       max(datediff(second, start_time, getdate())) as [longest running such query in seconds],
       min(datediff(second, start_time, getdate())) as [shortest running such query in seconds]
from sys.dm_exec_requests er
cross apply sys.dm_exec_sql_text(er.sql_handle) st
where st.text like '%TOP (@ResultLimit)%'
    and st.text not like '%80B82FDD-0297-4057-9AEB-D811F3B5C7DB%' -- filter self

Then run this part to knock out that one query (remembering again to adjust the query text).

-- Part 2:
-- If the above indicates poor performance (i.e. many active MyTable queries), 
-- use the following to clear bad plan from cache.
declare @count int = 0;
declare @planhandle varbinary(64)
    @count = count(1), 
    @planhandle = max(plan_handle) -- any arbitrary plan
from sys.dm_exec_requests er
cross apply sys.dm_exec_sql_text(er.sql_handle) st
where st.text like '%TOP (@ResultLimit)%'
    and st.text not like '%80B82FDD-0297-4057-9AEB-D811F3B5C7DB%' -- filter self
-- this query recompiles the one identified plan.
if (@count > 10)
    dbcc freeproccache (@planhandle);

Hopefully You’ve Bought Yourself Time

The query is recompiled with a better plan and you can sit back (for now). Your query is still vulnerable to bad plans and you’ve got to fix that, but you’ve got time.

In practice I’ve used this script maybe three or four times successfully. Your mileage may vary.

How Risky Is This Script?

Well, What’s the worst thing that could happen?

To be honest, I can’t think of much that could go wrong other than it doesn’t solve your problem. In the worst case scenario:

  • your analysis was wrong, the plan is recompiled and still performs poorly.
  • Or perhaps somehow the script picks the wrong plan to recompile, that’s not bad news either, it’s just news. I certainly can’t say the same for the other techniques that were available before version 2008.
  • Maybe expectations were set incorrectly. You propose this change, It doesn’t fix anything and now it may look like your guessing (This isn’t a technical issue, but it’s something to keep in mind).

By the way, don’t take my word for it. You need to tell others that the risk is small and you have to back this claim up without my help. So you should understand what’s going on here.


Powered by WordPress