This month it’s Sankar Reddy’s turn to host T-SQL Tuesday. I wish him luck! I know I had a blast last month. This month we’re busting SQL Server misconceptions wherever we find them.
So today I want to explain something about the practice of updating statistics as a technique to improve or maintain performance. It works and it works well, but often not for the reasons you think and I’m going to explain a simpler equivalent technique.
So you’ve been given the job of finding out why a particular stored procedure is performing poorly. It’s a complicated query plan, and one of the first things you notice is that there are hundreds of thousands of rows coming out of a particular table. After looking closer, you figure out that the estimated number of rows is supposed to be only around thirty!
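One way to see the estimated-vs-actual mismatch for yourself is to capture the actual execution plan for the statement. A quick sketch (the procedure and parameter names here are made up for illustration):

```sql
-- Returns the actual execution plan as XML alongside the results.
-- The plan XML contains both EstimateRows and ActualRows for each operator,
-- so a 30-estimated vs. 300,000-actual gap jumps right out.
SET STATISTICS XML ON;

EXEC dbo.MyProcedure @SomeParameter = 'SomeValue';

SET STATISTICS XML OFF;
```

In SQL Server Management Studio, clicking the returned XML opens the graphical plan, where you can compare estimated and actual row counts operator by operator.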
Remember that estimated row counts are based on what the database knows about the data and that info is stored in objects called statistics. As the table grows, these statistics can get out of date. So you try:
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN
which solves the problem. (Good job, take a bow!)
Why It Works
Statistics are used to get an idea of how data is distributed. Based on the data distribution, SQL Server can make better choices on how it chooses query plans. If tables change and stats are out of date, the chosen query plans can be less than optimal. And bad plans have a bad habit of performing orders of magnitude worse than optimal plans.
Why It Really Works
I’ve found that out-of-date statistics can be responsible for bad query plans. But more frequently, I see that the parameter values that are used to compile the plan have just as big an effect.
SQL Server doesn’t just use statistics to estimate the number of rows it’s going to retrieve from a table. It also uses variable values and constants found in the query. And in the case of variables, it always uses the first variable values that it sees when the query gets compiled.
For example, say I’m looking up a person in the phone book by last name. The number of people I find depends on whether that last name is common, like “Martin”, or rare, like “Jingleheimer-Schmidt”. Based on the search argument, different plans can get picked, and that affects database performance.
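Here’s a sketch of what that looks like in practice (the table and procedure names are made up). The plan compiled on the first call gets reused for every later call, no matter what parameter value comes in:

```sql
-- A hypothetical lookup procedure to illustrate parameter sniffing.
CREATE PROCEDURE dbo.GetPeopleByLastName
    @LastName VARCHAR(100)
AS
BEGIN
    SELECT FirstName, LastName, PhoneNumber
    FROM dbo.PhoneBook
    WHERE LastName = @LastName;
END
GO

-- First call: the plan is compiled using 'Jingleheimer-Schmidt',
-- a rare name, so SQL Server may pick a seek-and-lookup style plan.
EXEC dbo.GetPeopleByLastName @LastName = 'Jingleheimer-Schmidt';

-- Second call: the SAME cached plan is reused for 'Martin',
-- even though that name matches orders of magnitude more rows.
EXEC dbo.GetPeopleByLastName @LastName = 'Martin';
```

If the calls had arrived in the opposite order, the cached plan (and the performance profile of every subsequent call) could be completely different.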
So even if statistics are already up to date, the act of calling UPDATE STATISTICS can still have an effect. When you update statistics on a table, SQL Server throws out all the query plans that depend on that table (but see comments). And that query gets recompiled with the next parameter values that come along. And these may be different than the original values.
It’s the equivalent of giving SQL Server a good thumping, or jolt.
If you’re getting ahead of me, you can see that updating stats won’t always work. But I’ve seen others use it as part of a trial-and-error approach to query tuning, so when it works some of the time, it becomes part of their mental toolbelt of tricks to try.
Alternative Technique 1
So you’ve used STATS_DATE and you know that statistics are up to date, and you want to give SQL Server the same “thump” to build a new query plan without updating stats. You can use:

DBCC FREEPROCCACHE
Or even more targeted:
DBCC FREEPROCCACHE (plan_handle)
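To find the plan_handle for the query you care about, you can search the plan cache by the query’s text. A sketch (the `LIKE` pattern is just an example, adjust it to match your query):

```sql
-- Find the plan_handle of a cached plan by searching the query text.
SELECT cp.plan_handle, cp.usecounts, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%MyTable%';

-- Then evict just that one plan, leaving the rest of the cache alone.
-- (Paste the plan_handle value returned by the query above.)
-- DBCC FREEPROCCACHE (0x06000500A1B2C3D4...);
```

This is much gentler than a bare DBCC FREEPROCCACHE, which flushes the entire plan cache and forces every query on the server to recompile.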
But I would only recommend this for acute performance problems; it by no means addresses the root cause.
Alternative Technique 2
If you decide that SQL Server needs help in picking its plan consistently you can use something more permanent. Use query hints like
- OPTIMIZE FOR UNKNOWN
- OPTIMIZE FOR @LastName = 'Martin'
- USE PLAN …
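As a sketch of how the first two hints look in place (reusing the hypothetical phone book lookup from earlier):

```sql
-- OPTIMIZE FOR UNKNOWN: build the plan from average density statistics,
-- ignoring whatever value @LastName happens to have at compile time.
SELECT FirstName, LastName, PhoneNumber
FROM dbo.PhoneBook
WHERE LastName = @LastName
OPTION (OPTIMIZE FOR UNKNOWN);

-- OPTIMIZE FOR a specific value: always build the plan as if the
-- parameter were 'Martin', a representative common value.
SELECT FirstName, LastName, PhoneNumber
FROM dbo.PhoneBook
WHERE LastName = @LastName
OPTION (OPTIMIZE FOR (@LastName = 'Martin'));
```

Note that the value form requires its own parentheses inside OPTION: `OPTIMIZE FOR (@LastName = 'Martin')`.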
But this means you’re telling SQL Server you know better than it does. So make sure you understand every nook and cranny of your tables’ data distribution and query plans.
The misconception is that statistics have a greater effect on query performance than they actually do. The query plans that SQL Server chooses depend on both statistics and compiled parameter values.
Try to get away from a trial-and-error method of query tuning. Learn how to look at the following before making a change:
- The query plan handle and query plan for a poorly performing query.
- Estimated row counts vs. actual row counts.
- The parameter values that were used to compile a query plan.
- When statistics were last updated for a particular index.
- The data distribution (histogram) of statistics.
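For the last two items on that list, here’s a sketch of the queries involved (the table and index names are placeholders):

```sql
-- When were statistics last updated for each index on a table?
SELECT i.name AS index_name,
       STATS_DATE(i.object_id, i.index_id) AS stats_last_updated
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('dbo.MyTable');

-- What does the data distribution (histogram) of a statistic look like?
-- The last result set shows the histogram steps.
DBCC SHOW_STATISTICS ('dbo.MyTable', 'IX_MyTable_SomeIndex');
```

Comparing the histogram steps against the actual parameter values your queries run with is often the quickest way to tell whether bad estimates come from stale statistics or from an unlucky compiled parameter value.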