This month it’s Sankar Reddy’s turn to host T-SQL Tuesday. I wish him luck! I know I had a blast last month. This month we’re busting SQL Server misconceptions wherever we find them.
So today I want to explain something about the practice of updating statistics as a technique to improve or maintain performance. It works and it works well, but often not for the reasons you think and I’m going to explain a simpler equivalent technique.
So you’ve been given the job of finding out why a particular stored procedure is performing poorly. It’s a complicated query plan, and one of the first things you notice is that there are hundreds of thousands of rows coming out of a particular table. After looking closer, you figure out that the estimated number of rows is supposed to be only around thirty!
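One way to see the estimated-vs-actual mismatch for yourself is to capture the actual execution plan for the statement. A quick sketch (the procedure and parameter names here are made up for illustration):

```sql
-- Returns the actual execution plan as XML alongside the results.
-- The plan XML contains both EstimateRows and ActualRows for each operator,
-- so a 30-estimated vs. 300,000-actual gap jumps right out.
SET STATISTICS XML ON;

EXEC dbo.MyProcedure @SomeParameter = 'SomeValue';

SET STATISTICS XML OFF;
```

In SQL Server Management Studio, clicking the returned XML opens the graphical plan, where you can compare estimated and actual row counts operator by operator.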
Remember that estimated row counts are based on what the database knows about the data and that info is stored in objects called statistics. As the table grows, these statistics can get out of date. So you try:
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN
which solves the problem. (Good job, take a bow!)
Why It Works
Statistics are used to get an idea of how data is distributed. Based on the data distribution, SQL Server can make better choices on how it chooses query plans. If tables change and stats are out of date, the chosen query plans can be less than optimal. And bad plans have a bad habit of performing orders of magnitude worse than optimal plans.
Why It Really Works
I’ve found that out-of-date statistics can be responsible for bad query plans. But more frequently, I see that the parameter values that are used to compile the plan have just as big an effect.
SQL Server doesn’t just use statistics to estimate the number of rows it’s going to retrieve from a table. It also uses variable values and constants found in the query. And in the case of variables, it always uses the first variable values that it sees when the query gets compiled.
For example, say I’m looking up a person in the phone book by last name. The number of people I find depends on whether that last name is common, like “Martin”, or rare, like “Jingleheimer-Schmidt”. Based on the search argument, different plans can get picked, and that affects database performance.
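Here’s a sketch of what that looks like in practice (the table and procedure names are made up). The plan compiled on the first call gets reused for every later call, no matter what parameter value comes in:

```sql
-- A hypothetical lookup procedure to illustrate parameter sniffing.
CREATE PROCEDURE dbo.GetPeopleByLastName
    @LastName VARCHAR(100)
AS
BEGIN
    SELECT FirstName, LastName, PhoneNumber
    FROM dbo.PhoneBook
    WHERE LastName = @LastName;
END
GO

-- First call: the plan is compiled using 'Jingleheimer-Schmidt',
-- a rare name, so SQL Server may pick a seek-and-lookup style plan.
EXEC dbo.GetPeopleByLastName @LastName = 'Jingleheimer-Schmidt';

-- Second call: the SAME cached plan is reused for 'Martin',
-- even though that name matches orders of magnitude more rows.
EXEC dbo.GetPeopleByLastName @LastName = 'Martin';
```

If the calls had arrived in the opposite order, the cached plan (and the performance profile of every subsequent call) could be completely different.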
So even if statistics are already up to date, the act of calling UPDATE STATISTICS can still have an effect. When you update statistics on a table, SQL Server throws out all the query plans that depend on that table (but see comments). And that query gets recompiled with the next parameter values that come along. And these may be different than the original values.
It’s the equivalent of giving SQL Server a good thumping, or jolt.
If you’re getting ahead of me, you can see that updating stats won’t always work. But I’ve seen others use it as part of a trial-and-error approach to query tuning, so when it works some of the time, it becomes part of their mental toolbelt of tricks to try.
Alternative Technique 1
So you’ve used STATS_DATE and you know that statistics are up to date, and you want to give SQL Server the same “thump” to build a new query plan without updating stats. You can use:

DBCC FREEPROCCACHE
Or even more targeted:
DBCC FREEPROCCACHE (plan_handle)
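To find the plan_handle for the query you care about, you can search the plan cache by the query’s text. A sketch (the `LIKE` pattern is just an example, adjust it to match your query):

```sql
-- Find the plan_handle of a cached plan by searching the query text.
SELECT cp.plan_handle, cp.usecounts, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%MyTable%';

-- Then evict just that one plan, leaving the rest of the cache alone.
-- (Paste the plan_handle value returned by the query above.)
-- DBCC FREEPROCCACHE (0x06000500A1B2C3D4...);
```

This is much gentler than a bare DBCC FREEPROCCACHE, which flushes the entire plan cache and forces every query on the server to recompile.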
But I would only recommend this for acute performance problems; it by no means addresses the root cause.
Alternative Technique 2
If you decide that SQL Server needs help in picking its plan consistently you can use something more permanent. Use query hints like
- OPTIMIZE FOR UNKNOWN
- OPTIMIZE FOR @LastName = 'Martin'
- USE PLAN …
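As a sketch of how the first two hints look in place (reusing the hypothetical phone book lookup from earlier):

```sql
-- OPTIMIZE FOR UNKNOWN: build the plan from average density statistics,
-- ignoring whatever value @LastName happens to have at compile time.
SELECT FirstName, LastName, PhoneNumber
FROM dbo.PhoneBook
WHERE LastName = @LastName
OPTION (OPTIMIZE FOR UNKNOWN);

-- OPTIMIZE FOR a specific value: always build the plan as if the
-- parameter were 'Martin', a representative common value.
SELECT FirstName, LastName, PhoneNumber
FROM dbo.PhoneBook
WHERE LastName = @LastName
OPTION (OPTIMIZE FOR (@LastName = 'Martin'));
```

Note that the value form requires its own parentheses inside OPTION: `OPTIMIZE FOR (@LastName = 'Martin')`.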
But this means you’re telling SQL Server you know better than it does. So make sure you understand every nook and cranny of your tables’ data distribution and query plans.
The misconception is that statistics have a greater effect on query performance than they actually do. The query plans that SQL Server chooses depend on both statistics and compiled parameter values.
Try to get away from a trial-and-error method of query tuning. Learn how to look at the following before making a change:
- The query plan handle and query plan for a poorly performing query.
- Estimated row counts vs. actual row counts.
- The parameter values that were used to compile a query plan.
- When statistics were last updated for a particular index.
- The data distribution (histogram) of statistics.
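For the last two items on that list, here’s a sketch of the queries involved (the table and index names are placeholders):

```sql
-- When were statistics last updated for each index on a table?
SELECT i.name AS index_name,
       STATS_DATE(i.object_id, i.index_id) AS stats_last_updated
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('dbo.MyTable');

-- What does the data distribution (histogram) of a statistic look like?
-- The last result set shows the histogram steps.
DBCC SHOW_STATISTICS ('dbo.MyTable', 'IX_MyTable_SomeIndex');
```

Comparing the histogram steps against the actual parameter values your queries run with is often the quickest way to tell whether bad estimates come from stale statistics or from an unlucky compiled parameter value.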