Michael J. Swart

December 11, 2009

Tracking Down Elusive Query Timeouts

Filed under: SQLServerPedia Syndication, Technical Articles — Michael J. Swart @ 3:00 am

Andy Leonard (fine fellow) recently wrote:

If I solve the same problem twice, I blog about it… for me!

So, taking a page from his book, I want to talk about a problem I recently had. I’ll explain the symptoms and the root cause. (BTW, the format I’m going for is very similar to the SQL CSS team’s blog. They have good case studies there too.)

The Symptoms

  • Timeouts on one particular type of ad-hoc query were reported.
  • Our custom error logging framework told us that these timeouts were coming in stretches and always on the same type of query.
  • These stretches of timeout errors were not based on any particular hour of the day/day of the week.
  • But these stretches were elusive; most of the time the queries were sub-second.
  • These ad-hoc queries were not heavy hitters. They never returned more than 50 rows.
  • We ruled out blocking and bad query plans. The query read data from one table only… no self joins, no scans.
  • However, this table was the largest in the db (both by rows and by space).
  • The query itself filtered on a number of columns, including a particular ntext column. (Now we’re getting somewhere.)
  • We finally caught the queries in a “bad stretch”:
    • The queries were waiting on Page IO.
    • The estimated query plan wouldn’t return.
    • And finally, the server trace showed millions of reads when updating statistics on the ntext column.

So there you have it. That last point gives away what’s going on. SQL Server was (auto-)updating statistics but couldn’t complete before the timeout. This meant stats weren’t updated, setting up the same situation for the next sucker query to come along.
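If you ever need to catch a stretch like this in the act, the DMVs are a quick way to see what running requests are waiting on. Here’s a rough sketch (not the exact script we used); the PAGEIOLATCH waits are the Page IO waits I mentioned above:

-- what are the running requests waiting on right now?
select r.session_id,
	r.wait_type,   -- PAGEIOLATCH_* means the request is waiting on Page IO
	r.wait_time,
	t.text
from sys.dm_exec_requests r
cross apply sys.dm_exec_sql_text(r.sql_handle) t
where r.wait_type like N'PAGEIOLATCH%';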

Our Setup, the Perfect Storm

Our particular database was:

  • an OLTP database that slowly grew into its huge size (i.e. no large loads).
  • using an older schema that hadn’t been refactored in a while (hence the ntext columns).
  • doing queries which filtered on these ntext columns, causing the optimizer to create and maintain stats on them (a sketch for spotting such auto-created stats follows this list). It didn’t matter that the majority of these ntext values were less than 30 characters.
  • a db that had auto-create and auto-update stats on.
  • a db that had auto-update-stats async off.
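By the way, if you’re curious whether the optimizer has quietly created stats like these in your own database, here’s a hedged sketch (the table name is a placeholder) that lists auto-created stats per column and when they were last updated:

-- list auto-created stats on a table and their last update time
-- ('dbo.YourBigTable' is a placeholder)
select s.name as stats_name,
	c.name as column_name,
	s.auto_created,
	STATS_DATE(s.object_id, s.stats_id) as last_updated
from sys.stats s
join sys.stats_columns sc
	on s.object_id = sc.object_id and s.stats_id = sc.stats_id
join sys.columns c
	on sc.object_id = c.object_id and sc.column_id = c.column_id
where s.object_id = OBJECT_ID('dbo.YourBigTable');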

The Solution

Turn the database setting AUTO_UPDATE_STATISTICS_ASYNC to ON. Even though we had nightly maintenance jobs that updated these stats, SQL Server sometimes decided that it needed newer ones.
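For the record, flipping the setting is a one-liner (the database name below is a placeholder). Keep in mind it only has an effect when AUTO_UPDATE_STATISTICS is also ON:

-- enable asynchronous auto-update of statistics
alter database [YourDatabase] set auto_update_statistics_async on;

-- verify the setting took
select name, is_auto_update_stats_async_on
from sys.databases
where name = N'YourDatabase';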

In our case this was the answer. You have to be careful with this setting, especially in Data Warehouse scenarios, as Adam Machanic found out (via Chad Boyd).

One More Thing To Learn

SQL Server has a really hard time updating stats on text or ntext columns. I mean a really, really hard time. I wrote the following script to illustrate the difference between ntext and nvarchar(max) in this respect.

-- first we create a large table:
create table big
(
	id uniqueidentifier default (newid()) not null primary key,
	nvc_max_value nvarchar(max) null,
	ntext_value ntext null
);
 
set nocount on
go
 
-- this next part takes a while (what with the page-splitting)
insert big default values
go 1000000
 
update big
set nvc_max_value = CAST(id as nvarchar(max)),
	ntext_value = CAST(CAST(id as nvarchar(max)) as ntext);
 
alter index all on big rebuild;
 
-- start profiler here (with appropriate filters):
dbcc dropcleanbuffers
select COUNT(1) from big where nvc_max_value LIKE N'%abcd%'
dbcc dropcleanbuffers
select COUNT(1) from big where ntext_value LIKE N'%abcd%'
/*
                      CPU  Duration    Reads  Writes
-------------------  ----  --------  -------  ------
nvarchar(max) stats   656      2082     2799      29
ntext stats          2906     12265  3319065      32
*/

Updating the ntext stats requires over 1,000 times the reads of the corresponding nvarchar(max)!!! And even though many of these reads were not physical reads, enough of them were that the ntext case had duration and CPU values around five times larger than the nvarchar(max) case.

This is likely because ntext values are never stored in-row; they always live in LOB pages. Even so, I didn’t expect that big a difference.
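Of course, the long-term fix is the refactoring I mentioned above: retire the ntext column. A minimal sketch using the demo table, assuming you can afford the logging and locking this generates on a big table:

-- convert the deprecated ntext column to nvarchar(max)
alter table big alter column ntext_value nvarchar(max) null;

-- existing values stay on LOB pages until they're rewritten; this no-op
-- update pulls values that fit (8000 bytes or less) back in-row:
update big set ntext_value = ntext_value;

alter index all on big rebuild;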

Is that it?

Not entirely, but the other details aren’t relevant to you, dear blog reader. Until next time…

3 Comments

  1. […] Tracking Down Elusive Query Timeouts – Have you ever needed to use the AUTO_UPDATE_STATISTICS_ASYNC database setting? Do you know why you would perhaps want to? Here’s an excellent real world walk-through from The Database Whisperer showing how he tackled a problem using AUTO_UPDATE_STATISTICS_ASYNC. […]

    Pingback by Something for the weekend – SQL Server Links 11/12/09 | John Sansom - SQL Server DBA in the UK — December 12, 2009 @ 5:53 am

  2. I really enjoyed your post as it is very pertinent to an issue with “Update Statistics” that I am having.

    In your post, you said “And finally, the server trace showed millions of reads when updating statistics on the ntext column”…

    Do you refer to the “Default Trace Files”, a SQL Server Profiler trace, or some other trace? And how exactly could you tell reads attributed to normal query servicing apart from reads attributed to “Update Stats”?

    Comment by Daniel Adeniji — March 30, 2011 @ 11:38 am

  3. I’m really, really glad you found this useful, Daniel.

    SQL Server Profiler trace is what I meant here. And in this case, you can’t easily tell stats-servicing reads apart from normal query reads.

    But if you look for SP:StmtStarting and SP:StmtCompleted events with TextData like ‘SELECT StatMan%’ you can see something interesting: the query that SQL Server performs for gathering stats.

    But I would caution you about running traces on production systems. You really have to step lightly. I’ve been burned too many times by that.

    Comment by Michael J. Swart — March 30, 2011 @ 11:56 am
