Andy Leonard (fine fellow) recently wrote:
If I solve the same problem twice, I blog about it… for me!
So, taking a page from his book, I want to talk about a problem I recently had. I’ll explain the symptoms and then the root cause. (BTW, the format I’m going for is very similar to the SQL CSS team’s blog. They have good case studies there too.)
- Timeouts on one particular type of ad-hoc query were reported.
- Our custom error logging framework told us that these timeouts were coming in stretches and always on the same type of query.
- These stretches of timeout errors were not based on any particular hour of the day/day of the week.
- But these stretches were elusive; most of the time the queries were sub-second.
- These ad-hoc queries were not heavy hitters. They never returned more than 50 rows.
- Ruled out blocking and bad query plans. The query read data from one table only… no self joins, no scans.
- However, this table was the largest in the db (both by rows and by space).
- The query itself filtered on a number of columns, including a particular ntext column. (Now we’re getting somewhere.)
- We finally caught the queries in a “bad stretch”:
- The queries were waiting on Page IO.
- The estimated query plan wouldn’t return.
- And finally, the server trace showed millions of reads when updating statistics on the ntext column.
So there you have it. That last point gives away what was going on. SQL Server was (auto-)updating statistics but couldn’t complete before the timeout. That meant the stats weren’t updated, setting up the same situation for the next sucker query to come along.
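For anyone chasing a similar problem, this is the kind of check we leaned on: asking SQL Server when the stats on the suspect column were last updated. (The table and column names below are made up for illustration; substitute your own.)

```sql
-- Find the statistics that cover a particular column and see when
-- they were last refreshed. Stale or never-finishing updates show up
-- as an old (or missing) last_updated date.
select s.name as stats_name,
       STATS_DATE(s.object_id, s.stats_id) as last_updated
from sys.stats as s
join sys.stats_columns as sc
    on s.object_id = sc.object_id and s.stats_id = sc.stats_id
join sys.columns as c
    on sc.object_id = c.object_id and sc.column_id = c.column_id
where s.object_id = OBJECT_ID(N'dbo.BigTable')   -- hypothetical table
  and c.name = N'NtextColumn';                   -- hypothetical column
```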
Our Setup, the Perfect Storm
Our particular database was:
- an OLTP database that slowly grew into its huge size (i.e. no large loads).
- using an older schema that hadn’t been refactored in a while (hence the ntext columns).
- doing queries that filtered on these ntext columns (causing the optimizer to create/maintain stats on them). It doesn’t matter that the majority of these ntext values were less than 30 characters.
- a db that had auto-create and auto-update stats on.
- a db that had auto-update-stats async off.
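If you want to see where these settings stand on your own server, sys.databases exposes all three as flags:

```sql
-- One row per database; 1 means the option is on.
select name,
       is_auto_create_stats_on,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on
from sys.databases;
```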
Turn the database setting AUTO_UPDATE_STATISTICS_ASYNC to ON. Even though we had nightly maintenance jobs that updated these stats, SQL Server sometimes decided that it needed newer ones.
In our case this is the answer. You have to be careful with this setting, especially in Data Warehouse scenarios as Adam Machanic found out (via Chad Boyd).
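For the record, the change itself is a one-liner (substitute your own database name):

```sql
-- With this on, a query that triggers a stats update compiles with
-- the existing (stale) stats while the update runs in the background,
-- instead of blocking until the update finishes.
alter database YourDatabase
set AUTO_UPDATE_STATISTICS_ASYNC on;
```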
One More Thing To Learn
SQL Server has a really hard time updating stats on text or ntext columns. I mean a really, really hard time. I wrote the following script to illustrate the difference between ntext and nvarchar(max) in this respect.
```sql
--first we create a large table:
create table big (
    id uniqueidentifier default (newid()) not null primary key,
    nvc_max_value nvarchar(max) null,
    ntext_value ntext null
);
set nocount on
go

-- this next part takes a while (what with the page-splitting)
insert big default values
go 1000000

update big
set nvc_max_value = CAST(id as nvarchar(max)),
    ntext_value = CAST(CAST(id as nvarchar(max)) as ntext);

alter index all on big rebuild;

-- start profiler here (with appropriate filters):
dbcc dropcleanbuffers
select COUNT(1) from big where nvc_max_value LIKE N'%abcd%'

dbcc dropcleanbuffers
select COUNT(1) from big where ntext_value LIKE N'%abcd%'

/*
                     CPU    Duration  Reads    Writes
-------------------  -----  --------  -------  ------
nvarchar(max) stats  656    2082      2799     29
ntext stats          2906   12265     3319065  32
*/
```
Updating the ntext stats required over 1,000 times the reads of the corresponding nvarchar(max) stats! And even though many of those reads were not physical reads, enough of them were that the ntext case had duration and CPU values around five times larger than the nvarchar(max) case.
This is likely because ntext values are never stored in-row; they always live in LOB pages. Even so, I didn’t expect that big a difference.
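Which suggests the longer-term fix: refactor the ntext column to nvarchar(max). A sketch, again with a hypothetical table and column. One caveat worth knowing: the ALTER COLUMN alone doesn’t move existing values out of the LOB pages, which is what the no-op UPDATE afterwards is for — it rewrites each value so that short ones can be stored in-row.

```sql
-- Change the deprecated ntext column to nvarchar(max):
alter table dbo.BigTable alter column NtextColumn nvarchar(max) null;

-- Rewrite existing values so short ones move in-row
-- (expect this to be a large, logged operation on a big table):
update dbo.BigTable set NtextColumn = NtextColumn;
```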
Is that it?
Not entirely, but the other details aren’t relevant to you, dear blog reader. Until next time…