Michael J. Swart

September 11, 2013

Three More Tricky Tempdb Lessons

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: — Michael J. Swart @ 12:00 pm

Takeaway: Tempdb contention is a common DBA problem. I want to share three lesser known surprises about everyone’s favorite bottleneck.

If my professional life was made into a series, tempdb contention would have a recurring role as the antagonist. It provides challenges on a regular basis and I finally believe I understand it’s behavior well enough to have contention licked for good. But even recently, tempdb still holds some surprises for me.

You Can Still Get SGAM Contention After TF1118

You may know what trace flag TF1118 does. It affects the allocation of pages in all databases. But it’s most often used for its effect in tempdb. The idea is that TF1118 will cause SQL Server to store all table data in dedicated extents rather than mixed extents. Without mixed extents, the need to access SGAM pages is reduced and that leads to less contention as described in this KB article.

But the use of SGAM isn’t reduced to zero. What?! How can that be? Who’s still using mixed extents? It turns out that IAM pages are always stored on mixed extents. Every time any object is created (or destroyed) that will require at least one latch on an SGAM page. On an extremely busy system, you can still see contention here.

Equally Sized Data Files Can Become Unbalanced After Autogrow

That’s not the surprising thing. The surprising thing is that it can impact the effectiveness of having multiple data files and it needs to be kept in mind. Once again from Microsoft:

If data files [have] unequal sizes, the proportional fill algorithm tries to use the largest file more for GAM allocations instead of spreading the allocations between all the files.

This makes multiple data files less effective than it could be when tackling contention. But don’t disable auto-grow, just size your tempdb data files large enough (Tip: google “instant file initialization”)

Before we move on, if your tempdb data files have auto-grown, do you know what the initial sizes were?

Tempdb Contention Can Be Sudden and Catastrophic

I hadn’t seen this before. Most contention I’m familiar with looks like a “sluggish” database. The response times on particular queries just aren’t what they ought to be – getting gradually worse until the impact is felt hard. But just recently I have seen tempdb contention show up all-of-a-sudden making the database nearly useless. Especially after excessive blocking kicks in. Here’s how that can happen:

  • Imagine a super-frequent procedure creates a temp table.
  • Several of these temp tables are cached. (There can be several execution contexts for the procedure and so several temp tables).
  • A table used in the procedure has enough modified rows to have it’s statistics auto-updated.
  • The procedure’s query plan is invalidated and is dropped from the cache.
  • The cached temp tables that were used for the procedure are marked to be destroyed
  • The procedure is recompiled and new temp tables are created
  • A system process destroys the marked temp tables that need to be cleaned up

With a very busy system, these things can happen in only a few seconds. In fact those two last items describe actions where hundreds of threads can get into a fight over allocation pages.

The following query may give a list of query plans for frequently called procedures as described:

with frequentSprocs as 
(
    select top 10 count(1) as [memory objects], cp.plan_handle from sys.dm_exec_cached_plans cp
    cross apply sys.dm_exec_cached_plan_dependent_objects(cp.plan_handle) do
    join sys.dm_os_memory_objects mo
        on do.memory_object_address = mo.memory_object_address
    where objtype = 'Proc'
    group by cp.plan_handle
    order by 1 desc
)
select fs.*, qp.query_plan
from frequentSprocs fs
cross apply sys.dm_exec_query_plan(fs.plan_handle) qp
option (recompile)

What Next?

I’m lucky to work with software that runs on systems that are crazy powerful and crazy busy. It’s unlikely that you’ll run into tempdb problems in the exact same way that we did. So the typical advice for PFS/GAM/SGAM contention is still valid:

  • Enable trace flag TF 1118
  • Create multiple (equally sized) data files for tempdb
  • Reduce how frequently you use temp tables and table variables

 

February 28, 2013

Follow up on Ad Hoc TVP contention

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: — Michael J. Swart @ 12:00 pm

Last month I posted PAGELATCH Contention on 2:1:103. It described a troubleshooting experience I had. I was troubleshooting a performance problem that was felt when applications issued very frequent ad hoc queries that used Table Valued Parameters (TVPs).

I’ve recently learned some extra things from the SQL Server team at Microsoft that I wanted to pass on. So this post acts as a kind of update to that one.

I know. I've drawn Columbo before, but it's my blog.

Reproduction

One thing I did was to come up with an application that reproduces the contention on 2:1:103 in the same way as production. I share that here now.

What this reproduction does is launch fifty processes. Each process uses multiple threads to hit a db server with an ad hoc query that uses a TVP.  For the curious here it is: Download AdHocTVPContention.zip and then open readme.txt. This reproduction is a mashup of

  • A simple batch script which launches an executable 50 times at once. A technique I gleaned from Paul Randal in a demo of his.
  • A multi-threaded executable which sends queries asynchronously. A framework I first developed in a post I wrote on concurrency.
  • The ad hoc TVP query itself. It was a bit of a modification of my example at Table Valued Parameters, A Short Complete Example. The only difference is that the command is a query instead of a stored procedure.

What I’ve Learned Since Then

The Microsoft team helped me understand a few things. The main thing is that the particular version of SQL Server (including service pack level) matters a lot:

SQL Server 2005 and Earlier

TVPs weren’t supported then. (This post isn’t for you).

SQL Server 2008 and R2

In my earlier post, I dismissed a fix provided by Microsoft. Fix: Poor performance in SQL Server 2008 R2 when table-valued functions use many table variables. At the time, I dismissed it for a number of reasons:

  • The workaround listed on that kb article page did not help us.
  • The symptoms that were described at that issue did not match my own scenario (i.e. I wasn’t using table-valued functions).
  • The title mentioned R2 and I thought (mistakenly) that this fix was not applicable to SQL Server 2008. (Update April 5, 2013: The fix used to be available as a hotfix for 2008 SP3, but as of March 2013, this fix is now included in a cumulative update. Upgrade to SP3 CU10 if you’re using 2008 and want this fix. Microsoft has also updated the title of the fix to mention 2008)

Microsoft helped me learn that although the symptoms in that fix differ from mine, the improvement in that fix involves a change to code that also improves my scenario. So with the fix applied, on my test environment, I saw that SQL Server could handle at least 6 times the volume of ad hoc queries than it could before (your mileage may vary).

SQL Server 2012

But there’s one other thing I noticed. When I ran the reproduction on a SQL Server 2012 environment, the process just flew through the workload! Temp table creation was no longer a bottleneck; there was no more contention on temp table creation. The Microsoft Engineer I worked with noticed this too and eventually it prompted a CSS blog entry. You can read all about this 2012 performance improvement at Temp table caching improvement for table valued parameters in SQL Server 2012

Partner vs MVP

I’m very lucky to have many investigation options available to me when I’m digging into a particular quirk of SQL Server behaviour. There’s the public options like #sqlhelp or dba.stackexchange or Microsoft support. And for me there’s some private options like the MVP mailing list. But this month, I feel lucky to work as a Microsoft partner. It’s through that connection that I learned so much.

Too Long; Didn’t Read

If you’re experiencing 2:1:103 contention because of too frequent ad hoc TVP queries then:

January 22, 2013

PAGELATCH_EX Contention on 2:1:103

Filed under: SQLServerPedia Syndication — Tags: — Michael J. Swart @ 12:00 pm
Update February 28, 2013: Please don’t forget to look at my follow up post here Follow up on Ad Hoc TVP Contention for some important notes on some promising fixes/changes found in later SQL Server versions.

This blog post is meant to help people troubleshoot page latch contention on 2:1:103. If that’s what brought you to this page, then hopefully you find it useful. But first…

A textbook entitled "Don't Panic"

Initial Investigation

Last week I was asked to help tackle a production outage. Queries were slow enough that the system was considered unavailable. And just like any other performance problem I started by having SQL Server tell me what was wrong with itself.

  • I first started with sp_whoisactive but it took about 30 seconds for it to return. Several queries were blocked by others and some of the lead blockers were waiting for a PAGELATCH_EX on 2:1:103
  • Then I checked on the most common waits using a query found at Paul Randal’s article: Wait statistics, or please tell me where it hurts. (I don’t have that page bookmarked, every time I’m in trouble, I google “Tell me where it hurts Paul Randal”). I learned that PAGELATCH_EX contention was our most serious wait type. Paul mentions that he sees this kind of contention most commonly on tempdb.
  • I followed a link from Paul’s article to Robert Davis’s Breaking Down TempDB Contention. A script there helped me to discover that while there was lots of tempdb contention, it was all on page 2:1:103, which is not PFS, GAM or SGAM.

Finding More Info

So I looked for more help.

  • Web Search: My work colleague found and then pointed me to Latch waits on 2:1:103? You are probably creating too many temp tables in SQL Server by Microsoft’s Matt Wrock who faced the same problem. If you found this page because you’re facing the same issue then stop reading this article and go read Matt’s first. He explains what’s going on better than I could. Think of this article as a kind of sequel to his article.
  • StackOverflow: (serverfault in this case) has TempDB contention on sysmultiobjrefs SQL 2005. The best answer there is Matt pointing back to his blog article.
  • Microsoft Support who helped us out, but maybe not as quick as we’d like. To be fair, the turnaround time that we would have been happy with was measured in nanoseconds.

With that information it became clear to us that we were creating temp tables too often. And we were creating them in a way that made it impossible for SQL Server to cache. Did you know that? That SQL Server can cache temp tables? When a query is done with a temp table, SQL Server can truncate that table and give it to the next query to avoid having to create it again. Cached temp tables reduce tempdb contention including contention on this page. But as it turns out, SQL Server cannot cache temp tables from ad hoc queries.

But who creates temp tables that often? We did, just by using a table valued parameter in a parameterized query. Since SQL Server began supporting table valued parameters (introduced in SQL Server 2008), we have been gradually moving towards this practice in lieu of sending xml to be shredded.

… And More Info Including Some Other Links

I can’t help including some extra articles I found on 2:1:103 contention.

Robert Davis blogged about 2:1:103 contention in Tempdb Contention That Can’t Be Soothed. His advice is to remove statements in code like “SELECT … INTO”.  However, I believe that such statements contribute to PAGELATCH contention only if the statement is not part of a stored procedure (i.e. can’t be cached). I also think there are more common causes of 2:1:103 PAGELATCH contention than SELECT … INTO statements.

But Robert did link to a demo by Paul Randal (an absolutely amazing 5 minute demo linked from a 2011 issue of his newsletter). Paul tells us

  • That “the SQL Team knows about this. It’s a known issue. Hopefully something will be done about it in one of the future releases.”
  • But unlike SGAM or GAM contention, there’s absolutely nothing you can do about this to spread the contention around.

From a DBA point of view there’s not much that can be done. So as DB Developers, we have to find a workaround. But before I get to that, I want to mention some things that were not so useful.

… And Some Bad Info

There was some red herrings out there…

  • Add more tempdb files (nope, wrong kind of tempdb contention).
  • Many resources suggested enabling trace flag T1118. It’s a trace flag that eases tempdb contention, but not this kind.
  • Get faster tempdb disks? No, PAGELATCH contention is for in-memory copies of pages, not disk (that’s PAGEIOLATCH).
  • Tune the queries in question? Not a bad idea, but this problem is about the number of problem queries, not the performance of each.
  • A fix from Microsoft Support: Fix: Poor performance in SQL Server 2008 R2 when table-valued functions use many table variables. Oooh… so close, but we weren’t using table-valued functions. And the workarounds they list (disabling AUTO_UPDATE_STATISTICS) did not help. But maybe it might help you?
  • The Object:Created event. Whether traced with Profiler or collected with an Extended Events session, this event can report on created temp tables. Maybe I can use this event to tell me which queries are creating the tempdb tables. Nope, not this time! This event  has two drawbacks which make it useless for troubleshooting 2:1:103 contention:
    • The Object:Created event reports the creation of temp tables even when they’re cached (which don’t need a latch on 2:1:103).
    • The Object:Created event doesn’t report the declaration of any table variables (which may need a latch on 2:1:103).

Strategy: So What Do We Do?

Knowing is half the battle right? But that means we still have a lot of work to do. I’m going to recap what we know so far. We know that we have trouble when there are queries that:

  • are executed frequently
  • create temp tables (either explicitly or by declaring table valued parameters).
  • are not cached (Microsoft explains when temp tables are not cached. In my case, it was because they were ad-hoc queries)
  • require a page latch on 2:1:103.

There is a performance counter that can track the all of the above (except maybe for that last bullet). It’s called Temp Tables Creation Rate and it’s found in the perf counter category “General Statistics”. Now this is a metric you can trust. We found that a high temp table creation rate was tightly correlated to the trouble we were seeing. So when troubleshooting, look at this performance counter (and leave the “Object:Create” event alone).

So now what do we do? First, we must find the ad hoc queries that create these temp tables. Then, we have to put them in stored procedures so that the temp tables can be cached. Alternatively, we could reduce the need for creating them. It’s a workaround, but it’s what we’ve got.

Finding Such Queries Is Difficult

But here’s the hard part. In a high volume system, it’s difficult to identify exactly which queries are causing the most trouble. Microsoft support can go through tons of collected trace and performance data to try to find such queries, but it’s a long process. On our side, we looked at sys.dm_os_waiting_tasks:

Select *
From sys.dm_os_waiting_tasks

And we saw all the contention on 2:1:103, but when we tried to look up the SQL text for it

Select wt.*, st.text
From sys.dm_os_waiting_tasks wt
left join sys.dm_exec_requests er
    on er.session_id = wt.session_id
outer apply fn_get_sql(er.sql_handle) st
where wt.resource_description is not null

The text was often unavailable. Basically, I’m guessing maybe the dmv’s I was using weren’t quick enough to tell me which queries were suffering from (or causing) contention on 2:1:103. So I decided to look through the cache for query candidates that might create temp tables. here’s what I came up with. It’s not a comprehensive list and there might be false positives, but it might be enough to go on. If you know your applications well, you can tailor the filters below to something more relevant for you.

select cp.plan_handle, sql_handle, text, refcounts, usecounts
from sys.dm_exec_cached_plans cp
join sys.dm_exec_query_stats qs
    on cp.plan_handle = qs.plan_handle
cross apply sys.dm_exec_sql_text(qs.sql_handle) st
where (
       (
        st.text like N'%READONLY%'  -- possible table valued parameter (prepared queries)
        or st.text like N'%TABLE %' -- possible table creation
        or st.text like '%#%'       -- possible use of temp table
        or st.text like '%INTO%'    -- possible table creation
       )
       and objtype in ('prepared', 'ad hoc') -- temp tables not cachable with these
)
order by usecounts desc

Going Forward

Personally after helping identify and implement the workarounds. I’m doing a couple things:

  • I’m recommending that developers not create temp tables or declare table variables that cannot be cached. For now, this means we use stored procedures for any query that uses table variables or temp tables.
  • We now have thresholds on our performance tests which look at the performance counter Temp Tables Creation Rate.
  • I created a Microsoft Connect item. If you’re troubleshooting the same problem, head over there and let Microsoft know you’re having trouble too.

Powered by WordPress