Michael J. Swart

May 28, 2013

Swart’s Ten Percent Rule

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 12:00 pm

A picture of coffee captioned: "Success: ninety percent hard work, ten percent coffee"“Just because you can, doesn’t mean you should.” SQL Server has a number of limits and restrictions (many of the limits are described here). Most of the time these restrictions don’t feel restrictive at all. I can’t think of a single scenario where I’d want more than 128 characters to name a column. And I don’t think I’ll ever need to select more than 4096 columns at a time. I get the feeling that if you need over 10% of what Microsoft restricts you to, then you’re probably doing it wrong.

So I turn that thought into Swart’s Ten Percent Rule:

If you’re using over 10% of what SQL Server restricts you to, you’re doing it wrong.

Always? No not always. It depends of course on what restriction we’re talking about. Sometimes the rule applies more than other times. I’d like to evaluate a number of restrictions here and judge whether the rule applies or not. I’ll be assigning each 10% rule a “Risk that you’re doing it wrong”. The levels are Severe, High, Elevated, Guarded or Low.


SQL Statement Length

Max Size: 256 megabytes (Using the default network packet size.)
10% of restriction: 25 megabytes
Risk of doing it wrong: Severe

Let’s take a step back. Why do you want to execute a statement that’s 25 megabytes long? The only scenario I can think of is that you’re sending data to the database to be parsed and stored. There’s much better ways to send data to the database then this (See Erland Sommarskog’s definitive Arrays and Lists in SQL Server).

Or maybe you’re creating a one-off huge IN clause to pass to SQL. Better to join to a temp table or a table valued parameter.


Identifier Length

Max Size: 128 characters
10% of restriction: 12 characters.
Risk of doing it wrong: Low

Twelve characters is not a lot to work with when coming up with names for tables or columns. If you had a column or table name with more than 12 characters, I wouldn’t be at all concerned. For comparison, Adventurework’s average column is 11 characters long. Their largest column is “HomeAddressSpecialInstructions” at 30 characters. My friend, Karen Lopez, told me she once came across this beast of a column name: “RetailTransactionLineItemModifierEventReasonCode” at 48 characters. That’s getting extreme (despite intellisense), but 12 is fine.


Number of Columns in a Table

Max Size: 1024 Columns
10% of restriction: 102 columns
Risk of doing it wrong: High

Maybe you need a lot of columns because you’re treating a table like a spreadsheet. The worst offender I’ve ever seen had a whopping 96 columns. It was designed by someone who didn’t know any better. By the way, the most notable modeling smell coming from this table was the columns named [Int01] to [Int30]). In this case, the database designer was definitely doing it wrong.

Maybe you need a lot of columns to support growing functionality. As an example, say you have a table like

CREATE TABLE UserProfile
(
    UserId int not null primary key,
    PhotoFileId int null,
    PreferenceTags varchar(1000) not null,
    CanEditWidget bit not null,
    CanViewWidget bit not null,
    CanAddWidget bit not null,
    IsMonitored bit not null,
    ...,
    IsApprovedForFastTrack bit not null
)

Then the thing to ask yourself is why is this list of columns changing so much? In this example, we see a growing list of boolean attributes (which will probably continue to grow as the application evolves). In this case it makes sense to redesign your schema using:

CREATE TABLE UserProfile
(
    UserId int not null primary key,
    PhotoFileId int null,
    PreferenceTags varchar(1000) not null
)
 
CREATE TABLE UserPermissions
(
    UserId int not null 
        references UserProfile(UserId),
    PermissionId int not null
        references Permissions(PermissionId),
    IsGranted bit not null
)

But maybe you really do need this number of columns, you’re probably storing semi-structured data. By semistructured data, I always think of how one would store cascading style sheets or health records.

Semistructured data is really difficult to model. As an alternative to these difficult-to-manage columns, consider storing the values in an xml column (accepting that you can fetch, but probably no longer want to filter). If you need something more radical, maybe consider a schema-less json document database like couchdb (I can’t believe I said that).

SQL Server has something called wide tables which use sparse columns but you’re on your own here. It’s not something I could recommend.


Simultaneous Server Connections

Max Size: 32,767
10% of restriction: 3,277.
Risk of doing it wrong: Guarded

The number of connections your SQL Server instance can support can be found by this query

SELECT @@MAX_CONNECTIONS as [this many]

and the number of open connections on the server is basically this

SELECT count(1) as [this many] FROM sys.dm_exec_sessions

It’s not that unusual to have 1000 or 2000 connections open. It is rarer to see that many active queries. But it’s not unusual to see 1000s of connections, most of whom are sleeping. It’s possible you have a large number of services and web servers connecting to a variety of databases on your server. With ado.net connection pools, it’s common to have a bunch of idle connections.

The best advice I can give is to get a baseline. Watch the connections regularly in order to understand what is normal. Maybe start with

select 
    count(1) as [connections], 
    count(case [status] when 'sleeping' then null else 1 end) as [active connections],
    db_name([dbid]) as [database name],
    [hostname], 
    [loginame],
    [program_name]
from sys.sysprocesses
where [dbid] NOT IN (0)
group by [hostname], [loginame], [dbid], [program_name]
order by connections desc

Then go from there. Once you know what’s normal, you can then watch for changes to the status quo.


Database Size

Max Size: 512 Petabytes
10% of restriction: 51 Petabytes

51 Petabytes is unwieldy in so many ways I’m not going to list them here. I’m not saying such big data sizes are impossible with a centralized RDBMS like SQL Server. But I am saying that if you’re not sure about whether you can scale SQL Server to that size and you’re reading my blog looking for advice then…

Risk of doing it wrong: Severe


Other metrics

I think Swart’s Ten Percent Rule applies to a lot of things. Try to apply it the next time you come across a SQL Server restriction and let me know if you find anything interesting.

May 14, 2013

Myself in 2004

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication — Michael J. Swart @ 11:53 am

This is me in 2004

9 years ago

I’m in the center with the glasses

This picture was taken in an electronics factory in Zhongshan, China.

I’m not a big traveler and I never have been.  I thought the opportunity to see China was a once in a lifetime thing. In my case, I found myself making the trip three times courtesy of my employer. It wasn’t the adventure I thought it would be. Unlike most tourists to China, my memories don’t include much else other than hotels, ferries and factory locations like the one you see here. I remember counting the days until each trip was over.

When I look at this picture, I remember a lot of things. I remember the anti-static lab coats and I’m reminded that I used to have more hair than I do now. The machine in front of me is meant to test a motherboard for the upcoming XBox 360. The guy on the right was an engineer, a hardware guy. My job was to install and support the software that collected test results and store them in a SQL Server database.

In Over My Head?

I remember the other people in this photo. It was an amazing assembly of talent. The people here were smart. But not just smart, they were smart and competent. They were ambitious and passionate about what they do and it’s rare to see that. I felt a little overwhelmed and I felt a little out of place. Everyone seemed so confident about what needed to be done. I guess I was no different. I knew what I had to do, but the confidence I showed was half acting.

For the most part, I was lucky. I installed the software and waited for a problem to support. It’s a testament to the company I worked for that I was able to wait more than work. But it wasn’t all super-smooth sailing.

Eventually I was asked to troubleshoot a burning issue and I wasn’t used to the pressure; hopefully it didn’t show. I felt like I was thrown into the deep end in order to learn how to swim. The head guy asked me to resolve something (and by the way Michael, the factory can’t operate until you do). The issues I faced were new to me and the pressure was on. Here’s a small subset of the kind of things I was asked to tackle:

  • Replication configuration issues. <bleah!>
  • Log files filling up (because replication’s log reader wasn’t operating properly)<ouch>
  • Concurrency bottlenecks. <just the beginning of the rest of my life>

In my career up until then, I was always able to ask a more experienced colleague for help. Here was the first time where I was it. There were no other colleagues to call on. If I couldn’t crack this nut, it wasn’t going to get cracked. In this case it was me and books online vs. SQL Server 2000. I did eventually get through those days and it felt amazing to beat those issues. But during the crisis itself, there was some anxiety.

Out of the Deep End

I came away from those experience with a bit more confidence than I started with. It was the first time I thought, “I’m fine, I can handle this”. By the end of my trips to China, the confidence I was showing wasn’t acting any more. I wasn’t just solving such crises, I was avoiding them. And back in Canada, I was asked to be on-call overnight in order to field questions from others in China.

I don’t believe there exists a training course anywhere that is equivalent to the confidence gained by solving these FIX IT NOW crises.

And it didn’t stop there, these other things helped boost my confidence even further:

  • SQL Server released 2005 with features that made a DBAs life so much easier than when supporting 2000.
  • I left my software development job for a different job focusing on databases full time.
  • twitter and #sqlhelp happened. It made me feel like I had the world on call. I don’t use it as much as I used to, but it’s nice to now it’s still there if I need it.

The Guy in the Picture

I’ve showed this picture a couple times to others in the past year. Each time I was encouraged to blog about it. This is me doing that. I have a lot of stories I could tell about these trips. In fact, I wish now that I kept a journal. So dear reader, if sometime in the future, we find ourselves hanging out and have nothing to chat about, ask me about 2004 Michael. Until then…

April 17, 2013

The Sch-M lock is Evil

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

A necessary evil, but still evil. Why? Because it it won’t share with Sch-S and Sch-S is taken and held for absolutely everything (Yes, even your NOLOCK queries). And that can lead to some interesting concurrency problems. Let me explain.

Sch-M

Sch-M is an abbreviation for Schema Modification. It refers to a kind of lock that is taken on a table/index or other object whenever you want to modify that object. SQL Server allows only one Sch-M lock on an object at a time. So if you want to modify a table, your process waits to takes a Sch-M lock on that table. Once that lock is granted, the modification is completed and then the lock is released.

Sch-S

Sch-S is an abbreviation for Schema Stability. It is a kind of lock that is taken on an object when a process doesn’t want that object to change its definition. It makes sense. If I’m reading a set of rows through a table, I don’t want a column to disappear on me half way through. SQL Server allows many Sch-S locks on a table.

Sch-S vs. Sch-M

But Sch-S locks are incompatible with Sch-M locks. This means that when you want to modify a table, you’re not granted a Sch-M lock immediately. You have to wait for everyone using that table to finish using it. You’re essentially put on hold until the existing queries complete and their existing Sch-S locks are released. This also means that while you’re waiting, every query who wants to begin using that table is waiting in line behind you too. Basically “Everybody outta the pool while the lifeguards change shifts.” But that’s usually acceptable right? Database schema modifications are a big enough change to require a maintenance window.

Index Rebuilds Are Table Modifications

It’s true, if you have the luxury of maintenance windows for your DB changes, you’ll be alright. But you also have to consider your database maintenance plans (automated or otherwise). Those plans can launch index rebuilds while the database is online. And all index rebuilds also count as table modifications and take Sch-M locks. An index rebuild has syntax like this:

ALTER INDEX [PK_MyTable] ON [MyTable] REBUILD WITH (ONLINE=ON)

Hopefully you’ve remembered that ONLINE=ON part. When you use that part, the index is rebuilt in the background and at the end of that processing time, a Sch-M lock is taken and released very quickly.

But maybe you’re not so lucky. Maybe you’re not running 2012 yet and have an index that includes blobs. Or maybe you’re running on Standard Edition. In those cases you won’t be able to use the ONLINE=ON feature. In that case, the Sch-M lock is taken by the rebuild process and it’s held the entire time that index is rebuilt. During the rebuild, that index is now truly offline. No access for you.

You Can Get Stuck

Just like I did. A while ago, I was asked to help with this exact situation. An index rebuild had been running for hours it was offline and the Sch-M lock that was held was preventing anybody from using or even looking at that table. I was stuck between a rock and a hard place. I had to choose between letting the index rebuild complete (which could take hours) or cancelling the job (whose rollback could take hours). There was nothing I could do to avoid additional hours of downtime. There was another bit of irony in my situation. We didn’t care about the existing data in that table. A truncate table or drop/recreate table would have suited us just fine.

… Like Really Stuck

It occurred to me to try something out. What if I created an identical empty table with a different name. We didn’t need any data in the locked table. So using a new table could work. And because the locked table is only accessed by stored procedures, I can modify those procedures to use the new table instead of the locked one.

Nope!

For some reason, the ALTER PROCEDURE requires a Sch-S lock on the old table, the table it no longer refers to. The sprocs can’t operate or be modified without a Sch-S lock on that locked table. This sketch illustrates my situation.

Make sure your index plans never attempt to rebuild big indexes offline.

Extras

Some relevant DBAReactions gifs:

April 8, 2013

T-SQL Tuesday #41 – Presenting and loving it?

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication — Tags: — Michael J. Swart @ 8:00 pm

T-SQL Tuesday LogoBob Pusateri hosts this months T-SQL Tuesday. The topic is Presenting and Loving it! You’ll notice that the topic of this blog post takes the exclamation mark and turns it into a question mark. The thing is that I’m not much of a presenter. I’ve presented twice in my life to groups who were not coworkers. Three times if you count my best-man speech for my brother.

In Bob’s TSQL Tuesday invitation, he asks us “How did you come to love presenting?” and “When was the first time you presented and really loved it?” I’ll let you know when and if that ever happens.

Take my SQL Server 2000 instances ... Please!

But guess what? It turns out that I’m giving a talk to the Toronto SQL Server User Group tonight which will let me grow my presentation CV from two SQL talks to three. I bet no other blogger can brag that they’ve given 33% of their SQL Server Presentations on the same day as Bob’s T-SQL Tuesday about presentations.

Advice

So I’ve got a very short list of advice to give you today. I can only describe some of the things that I’ve done to prepare:

  • I follow Paul Randal’s advice at Configuring SSMS for presenting. I follow it to the letter. It’s easy to set up and makes SSMS readable.
  • I install Zoomit. Just in case. I don’t think I have any content that requires it, but you never know.
  • I installed SSMS Toolspack by Mladen Prajdić. The SQL Snippets feature (which are still way better than SSMS’s native snippets feature). Are great for keeping demo scripts at your fingertips. I have snippets for “d1″, “d2″, “d3″ and “d4″. Which correspond to the demos I plan to show.

In Toronto?

So are you in Toronto today? Got plans? Come on out to the user group tonight. I expect it will be pretty fun. I’m really comfortable with the topic and so I think it will be a blast. So when was the first time I presented and really loved it? Ask me again tomorrow.

April 3, 2013

Splitting Time Between Hobbies

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 12:00 pm

Google reader is dying, but I don’t think it’s the end of RSS and blogs. Blogs, I think, will always be a great medium for technical communities. But it has got me thinking about where and how I spend my time.

For those who care, I’m going to begin splitting time between SQL blogging and drawing. For the past couple years, you may notice that I’ve always combined both.

In practical terms, that probably means simpler or fewer illustrations here at MichaelJSwart.com.

But it also means I’ll begin posting stuff like the following over at deviantArt: http://mjswart.deviantart.com/ rather than here.  Or at least all the digital stuff and any analog stuff I feel like scanning.

Cheers!

Arwen Undomiel

Altering Text Columns: Only a Metadata Change?

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:00 am

Say you want to change the type of a text column using the ALTER TABLE … ALTER COLUMN syntax. It is valuable to know how much work SQL Server will have to do to fulfill your request. When your tables are large, it can mean the difference between a maintenance window that lasts five minutes, or one that lasts five hours or more.

I give a list of exactly when you’ll feel that pain and when you won’t.

A joke about char(max)

(BTW, CHAR(MAX) columns are impossible, you’ll get the gag if you figure out why)

When is the Whole Table Processed?

Here are conditions which require processing the entire table:

  • switching from unicode to non-unicode or vice versa.
  • changing a column from nullable to not nullable.
  • going from fixed length field to variable length field or vice versa.
  • decreasing the maximum length of a field.
  • increasing the maximum length of a fixed length field.
  • converting from limited length columns to unlimited or vice versa. (e.g. varchar(10) to varchar(max)).
  • collation modifications on non-unicode columns that change character set or code page. (See Collation Hell Part 3 by Dan Guzman)

On large tables, any of the above alterations will be a heavy hitter and will take time and fill transaction log (except that a shrink of fixed length fields seems to only require a scan).

What Changes are Metadata Only Changes?

That’s a lot of conditions! What’s alterations are left?

Not much:

  • Increasing the maximum length of a variable length column.
  • Changing the type from text to varchar(max).
  • Changing the type from ntext to nvarchar(max).
  • Any of the above while making a non-nullable field nullable.
  • Any of the above with a change in collation (with some big caveats, see Collation Hell Part 3 by Dan Guzman).

These changes are metadata only changes which means SQL Server doesn’t have to touch any of the actual data. So the size of the table will not impact the time it takes SQL Server to process the ALTER TABLE command. But see some notes about concurrency below.

Some Notes

Some notes about the above:

  • I ignored and make no claims about migrations where text or ntext is the target column type because the exceptions are strange and that scenario seems to fall under “Why would you want to do that?”
  • The above applies to only versions I’ve tested. Specifically 2008, and 2012.
  • The metadata-only changes I described above is not entirely on-line. There are still concurrency concerns to watch out for. These ALTER statements still request Schema modification (Sch-M) locks on the table, and once granted, only hold them briefly. But if you try to alter a column on a live environment and some long running query blocks your ALTER TABLE statement, then other queries that need access to the table will be blocked as well. 
  • Terms I used
    • fixed length: char(x), nchar(x)
    • variable length: varchar(x), nvarchar(x)
    • unlimited length: varchar(max), nvarchar(max), text, ntext
    • unicode: nchar(x), nvarchar(x), nvarchar(max), ntext
    • non-unicode: char(x), varchar(x), varchar(max), text

March 19, 2013

Checking Out Amazon Redshift

In order to refresh my memory about what I learned in University, I’ve been watching a course that UC Berkeley made available on Youtube. It’s been a good course so far. Recently I got to the topic of logical fallacies. And so I’m reminded that the following is not a valid argument: Jimmy advises X; Jimmy profits if you follow advice X; Therefore, Not X. It’s not valid in the academic sense. In the practical sense, I have to remember to not distrust all commercials and marketing videos.

But it’s hard, especially when Googling for “Big Data” or “Analytics”, I find it very difficult to find anything to read that’s both unbiased and useful. Maybe it’s because I’m still skeptical about any solution that is promoted by the people who stand to profit from following their advice (I’m trying not to discount their advice, I just squint at it a little).

So when Amazon came out with a publicly available Beta for their new Redshift Datawarehouse service (accompanied by a slick marketing video), I decided to kick the tires. Here’s some notes I made:

Amazon’s Redshift Commercial

Their commercial is here. And I have to say it reminded me of an infomercial. You know the kind where they try to sell fancy mop gadgets by showing how ridiculously clumsy people can be with regular mops. The Amazon commercial is light on technical details but I don’t think it was meant for an audience like me and you. I made a conscious effort not to hold that against them.

Warehouse in the cloud

Having a warehouse in the cloud makes a lot of sense in some ways. Pay-as-you-go pricing is what you want when you prefer operational costs over capital costs. Many businesses who don’t have I.T. as a core competency will find this a pretty attractive alternative to running servers or an appliance themselves. But it can get expensive quick, Amazon advertises less than $1000 / Terabyte / year. But with upfront costs for reserved pricing and a two terabyte minimum. The smallest rate you can get away with is $3000 per year for three years. In practice it will likely be significantly more. I can’t see anyone choosing Redshift without doing due diligence on the pricing, but it’s probably not going to be dirt cheap.

Connecting to Redshift
Star Trek's Jean Luc Picard says "Very well Data, open a channel"
Connections are made to Redshift only through ODBC (or JDBC) using Postgres drivers. Some challenges:

  • Picking my way through 32 bit vs 64 bit was tricky. Getting and using the right driver took some care.
  • Uploading through this connection is slow. So don’t try it this way. I mean it’s possible, but it’s simply not recommended. I learned this lesson not by reading through Amazon docs, but by attempting to run an SSIS job to funnel data into an ODBC destination. It was actually working, but it was impractical because it was so slow.

Creating the warehouse schema wasn’t too hard: I scripted a subset of tables from an existing warehouse. I stripped indexes, constraints and identities. There were a couple syntax differences (int becomes integer, bit becomes boolean, and GO becomes semicolon) but it went pretty smooth. For Redshift, in lieu of indexes, you choose columns on each table for the distribution key and the sort key. Underneath the covers, Redshift uses ParAccel and so if you’re familiar with that, you’ve got a great head start. Otherwise, Amazon’s Developer Guide is a good start. I’d like to hear more from others who know more about modeling for Redshift; it seems like a regular star schema will work well here.

Loading data is best through S3

I created a SSIS package that takes data from my relational warehouse. It takes that data and dumps it into delimited files (Hat tip to @peschkaj for advice on delimiter values). Then I gzipped the whole thing. I loaded those gzipped files into Amazon S3 and loaded data into Redshift using those files. Initially, I was afraid of Amazon S3 (What? I have to sign up for and learn about another service?) but working with Amazon S3 was dead simple.

Consuming data

I connected a Tableau client to Redshift using an ODBC connection. This Tableau discussion tells me that’s the current way to connect Tableau and Redshift. There are quite a few SQL limitations imposed by the ODBC protocol. So the Tableau experience was not too smooth. Tableau has a new Redshift connector coming out in a matter of weeks which should clear these limitations.
The point is that Amazon claims Redshift integrates seamlessly with a list of reporting apps (including Tableau). I believe it really will by the time Redshift is released, it’s just right now there’s a pretty big seam.

Next?

I’m going to get around to evaluating Google Big Query and of course Microsoft’s HDInsight. You might notice that lately, I find myself in a phase of learning (fun!) and so this post and the next few will be a description of my exploration.

February 28, 2013

Follow up on Ad Hoc TVP contention

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

Last month I posted PAGELATCH Contention on 2:1:103. It described a troubleshooting experience I had. I was troubleshooting a performance problem that was felt when applications issued very frequent ad hoc queries that used Table Valued Parameters (TVPs).

I’ve recently learned some extra things from the SQL Server team at Microsoft that I wanted to pass on. So this post acts as a kind of update to that one.

I know. I've drawn Columbo before, but it's my blog.

Reproduction

One thing I did was to come up with an application that reproduces the contention on 2:1:103 in the same way as production. I share that here now.

What this reproduction does is launch fifty processes. Each process uses multiple threads to hit a db server with an ad hoc query that uses a TVP.  For the curious here it is: Download AdHocTVPContention.zip and then open readme.txt. This reproduction is a mashup of

  • A simple batch script which launches an executable 50 times at once. A technique I gleaned from Paul Randal in a demo of his.
  • A multi-threaded executable which sends queries asynchronously. A framework I first developed in a post I wrote on concurrency.
  • The ad hoc TVP query itself. It was a bit of a modification of my example at Table Valued Parameters, A Short Complete Example. The only difference is that the command is a query instead of a stored procedure.

What I’ve Learned Since Then

The Microsoft team helped me understand a few things. The main thing is that the particular version of SQL Server (including service pack level) matters a lot:

SQL Server 2005 and Earlier

TVPs weren’t supported then. (This post isn’t for you).

SQL Server 2008 and R2

In my earlier post, I dismissed a fix provided by Microsoft. Fix: Poor performance in SQL Server 2008 R2 when table-valued functions use many table variables. At the time, I dismissed it for a number of reasons:

  • The workaround listed on that kb article page did not help us.
  • The symptoms that were described at that issue did not match my own scenario (i.e. I wasn’t using table-valued functions).
  • The title mentioned R2 and I thought (mistakenly) that this fix was not applicable to SQL Server 2008. (Update April 5, 2013: The fix used to be available as a hotfix for 2008 SP3, but as of March 2013, this fix is now included in a cumulative update. Upgrade to SP3 CU10 if you’re using 2008 and want this fix. Microsoft has also updated the title of the fix to mention 2008)

Microsoft helped me learn that although the symptoms in that fix differ from mine, the improvement in that fix involves a change to code that also improves my scenario. So with the fix applied, on my test environment, I saw that SQL Server could handle at least 6 times the volume of ad hoc queries than it could before (your mileage may vary).

SQL Server 2012

But there’s one other thing I noticed. When I ran the reproduction on a SQL Server 2012 environment, the process just flew through the workload! Temp table creation was no longer a bottleneck; there was no more contention on temp table creation. The Microsoft Engineer I worked with noticed this too and eventually it prompted a CSS blog entry. You can read all about this 2012 performance improvement at Temp table caching improvement for table valued parameters in SQL Server 2012

Partner vs MVP

I’m very lucky to have many investigation options available to me when I’m digging into a particular quirk of SQL Server behaviour. There’s the public options like #sqlhelp or dba.stackexchange or Microsoft support. And for me there’s some private options like the MVP mailing list. But this month, I feel lucky to work as a Microsoft partner. It’s through that connection that I learned so much.

Too Long; Didn’t Read

If you’re experiencing 2:1:103 contention because of too frequent ad hoc TVP queries then:

February 8, 2013

Some Failed Blog Topics

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 5:08 pm

I list some blog topics that I don’t plan to complete.

I have a rule of thumb that I try to follow when writing. If I can write something in a simpler way I do. If I can cut out words, sentences or whole paragraphs without altering the meaning, I do. Sometimes the thing I’m left with is so short, simple and straightforward that I wouldn’t even bother tweeting it.

For example, a couple weeks ago, after a shortening revision, I wound up with “Index your tables properly while designing databases”. That’s not the title, that’s the whole thing. Something so short and simple doesn’t deserve a blog post and so neither did the long version of that post!

So I’m cleaning house. I want to post a bunch of ideas that were never good enough on their own to make the cut.

Knowledge Philanthropy

Those two words sound great together don’t they? I’ve always felt like more of a knowledge consumer. In fact this blog you’re reading is an effort on my part to “give back”. But no matter what, I’ve always felt like I’ve received way more than I’ve given.

Your Database is Not Perfect

But it doesn’t need to be. It needs to be stable, scalable, and easy to maintain. It needs to meet any requirements you care to define. Anything after that is going to be a harder sell.

Measuring the Effects of Fragmentation

An interesting experiment I started, but it led me past DBA-land into SAN Admin land and I was not comfortable saying anything with authority here.

Q & A for questions I made up:

I must have written the following while half asleep and I have absolutely no idea what it could mean:

ala Esquire: Fake… subtly bad advice “Jesus Bless You” Generic advice that gets suddenly personal: Why do delis always give pickles with sandwiches?” “Ordering Pheasant”

Quotestring — Safe way to use Data as Code?

No, I wouldn’t count on it.

Hey bud!

The following was inspired by comments at “Last Word On Scans” It was drawn for drawing’s sake.

Wanna buy a query plan operator?

Range Scans on Multicolumn Keys

Say you have a table with an index on (LastName, FirstName) and you want to select all the rows which sort between “Lucas, George” and “Spielberg, Stephen”. I’ve never been able to find straightforward syntax that results in a query plan with a simple range scan.

I’m a Paul White Fan

Not very many of my friends and family hold up their side of the conversation when I talk about databases. But at the PASS Summit, there are hundreds of people who will! And I think that’s why I love it. I always intended to write about my experience at 2012′s summit, but I couldn’t think of anything meaningful to write. I knew that Paul White’s session was a highlight for me, but I didn’t have much else to say.

paulwhitewanted

On Being Competent

Billy Joel described himself once like this:

“… I consider myself to be an inept pianist, a bad singer, and a merely competent songwriter. What I do, in my opinion, is by no means extraordinary. I am, as I’ve said, merely competent. But in an age of incompetence, that makes me extraordinary.”

That particular quote probably struck me at a time when I wasn’t feeling particularly perfect enough at what I do.

UNDELETE

Is here! Oh actually, april fools. It’s actually called Point in time restore, and it’s been available for a while. And the syntax is a little different…

January 22, 2013

PAGELATCH_EX Contention on 2:1:103

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 12:00 pm
Update February 28, 2013: Please don’t forget to look at my follow up post here Follow up on Ad Hoc TVP Contention for some important notes on some promising fixes/changes found in later SQL Server versions.

This blog post is meant to help people troubleshoot page latch contention on 2:1:103. If that’s what brought you to this page, then hopefully you find it useful. But first…

A textbook entitled "Don't Panic"

Initial Investigation

Last week I was asked to help tackle a production outage. Queries were slow enough that the system was considered unavailable. And just like any other performance problem I started by having SQL Server tell me what was wrong with itself.

  • I first started with sp_whoisactive but it took about 30 seconds for it to return. Several queries were blocked by others and some of the lead blockers were waiting for a PAGELATCH_EX on 2:1:103
  • Then I checked on the most common waits using a query found at Paul Randal’s article: Wait statistics, or please tell me where it hurts. (I don’t have that page bookmarked, every time I’m in trouble, I google “Tell me where it hurts Paul Randal”). I learned that PAGELATCH_EX contention was our most serious wait type. Paul mentions that he sees this kind of contention most commonly on tempdb.
  • I followed a link from Paul’s article to Robert Davis’s Breaking Down TempDB Contention. A script there helped me to discover that while there was lots of tempdb contention, it was all on page 2:1:103, which is not PFS, GAM or SGAM.

Finding More Info

So I looked for more help.

  • Web Search: My work colleague found and then pointed me to Latch waits on 2:1:103? You are probably creating too many temp tables in SQL Server by Microsoft’s Matt Wrock who faced the same problem. If you found this page because you’re facing the same issue then stop reading this article and go read Matt’s first. He explains what’s going on better than I could. Think of this article as a kind of sequel to his article.
  • StackOverflow: (serverfault in this case) has TempDB contention on sysmultiobjrefs SQL 2005. The best answer there is Matt pointing back to his blog article.
  • Microsoft Support who helped us out, but maybe not as quick as we’d like. To be fair, the turnaround time that we would have been happy with was measured in nanoseconds.

With that information it became clear to us that we were creating temp tables too often. And we were creating them in a way that made it impossible for SQL Server to cache. Did you know that? That SQL Server can cache temp tables? When a query is done with a temp table, SQL Server can truncate that table and give it to the next query to avoid having to create it again. Cached temp tables reduce tempdb contention including contention on this page. But as it turns out, SQL Server cannot cache temp tables from ad hoc queries.

But who creates temp tables that often? We did, just by using a table valued parameter in a parameterized query. Since SQL Server began supporting table valued parameters (introduced in SQL Server 2008), we have been gradually moving towards this practice in lieu of sending xml to be shredded.

… And More Info Including Some Other Links

I can’t help including some extra articles I found on 2:1:103 contention.

Robert Davis blogged about 2:1:103 contention in Tempdb Contention That Can’t Be Soothed. His advice is to remove statements in code like “SELECT … INTO”.  However, I believe that such statements contribute to PAGELATCH contention only if the statement is not part of a stored procedure (i.e. can’t be cached). I also think there are more common causes of 2:1:103 PAGELATCH contention than SELECT … INTO statements.

But Robert did link to a demo by Paul Randal (an absolutely amazing 5 minute demo linked from a 2011 issue of his newsletter). Paul tells us

  • That “the SQL Team knows about this. It’s a known issue. Hopefully something will be done about it in one of the future releases.”
  • But unlike SGAM or GAM contention, there’s absolutely nothing you can do about this to spread the contention around.

From a DBA point of view there’s not much that can be done. So as DB Developers, we have to find a workaround. But before I get to that, I want to mention some things that were not so useful.

… And Some Bad Info

There was some red herrings out there…

  • Add more tempdb files (nope, wrong kind of tempdb contention).
  • Many resources suggested enabling trace flag T1118. It’s a trace flag that eases tempdb contention, but not this kind.
  • Get faster tempdb disks? No, PAGELATCH contention is for in-memory copies of pages, not disk (that’s PAGEIOLATCH).
  • Tune the queries in question? Not a bad idea, but this problem is about the number of problem queries, not the performance of each.
  • A fix from Microsoft Support: Fix: Poor performance in SQL Server 2008 R2 when table-valued functions use many table variables. Oooh… so close, but we weren’t using table-valued functions. And the workarounds they list (disabling AUTO_UPDATE_STATISTICS) did not help. But maybe it might help you?
  • The Object:Created event. Whether traced with Profiler or collected with an Extended Events session, this event can report on created temp tables. Maybe I can use this event to tell me which queries are creating the tempdb tables. Nope, not this time! This event  has two drawbacks which make it useless for troubleshooting 2:1:103 contention:
    • The Object:Created event reports the creation of temp tables even when they’re cached (which don’t need a latch on 2:1:103).
    • The Object:Created event doesn’t report the declaration of any table variables (which may need a latch on 2:1:103).

Strategy: So What Do We Do?

Knowing is half the battle right? But that means we still have a lot of work to do. I’m going to recap what we know so far. We know that we have trouble when there are queries that:

  • are executed frequently
  • create temp tables (either explicitly or by declaring table valued parameters).
  • are not cached (Microsoft explains when temp tables are not cached. In my case, it was because they were ad-hoc queries)
  • require a page latch on 2:1:103.

There is a performance counter that can track the all of the above (except maybe for that last bullet). It’s called Temp Tables Creation Rate and it’s found in the perf counter category “General Statistics”. Now this is a metric you can trust. We found that a high temp table creation rate was tightly correlated to the trouble we were seeing. So when troubleshooting, look at this performance counter (and leave the “Object:Create” event alone).

So now what do we do? First, we must find the ad hoc queries that create these temp tables. Then, we have to put them in stored procedures so that the temp tables can be cached. Alternatively, we could reduce the need for creating them. It’s a workaround, but it’s what we’ve got.

Finding Such Queries Is Difficult

But here’s the hard part. In a high volume system, it’s difficult to identify exactly which queries are causing the most trouble. Microsoft support can go through tons of collected trace and performance data to try to find such queries, but it’s a long process. On our side, we looked at sys.dm_os_waiting_tasks:

Select *
From sys.dm_os_waiting_tasks

And we saw all the contention on 2:1:103, but when we tried to look up the SQL text for it

Select wt.*, st.text
From sys.dm_os_waiting_tasks wt
left join sys.dm_exec_requests er
    on er.session_id = wt.session_id
outer apply fn_get_sql(er.sql_handle) st
where wt.resource_description is not null

The text was often unavailable. Basically, I’m guessing maybe the dmv’s I was using weren’t quick enough to tell me which queries were suffering from (or causing) contention on 2:1:103. So I decided to look through the cache for query candidates that might create temp tables. here’s what I came up with. It’s not a comprehensive list and there might be false positives, but it might be enough to go on. If you know your applications well, you can tailor the filters below to something more relevant for you.

select cp.plan_handle, sql_handle, text, refcounts, usecounts
from sys.dm_exec_cached_plans cp
join sys.dm_exec_query_stats qs
    on cp.plan_handle = qs.plan_handle
cross apply sys.dm_exec_sql_text(qs.sql_handle) st
where (
       (
        st.text like N'%READONLY%'  -- possible table valued parameter (prepared queries)
        or st.text like N'%TABLE %' -- possible table creation
        or st.text like '%#%'       -- possible use of temp table
        or st.text like '%INTO%'    -- possible table creation
       )
       and objtype in ('prepared', 'ad hoc') -- temp tables not cachable with these
)
order by usecounts desc

Going Forward

Personally after helping identify and implement the workarounds. I’m doing a couple things:

  • I’m recommending that developers not create temp tables or declare table variables that cannot be cached. For now, this means we use stored procedures for any query that uses table variables or temp tables.
  • We now have thresholds on our performance tests which look at the performance counter Temp Tables Creation Rate.
  • I created a Microsoft Connect item. If you’re troubleshooting the same problem, head over there and let Microsoft know you’re having trouble too.
Older Posts »

Powered by WordPress