Michael J. Swart

December 22, 2016

That Time We Fixed Prod Without Admin Credentials

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 9:00 am

Merry Christmas readers! It’s story time. This is about a problem I encountered a few weeks ago. We were looking at a production site using sp_whoisactive and we noticed a lot of blocking on one particular procedure. I’m going to explain how we tackled it.

In this case, I think it’s interesting that we were able to mitigate the problem without requiring sysadmin access.

The Symptoms

Using sp_whoisactive and other tools, we noticed several symptoms.

  • SQLException timeout errors were reported by the app when calling one procedure in particular.
  • Many sessions were executing that procedure concurrently. Or at least they were attempting to.
  • There was excessive blocking and the lead blocker was running the same procedure.
  • The lead blocker had been running the longest (about 29 seconds)
  • The blocking was caused by processes waiting on Sch-M locks for a table used by that query

Here’s what was going on:
SQL Server was struggling to compile the procedure in time and the application wouldn’t let it catch its breath. The query optimizer was attempting to create statistics automatically that it needed for optimizing the query, but after thirty seconds, the application got impatient and cancelled the query.

So the compilation of the procedure was cancelled and this caused two things to happen. First, the creation of the statistics was cancelled. Second, the next session in line was allowed to run. But the problem was that the next session had already spent 28 seconds blocked by the first session and only had two seconds to try to compile a query before getting cancelled itself.

The frequent calls to the procedure meant that nobody had time to compile this query. And we were stuck in an endless cycle of sessions that wanted to compile a procedure, but could never get enough time to do it.

Why was SQL Server taking so long to compile anyway?
After a bunch of digging, we found out that a SQL Server bug was biting us. This bug involved

  • SQL Server 2014
  • Trace flag 2389 and 2390
  • Filtered Indexes on very large base tables

Kind of a perfect storm of factors that exposed a SQL Server quirk that caused long compilation times, timeouts and pain.

What We Did About It

Well, in this case, I think that the traceflag 2389, 2390 kind of outlived its usefulness (the cure is worse than the disease and all that). So the main fix for this problem is to get rid of those traceflags. But it would be some time before we could get that rolled out.

So for the short term, we worked at getting that procedure compiled and into SQL Server’s cache.

We called the procedure ourselves in Management Studio. Our call waited about thirty seconds before it got its turn to run. Then it spent a little while to compile and run the procedure. Presto! The plan is in the cache now! And everything’s all better right? Nope. Not quite. The timeouts continued.

If you’ve read Erland Sommarskog’s Slow in the Application, Fast in SSMS you may have guessed what’s going on. When we executed the procedure in SSMS, it was using different settings. So the query plan we compiled couldn’t be reused by the application. Remember, all settings (including ARITHABORT) need to match before cached plans can be reused by different sessions. We turned ARITHABORT off in SSMS and called the procedure again.

After a minute, the query completed and all blocking immediately stopped. Whew! The patient was stable.

The whole experience was a pain. And an outage is an outage. Though the count of the snags for the year had increased …

the Grinch had warded off downtime at least

December 1, 2016

Learn About SQL Server in the Waterloo Region

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 9:34 am

Speaking

photo:Hope Foley


I want to promote a couple free talks coming up here in Kitchener Ontario.

Drawing Conclusions, December 12, 2016

At D2L, we created a new free monthly meetup called D2L Night School that helps us give back to the local tech community. This is how we describe it on the site:

Since 1999, D2L has been growing in Kitchener while transforming the way the world learns. We don’t know it all, but along the way we’ve learned some things about technology and management that we want to share with the wider community. We’re excited to launch D2L Night School to share what we’ve learned, network with other innovators and collectively build a stronger tech community in KW.

On December 12, I’ll be giving my Drawing Conclusions talk, a fun and sometimes irreverent look at the SQL Server and Data world through illustrations.

Register here. Then come drink beer, socialize and talk tech. It should be fun.

High Concurrency in SQL Server, January 25, 2017

Earlier this year, the local .Net User Group CTTNUG asked members what topics they’d like to learn more about. It turns out that SQL Server was high on that list and so Ken Cenerelli reached out to me. And so I’ll be talking about SQL Server concurrency, a topic I care a lot about.

Concurrency is a critical part to scalability. Come by and learn how to tackle common enemies of concurrency. Get this right and you’ll have one less problem to worry about. And one less problem lets you get back to more important things.

Register at the meetup site. I’ll see you there in January.

Powered by WordPress