Michael J. Swart

December 22, 2011

A simple message: Merry Christmas

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Tongue In Cheek — Michael J. Swart @ 10:00 am

From my family to yours, Merry Christmas.

And just to throw in some SQL, here is a query that will return the complete lyrics to Feliz Navidad, (the most repetitive Christmas song in the world):

select [processing-instruction(complete_lyrics)] = REPLICATE(REPLICATE(REPLICATE('
Feliz Navidad, ', 3) + 'próspero año y felicidad', 2) + REPLICATE(REPLICATE('
I wanna wish you a Merry Christmas, ', 3) + 'from the bottom of my heart', 2),3)
for xml path(''),type

December 20, 2011

Write Better

Filed under: Miscelleaneous SQL — Tags: — Michael J. Swart @ 11:20 pm

I have a confession to make. I suck at writing. In high school, I was never at the top of my English class and my University accepted my application despite my English marks. But even though I majored in Math and Computer Science, my essay-writing days weren’t over. The University I went to required that all students “demonstrate a proficiency in English” before graduating. To demonstrate that, we were required to write an exam, a single essay. I thought I could get by because English is my first language. So I was surprised when I found out that I failed that exam!  Ugh…

Then I made a choice which in hindsight turned out to be one of the best things I could have done. In my second year, I signed up for an English course as an elective. It was an introduction to essay writing. I worked hard and did well. In that course I learned a few things I should have learned many years earlier. Those things can be boiled down into:

  • Have something to write
  • Write it with the reader in mind
  • Don’t write anything else

Have something to write

Or in other words have a point. I’m going to repeat that because it’s a lesson I find myself relearning often: Have a point. I need to have something to write more than I need to write something (if that makes sense).

Corollary for bloggers: Don’t feel guilty about writer’s block.

Write it with the reader in mind

If I’m writing a blog article, I try to ask myself “who’s the reader?” Some common readers include these people:

  • A keen SQL professional googling for a solution. I love writing posts for this person. It usually starts with myself googling for a problem and not finding anything (or being disappointed with what I do find). I like to think that I’m helping people in the same situation I was in. (Examples: Searching Inside Strings: CPU is Eight Times Worse For Unicode Strings, Eliminated Null Values)
  • Myself: I used to write a lot of articles for myself. They were quick scripts that I could quickly get access to as long as I had internet access. I still use them even today (Examples: Indexing Foreign KeysDisowning Your Relatives)
  • Potential employers, clients or trainees: A perfectly valid set of readers, but writing for them is tricky. You’re bragging (which is okay) but you don’t want to appear like you’re bragging (which is not okay) so keep it subtle. Keep the audience in mind. It’s better if the message is “I love this stuff” or “I can help you” rather than “Look how smart I am.”
  • RSS Readers and Link Followers: Yep, that’s you! (both of you). You enjoy keeping up with SQL Server industry news by following various SQL Server blogs including this one. Something piqued your interest about the title and you started reading (btw, thanks for reading this far!).

Don’t write anything else

This is Mark Twain’s “Employ a simple and straightforward style.” It’s also George Orwell’s “If it is possible to cut a word out, always cut it out.”

Writing clearly goes back to having a point. If a sentence, word or paragraph does not help your point then it probably doesn’t belong. When you omit the unimportant stuff, what’s left is packed with meaning.

One trick I use is to do a brain dump. I quickly type an outline of what I want to write so that I don’t forget anything. Often this simple outline gets included into the post verbatim.

 

December 8, 2011

CXPACKET? What’s That? And What’s Next?

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Tags: , , — Michael J. Swart @ 8:54 pm

Takeaway: Is the wait type CXPACKET bothering you? It means parallelism. Here’s how to find the queries that are giving you this grief.

The Wait Type CXPACKET

Since 2005, SQL Server provides dynamic management views (DMVs). These views help monitor database health and they’re the first place I go to when troubleshoot problems. For example the view sys.dm_os_wait_stats gives an aggregated way to tell you where the waits are. Often you’ll come across the wait type CXPACKET. Microsoft explains:

“[This wait type] Occurs when trying to synchronize the query processor exchange iterator.”

Hmmm… I know what those words mean individually… Any way, if you keep searching, you’ll see others explain: “CXPACKET means parallelism”. We’re getting a bit closer now. If a query is run with multiple threads, and one thread completes before the others, then SQL Server will report the wait type CXPACKET for that thread.

It’s like several chefs are making soup. If one chef’s only job is to add the cilantro, there’s going to be some waiting involved. So is the extra help worth the trouble?

SQL Server thought it could use more cooks for a particular “recipe”… Maybe it’s time to make simpler recipes. (The answer is almost always yes, especially for OLTP systems). SQL Server doesn’t use parallelism often. It only comes up with a multi-threaded query plan when it needs to, i.e. when the plan is estimated to be costly enough. It’s usually those queries with the high estimated cost that need to be fixed, the CXPACKET wait type is a symptom of that.

Those High Cost Queries

So which queries are those? Again, I like to go to the DMVs. The one I like to look at is sys.dm_exec_query_stats except that it doesn’t have a flag called uses_parallelism. For the definitive way to find parallel queries, you have to scan and parse all the query plans that are cached using sys.dm_exec_query_plan.

But that takes for ever so I don’t recommend it. But recently I found out that we don’t really need it: Parallel queries are interesting, they’re one of the only kind of queries that can use more CPU time than actual elapsed time. So if that’s true, then we just need to examine these queries:

select * from sys.dm_exec_query_stats 
where total_elapsed_time < total_worker_time
	-- total_worker_time is measured to the microsecond, but is accurate to the millisecond so:
	and ( ( total_worker_time – total_elapsed_time ) / execution_count ) * 1.0 > 1000 -- avg difference is at least 1 ms

But it might not be true, for example, a parallel query might be suffer from blocking too long. Long enough that the elapsed time might be longer than the total amount of CPU time. That’s an interesting question: How good an indication of parallelism is the ratio between elapsed time and CPU time?

(total_elapsed_time < total_cpu_time) Vs. uses_parallelism

I’m lucky. I’ve got a large set of query data that can help me.

This chart needs a little explanation. Pick a dot in the chart above. It represents a set of queries whose ratio of cpu to elapsed time are the same. Look at the value on the x-axis to see what that common ratio is. Say that it’s 1.5, this means that that dot represents a set of queries whose elapsed time is exactly 50% greater than its cpu time. Now look at its value on the Y axis. You might see 2%. That’s means that 2% of those queries use parallelism.

I expected a slope, not a waterfall and this is what that means to me. Knowing that a query’s worker time is greater than its elapsed time is a great indication of parallelism. And knowing by how much doesn’t change that.

CXPACKET? Here’s What’s Next

This query lists the top 20 worst parallel queries cached in your database.

-- TOP 20 parallel queries (by CPU)
SELECT TOP (20)
    [Total CPU] = total_worker_time,
	[Total Elapsed Time] = total_elapsed_time,
	[Execution Count] = execution_count,
    [Average CPU in microseconds] = cast(total_worker_time / (execution_count + 0.0) as money),
    [DB Name] = DB_NAME(ST.dbid),
    [Object Name] = OBJECT_NAME(ST.objectid, ST.dbid),
    [Query Text] = (SELECT [processing-instruction(q)] = CASE 
            WHEN [sql_handle] IS NULL THEN ' '
            ELSE (SUBSTRING(ST.TEXT,(QS.statement_start_offset + 2) / 2,
                (CASE 
                        WHEN QS.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX),ST.text)) * 2
                        ELSE QS.statement_end_offset
                        END - QS.statement_start_offset) / 2))
            END
			FOR XML PATH(''), type),
    [Query Plan] = qp.query_plan
FROM sys.dm_exec_query_stats QS
CROSS APPLY sys.dm_exec_sql_text([sql_handle]) ST
CROSS APPLY sys.dm_exec_query_plan ([plan_handle]) QP
WHERE total_elapsed_time < total_worker_time
	AND ( ( total_worker_time - total_elapsed_time ) / execution_count ) * 1.0 > 1000 -- average difference is more than a millisecond
ORDER BY total_worker_time DESC

Some interesting things about this query:

  • I’ve written it for SQL Server Management Studio. In SSMS, you can click on the query and the query plan to explore these queries further.
  • I’ve added a filter to weed out queries whose average cpu time is more than a millisecond. I’m sure you won’t miss those stats. Microsoft reports CPU time in microseconds, but it’s only accurate to milliseconds. So you can get false-positives if you don’t ignore the blazing fast queries. That’s why I added that filter.

Powered by WordPress