October, 2011 | Michael J. Swart

October 26, 2011

Where Are Your Popular Joins?

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Tags: "sql server", cached queries, joins, query plans — Michael J. Swart @ 8:00 pm

So you know a lot about your databases right? You’re familiar with their schemas and tables and the queries that run on them. Personally I use sys.dm_exec_query_stats to understand what the most popular queries are.

But I recently started wondering about popular table joins.

I was wondering: “What tables in my database are most commonly joined together?” I already have a pretty good idea based on the data model. But I wanted to find out if the popular queries are in sync with my understanding. Unfortunately there’s no system view called sys.dm_exec_join_stats. The whole reason that I was curious is that I wanted to find a set of common table joins whose queries might be improved with a indexed view.

So I wrote something that gives me a bit of an idea. It’s a query that looks at cached query plans and counts nested loop joins (multiplied by execution count).

USE tempdb;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN TRANSACTION
 
;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select 
	cp.usecounts as numberOfJoins,
	seeknodes.query('.') as plansnippet
into #my_joins
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle)
	as qp
cross apply query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan//SeekKeys/Prefix[@ScanType="EQ"]') 
	as seeks(seeknodes)
where seeknodes.exist('./RangeColumns/ColumnReference[1]/@Database') = 1
	and seeknodes.exist('./RangeExpressions/ScalarOperator/Identifier/ColumnReference[1]/@Database') = 1;
 
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as p1)
select sum(numberOfJoins) as [Number Of Joins],
	myValues.lookupTable + '(' + myValues.lookupColumn + ')' as lookupColumn,
	myValues.expressionTable + '(' + myValues.expressionColumn + ')' as expressionColumn
from #my_joins
cross apply plansnippet.nodes('./p1:Prefix/p1:RangeColumns/p1:ColumnReference[1]')
	as rangeColumns(rangeColumnNodes)
cross apply plansnippet.nodes('./p1:Prefix/p1:RangeExpressions/p1:ScalarOperator/p1:Identifier/p1:ColumnReference[1]')
	as rangeExpressions(rangeExpressionNodes)
cross apply (
	select
		rangeColumnNodes.value('@Database', 'sysname') as lookupDatabase, 
		rangeColumnNodes.value('@Schema', 'sysname') as lookupSchema,
		rangeColumnNodes.value('@Table', 'sysname') as lookupTable,
		rangeColumnNodes.value('@Column', 'sysname') as lookupColumn,
		rangeExpressionNodes.value('@Database', 'sysname') as expressionDatabase, 
		rangeExpressionNodes.value('@Schema', 'sysname') as expressionSchema,
		rangeExpressionNodes.value('@Table', 'sysname') as expressionTable,
		rangeExpressionNodes.value('@Column', 'sysname') as expressionColumn	
	) as myValues
where myValues.expressionTable != myValues.lookupTable
group by myValues.lookupTable, myValues.lookupColumn, myValues.expressionTable, myValues.expressionColumn
order by SUM(numberOfJoins) desc;
 
rollback;

USE tempdb; SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; BEGIN TRANSACTION ;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan') select cp.usecounts as numberOfJoins, seeknodes.query('.') as plansnippet into #my_joins from sys.dm_exec_cached_plans cp cross apply sys.dm_exec_query_plan(cp.plan_handle) as qp cross apply query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan//SeekKeys/Prefix[@ScanType="EQ"]') as seeks(seeknodes) where seeknodes.exist('./RangeColumns/ColumnReference[1]/@Database') = 1 and seeknodes.exist('./RangeExpressions/ScalarOperator/Identifier/ColumnReference[1]/@Database') = 1; WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as p1) select sum(numberOfJoins) as [Number Of Joins], myValues.lookupTable + '(' + myValues.lookupColumn + ')' as lookupColumn, myValues.expressionTable + '(' + myValues.expressionColumn + ')' as expressionColumn from #my_joins cross apply plansnippet.nodes('./p1:Prefix/p1:RangeColumns/p1:ColumnReference[1]') as rangeColumns(rangeColumnNodes) cross apply plansnippet.nodes('./p1:Prefix/p1:RangeExpressions/p1:ScalarOperator/p1:Identifier/p1:ColumnReference[1]') as rangeExpressions(rangeExpressionNodes) cross apply ( select rangeColumnNodes.value('@Database', 'sysname') as lookupDatabase, rangeColumnNodes.value('@Schema', 'sysname') as lookupSchema, rangeColumnNodes.value('@Table', 'sysname') as lookupTable, rangeColumnNodes.value('@Column', 'sysname') as lookupColumn, rangeExpressionNodes.value('@Database', 'sysname') as expressionDatabase, rangeExpressionNodes.value('@Schema', 'sysname') as expressionSchema, rangeExpressionNodes.value('@Table', 'sysname') as expressionTable, rangeExpressionNodes.value('@Column', 'sysname') as expressionColumn ) as myValues where myValues.expressionTable != myValues.lookupTable group by myValues.lookupTable, myValues.lookupColumn, myValues.expressionTable, myValues.expressionColumn order by SUM(numberOfJoins) desc; rollback;

Some caveats:

Parsing xml takes a lot of time and a lot of CPU (the subtree cost is huge and execution time is measured in seconds or minutes)
It’s only useful on a system that is used a lot (as opposed to a dev database).
It only reports statistics about queries that are found in cached plans. So the stats are only relevant since the last server restart
It only counts loop joins (not hash or merge joins)
If you want, you can adjust the query to include schema and database names in the results

I hope you find it useful. This query gives hints for further investigation into potential indexed views. It worked well for me and so I thought it was worth sharing. I ran this query against a server I know well and I was surprised at some of the results. Good luck.

-- Comments (4)

October 19, 2011

Secret Santa as a Puzzle

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: combinatorics, puzzle, secret santa, sql — Michael J. Swart @ 12:00 pm

So every year, the adults in my family pick names to do a gift exchange. I guess lots of people do this and they call it Secret Santa.

In my family there are five couples that participate and we draw names from a hat. Inevitably, someone will draw their own name or the name of their spouse and everyone puts the names back in the hat. It usually takes us quite a few tries before we get it right.

This past thanksgiving (Canadian) we went through this again. It’s fun and we always give the person who picked their own name a hard time. But I got to thinking about this as a puzzle:

“How can we exchange names fairly and secretly without the do-overs?”

Well, let’s look at this a few ways:

As a mathie.
As a computer techie.
From a practical point of view.

As a Mathie

I graduated from University of Waterloo with a BMath with a major in CS and C&O*. But since graduating, I’ve focused only on CS (Computer Science) things. The mathie in me wonders: “What are the odds that we pass the hat around successfully?”

I thought long and hard about it but couldn’t crack that question. My textbooks were no help and there were no professors on hand to answer, so I asked http://math.stackexchange.com for help. Hoping that I’d have just as much success with that site as I do with stackoverflow.com. I was not disappointed. Here’s the question I asked:

Five couples draw names from a hat. If a person draws their own name, or the name of their spouse, all the names go back in a hat and names are re-drawn. Using a computer, I know that the probability of this happening is 1 – (440192 / 10!) or about 88%. What’s a general expression for n couples?

And a stackexchange user joriki gave a brilliant answer:

By assigning a letter to each couple, this can be reduced to the problem of finding the number a_n of anagrams of a word with n different letters, each occurring twice, with no letters fixed. The desired number of permutations is then 2ⁿa_n, since each of the n couples can be assigned in two ways to the two instances of its letter. Wikipedia mentions this problem as a generalized derangement problem. The general formula given for a word with numbers n₁, … ,n_r of r different letters is

where P_k is the k-th Laguerre polynomial. In the present case, r=n and n_i=2, so we only need the second Laguerre polynomial, which is P₂(x)=(1/2)(x²-4x+2). The n factors of (1/2) cancel with the n factors of 2, so the probability of success is

where (2n)! counts the total number of permutations. For n=5, Wolfram|Alpha gives 440192/(10!), as you calculated.

I think it’s a beautiful answer and for the first time in a long time, I missed doing Math. So this math problem and the probabilities are now well understood. But it doesn’t save any time picking names at Thanksgiving does it? Let’s look at it from another point of view.

As a Computer Techie

Using C#
Okay, I understand this subject much better. A C# program is easy to write. In fact, I wrote a quick one and it looks something like this:

static void Main()
{
	List<string> names = new List<string>() {"Mike", "Leanne", "Dave S",
		"Cindy", "Marianne", "Dave B", "Linda", "Matt",
		"Lisa", "Dan" };
 
	List<int> picks = new List<int>() { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
 
	while (!IsValid(picks))
	{
		Shuffle(ref picks);
	}
 
	PrintResults(picks, names);
}

You can look at the full program here. It works, but it doesn’t feel like the right solution to the problem: The results aren’t secret for one thing. We’d need a trusted third party to distribute the results. Also notice the while loop, we don’t know for certain if/when the while loop will break, so this program seems a little sloppy.

Using SQL
So now here’s a subject I’m really really comfortable with. I used an approach that my friend Paul Santos explained to me when we talked about this problem. His approach is this:

Drawing from a hat is a permutation of 10 names. So generate all the permutations
Filter out the invalid permutations
Pick a random permutation from the list of valid ones and report that

So here’s me showing all the permutations:

with OneToTen AS
(
	SELECT n
	FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) as N(n)
)
SELECT A.n, B.n, C.n, D.n, E.n, F.n, G.n, H.n, I.n, J.n
FROM OneToTen A
JOIN OneToTen B
	ON B.n NOT IN (A.n)
JOIN OneToTen C
	ON C.n NOT IN (A.n, B.n)
JOIN OneToTen D
	ON D.n NOT IN (A.n, B.n, C.n)
JOIN OneToTen E
	ON E.n NOT IN (A.n, B.n, C.n, D.n)
JOIN OneToTen F
	ON F.n NOT IN (A.n, B.n, C.n, D.n, E.n)
JOIN OneToTen G
	ON G.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n)
JOIN OneToTen H
	ON H.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n, G.n)
JOIN OneToTen I
	ON I.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n, G.n, H.n)
JOIN OneToTen J
	ON J.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n, G.n, H.n, I.n)
OPTION (FORCE ORDER, MAXDOP 1)

To filter out the invalid permutations, I add a where clause. To report a random permutation, I order the results by newid() and select the top 1 row:

with OneToTen AS
(
	SELECT n
	FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) as N(n)
)
SELECT TOP (1) A.n, B.n, C.n, D.n, E.n, F.n, G.n, H.n, I.n, J.n
FROM OneToTen A
JOIN OneToTen B
	ON B.n NOT IN (A.n)
JOIN OneToTen C
	ON C.n NOT IN (A.n, B.n)
JOIN OneToTen D
	ON D.n NOT IN (A.n, B.n, C.n)
JOIN OneToTen E
	ON E.n NOT IN (A.n, B.n, C.n, D.n)
JOIN OneToTen F
	ON F.n NOT IN (A.n, B.n, C.n, D.n, E.n)
JOIN OneToTen G
	ON G.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n)
JOIN OneToTen H
	ON H.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n, G.n)
JOIN OneToTen I
	ON I.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n, G.n, H.n)
JOIN OneToTen J
	ON J.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n, G.n, H.n, I.n)
WHERE A.n NOT IN (1, 2)
	AND B.n NOT IN (1, 2)
	AND C.n NOT IN (3, 4)
	AND D.n NOT IN (3, 4)
	AND E.n NOT IN (5, 6)
	AND F.n NOT IN (5, 6)
	AND G.n NOT IN (7, 8)
	AND H.n NOT IN (7, 8)
	AND I.n NOT IN (9, 10)
	AND J.n NOT IN (9, 10)
ORDER BY NEWID()
OPTION (FORCE ORDER, MAXDOP 1)

So I’m really happy with this. But only as a solution to a puzzle. The only thing it has going for it is that it’s guaranteed to halt. It doesn’t take less cpu than the C# program and we’re further away from telling relatives who they’re meant to buy presents for!

A Practical Point Of View

In real life, using authentication schemes and trusted third parties is a great way to bring any holiday party to a halt Mr. Buzz Killington (No more stuffing for you). So what kind of practical things can you do to make things easier or avoid do-overs? Here’s a couple ideas:

There are many many websites that are built just for this problem: SecretSanta.com, Elfster, and DrawNames.com. (I’m not making those up!) I think there’s even a facebook or iPhone app for that. I haven’t looked too closely at these so I can’t vouch for them.
You could write people’s names on an old deck of cards. Shuffle those cards and deal them out. You don’t avoid any problems this way – I mean, you won’t avoid getting your own name – but shuffling and dealing is quicker than passing around a hat.
If you pick your own name, replace it and pick again. The exchange won’t be 100% secret, but most people don’t care. This is what most Secret Santas are like.
Status quo. Do nothing. This is probably what my family will stick with. We’ll continue to pass around the hat and give people a hard time for picking their own names.

Have you got any other ideas? If you do a gift exchange with your family, what does your family do?

*~ CS and C&O are short for “Computer Science and Combinatorics and Optimization”, my first year at University was spent learning to pronounce that correctly.

-- Comments (8)

October 13, 2011

Power View Demo Good, Not Great

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: #sqlpass, crescent, gapminder, hans rosling, power view, statistics — Michael J. Swart @ 9:02 am

So I caught the keynote speech online on the first day of the PASS Summit conference (2011). There were a couple announcements but no big ones. I remember this time last year, many were expecting a release this week. But the big release isn’t until next year. We can now start calling SQL Server codename Denali by its real name, SQL Server 2012. No surprises there. We also learned that the project codename Crescent will be called Power View.

the Power View Demo

You can actually watch the whole keynote here. There’s a demo of Power View near the end. This demo was presented by (new) Microsoft Technical Fellow Amir Netz. He did a good job of presenting what Power View can do. Power View is – among other things – a data visualization tool. Amir gave an updated version of last year’s demo where he explored movie and box-office data.
In Amir’s demo, we learned that:

Eddie Murphy voices Donkey in Shrek
Alan Rickman plays Professor Snape
John Wayne acted a lot.

Basically movie trivia. Amir demonstrated that it’s easy to dig into the data to any depth we want. After the talk, Ted Kummert thanked Amir and called the demo fun. It was a good demo, but it wasn’t a great demo. A lot of it was interesting, Amir and his team are clearly passionate about the product, but the best reactions on twitter could be summed up as “neat!” rather than “wow!”

Compare that to Hans Rosling

Hans Rosling gave a TED talk in 2006 called “Debunking third-world myths with the best stats you’ve ever seen” Several people on Twitter compared this TED talk to today’s keynote.

The data technology that Hans uses is something he developed called Gapminder. On the surface, it’s the same as Power View. As Hans speaks, we learn:

By most metrics, countries we think of as third-world are equivalent to 1960s U.S.A.
The term “developing country” carries a meaning with most people that’s not useful today.
There’s tremendous variation in regions (like sub-Saharan Africa, Arab states, etc…). A distinction that we rarely make.

Now that’s a great talk. Hans is a statistician first (not a B.I. expert). He demonstrates the power of data to illuminate topics and solve problems.

The Difference

So what takes Hans Rosling from good to great? What’s the difference between today’s keynote demo and the TED talk?

Both presenters are passionate and enthusiastic. There’s no faulting them there.
Only during the TED talk did I imagine myself solving problems. (Like using data to improve software quality… Lean-manufacturing style!)
There’s a difference in the importance of the data used in the examples…

A Parallel in the Teaching of Poetry

I’m reminded of something from the world of poetry actually. A poet, James Fenton, wrote a textbook called: An Introduction To English Poetry. In it, he gives advice to poetry teachers about the best examples of poetry to give poets-in-training. For illustration, Fenton talks about a particular form of poetry, the villanelle. What’s a villanelle? It’s a 19 line-poem with a very particular form including many repetitive lines. Here are two examples (from the public domain):

Do Not Go Gentle into that Dark Night
by Dylan Thomas

Do not go gentle into that good night,
Old age should burn and rage at close of day;
Rage, rage against the dying of the light.

Though wise men at their end know dark is right,
Because their words had forked no lightning they
Do not go gentle into that good night.

Good men, the last wave by, crying how bright
Their frail deeds might have danced in a green bay,
Rage, rage against the dying of the light.

Wild men who caught and sang the sun in flight,
And learn, too late, they grieved it on its way,
Do not go gentle into that good night.

Grave men, near death, who see with blinding sight
Blind eyes could blaze like meteors and be gay,
Rage, rage against the dying of the light.

And you, my father, there on the sad height,
Curse, bless me now with your fierce tears, I pray.
Do not go gentle into that good night.
Rage, rage against the dying of the light.

A Dainty Thing’s the Villanelle
by William Ernest Henley

A Dainty thing’s the Villanelle,
Sly, musical, a jewel in rhyme,
It serves its purpose passing well.

A double-clappered silver bell
That must be made to clink in chime,
A dainty thing’s the Villanelle;

And if you wish to flute a spell,
Or ask a meeting ‘neath the lime,
It serves its purpose passing well.

You must not ask of it the swell
Of organs grandiose and sublime–
A dainty thing’s the Villanelle;

And, filled with sweetness, as a shell
Is filled with sound, and launched in time,
It serves its purpose passing well.

Still fair to see and good to smell
As in the quaintness of its prime,
A dainty thing’s the Villanelle,
It serves its purpose passing well.

You don’t have to be an English major to tell which one of those poems is inspiring, and which one isn’t. Fenton argues that “…if you start from Thomas’s villanelle as a model, you will be setting your sights much higher than if you start from Henley.” And he’s quite right, people take cues from examples like these.

Thinking About Examples

Back to B.I. demos. Not every example in every talk in every conference has to change the world. But if you’re introducing a powerful new tool like Power View in a keynote, maybe “nifty” or “neat” isn’t the impact you’re aiming for. I wonder if there’s a missed opportunity here for Microsoft to inspire. I wonder what Amir Netz’s demo with Hans Rosling’s data would have looked like? In that hypothetical scenario, I bet Power View users who saw that demo would then be more likely to start projects that make people say “Wow!” instead of simply “Neat!”

-- Comments (9)

October 11, 2011

Missing the PASS Summit

Filed under: SQLServerPedia Syndication,Tongue In Cheek — Michael J. Swart @ 8:30 am

Well I hope this won’t turn into an annual thing! Last year, I missed the PASS summit:
Why Michael J. Swart won’t be at the PASS Summit.

And this year…

So if you’re in Seattle this week: have fun, spread some knowledge, meet some smart people.

If you’re not in Seattle like me.

Follow Jen McCown’s blog post here: “Attend the PASS Summit from Home”. I’ll be doing a lot of this. I’ll be watching the keynotes as they’re broadcast, but without SQL Server Denali being released, I’m not sure what’s in store.
Leave me a comment here or on Twitter @MJSwart. There’s no reason not to get a little networking in 🙂

I hope to see you next year in Seattle.

-- Comments (3)