So you know a lot about your databases right? You’re familiar with their schemas and tables and the queries that run on them. Personally I use sys.dm_exec_query_stats to understand what the most popular queries are.
But I recently started wondering about popular table joins.
I was wondering: “What tables in my database are most commonly joined together?” I already have a pretty good idea based on the data model. But I wanted to find out if the popular queries are in sync with my understanding. Unfortunately there’s no system view called sys.dm_exec_join_stats. The whole reason that I was curious is that I wanted to find a set of common table joins whose queries might be improved with a indexed view.
So I wrote something that gives me a bit of an idea. It’s a query that looks at cached query plans and counts nested loop joins (multiplied by execution count).
USE tempdb; SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; BEGIN TRANSACTION ;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan') select cp.usecounts as numberOfJoins, seeknodes.query('.') as plansnippet into #my_joins from sys.dm_exec_cached_plans cp cross apply sys.dm_exec_query_plan(cp.plan_handle) as qp cross apply query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan//SeekKeys/Prefix[@ScanType="EQ"]') as seeks(seeknodes) where seeknodes.exist('./RangeColumns/ColumnReference[1]/@Database') = 1 and seeknodes.exist('./RangeExpressions/ScalarOperator/Identifier/ColumnReference[1]/@Database') = 1; WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as p1) select sum(numberOfJoins) as [Number Of Joins], myValues.lookupTable + '(' + myValues.lookupColumn + ')' as lookupColumn, myValues.expressionTable + '(' + myValues.expressionColumn + ')' as expressionColumn from #my_joins cross apply plansnippet.nodes('./p1:Prefix/p1:RangeColumns/p1:ColumnReference[1]') as rangeColumns(rangeColumnNodes) cross apply plansnippet.nodes('./p1:Prefix/p1:RangeExpressions/p1:ScalarOperator/p1:Identifier/p1:ColumnReference[1]') as rangeExpressions(rangeExpressionNodes) cross apply ( select rangeColumnNodes.value('@Database', 'sysname') as lookupDatabase, rangeColumnNodes.value('@Schema', 'sysname') as lookupSchema, rangeColumnNodes.value('@Table', 'sysname') as lookupTable, rangeColumnNodes.value('@Column', 'sysname') as lookupColumn, rangeExpressionNodes.value('@Database', 'sysname') as expressionDatabase, rangeExpressionNodes.value('@Schema', 'sysname') as expressionSchema, rangeExpressionNodes.value('@Table', 'sysname') as expressionTable, rangeExpressionNodes.value('@Column', 'sysname') as expressionColumn ) as myValues where myValues.expressionTable != myValues.lookupTable group by myValues.lookupTable, myValues.lookupColumn, myValues.expressionTable, myValues.expressionColumn order by SUM(numberOfJoins) desc; rollback; |
Some caveats:
- Parsing xml takes a lot of time and a lot of CPU (the subtree cost is huge and execution time is measured in seconds or minutes)
- It’s only useful on a system that is used a lot (as opposed to a dev database).
- It only reports statistics about queries that are found in cached plans. So the stats are only relevant since the last server restart
- It only counts loop joins (not hash or merge joins)
- If you want, you can adjust the query to include schema and database names in the results
I hope you find it useful. This query gives hints for further investigation into potential indexed views. It worked well for me and so I thought it was worth sharing. I ran this query against a server I know well and I was surprised at some of the results. Good luck.