Michael J. Swart

April 3, 2019

Finding Tables with Few Dependencies

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:00 am

A couple weeks ago, I wrote about how to find lonely tables in Sql Server. This is a follow up to that post. I’m now going to talk about small sets of tables that are joined to eachother, but no-one else.

It’s Not Just Me
It seems everyone’s talking about this.

So as I was writing this post and code I noticed an amazing coincidence. I saw the same ideas I was writing about being discussed on twitter by Kelly Sommers, Ben Johnson and others.

They discuss Uber’s microservice graph. When visualized, it’s a big mish-mash of dependencies. Kelly points out how hard it is to reason about and Ben points to a small decoupled piece of the system that he wants to work on.

Me too Ben! And I think that’s the value of that visualization. It can demonstrate to others how tangled your system is. It can also identify small components that are not connected to the main mess. When I tie it to my last post and consider this idea in the database world, I can expand my idea of lonely tables to small sets of tables that are never joined to other tables.

I want to find them because these tables are also good candidates for extraction but how do I find them? I start by visualizing tables and their joins.

Visualizing Table Joins

I started by looking for existing visualizations. I didn’t find exactly what I wanted so I coded my own visualization (with the help of the d3 library). It’s always fun to code your own physics engine.

Here’s what I found

A monolith with some smaller isolated satellites

An example that might be good to extract

That ball of mush in the middle is hard to look at, but the smaller disconnected bits aren’t! Just like Ben, I want to work on those smaller pieces too! And just like the lonely tables we looked at last week, these small isolated components are also good candidates for extracting from SQL Server.

Try It Yourself

I’ve made this visualization available here:


There’s a query at the end of this post. When you run it, you’ll get pairs of table names and when you paste it into the Show Graph page, you’ll see a visualization of your database.

(This is all client-side code, I don’t collect any data).

The Query

use [your database name goes here];
    cast(null as xml) as query_plan
into #myplans
from sys.dm_exec_query_stats qs
cross apply sys.dm_exec_plan_attributes(qs.plan_handle) pa
where pa.attribute = 'dbid'
and pa.value = db_id();
with duplicate_queries as
    select ROW_NUMBER() over (partition by query_hash order by (select 1)) r
    from #myplans
delete duplicate_queries
where r > 1;
update #myplans
set query_plan = qp.query_plan
from #myplans mp
cross apply sys.dm_exec_query_plan(mp.plan_handle) qp
;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
mycte as
    select q.query_hash,
           obj.value('(@Schema)[1]', 'sysname') AS schema_name,
           obj.value('(@Table)[1]', 'sysname') AS table_name
    from #myplans q
    cross apply q.query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') as nodes(stmt)
    CROSS APPLY stmt.nodes('.//IndexScan/Object') AS index_object(obj)
select query_hash, schema_name, table_name
into #myExecutions
from mycte
where schema_name is not null
and object_id(schema_name + '.' + table_name) in (select object_id from sys.tables)
group by query_hash, schema_name, table_name;
select DISTINCT A.table_name as first_table,
       B.table_name as second_table
from #myExecutions A
join #myExecutions B
on A.query_hash = B.query_hash
where A.table_name < B.table_name;

March 12, 2019

Lonely Tables in SQL Server

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

Takeaway: I provide a script that looks at the procedure cache and reports tables that are never joined to other tables.

Recently, I’ve been working hard to reduce our use of SQL Server as much as possible. In other words, I’ve been doing some spring cleaning. I pick up a table in my hands and I look at it. If it doesn’t spark joy then I drop it.

If only it were that easy. That’s not quite the process I’m using. The specific goals I’m chasing are about reducing cost. I’m moving data to cheaper data stores when it makes sense.

So let’s get tidying. But where do I start?

Getting rid of SQL Server tables should accomplish a couple things. First, it should “move the needle”. If my goal is cost, then the tables I choose to remove should reduce my hardware or licensing costs in a tangible way. The second thing is that dropping the table is achievable without 10 years of effort. So I want to focus on “achievability” for a bit.


What’s achievable? I want to identify tables to extract from the database that won’t take years. Large monolithic systems can have a lot of dependencies to unravel.

So what tables in the database have the least dependencies? How do I tell without a trustworthy data model? Is it the ones with the fewest foreign keys (in or out)? Maybe, but foreign keys aren’t always defined properly or they can be missing all together.

My thought is that if two tables are joined together in some query, then they’re related or connected in some fashion. So that’s my idea. I can look at the procedure cache of a database in production to see where the connections are. And when I know that, I can figure out what tables are not connected.

Lonely Tables

This script gives me set of tables that aren’t joined to any other table in any query in cache

use [your db name here];
SELECT qs.query_hash,
       cast(null as xml) as query_plan
  INTO #myplans
  FROM sys.dm_exec_query_stats qs
 CROSS APPLY sys.dm_exec_plan_attributes(qs.plan_handle) pa
 WHERE pa.attribute = 'dbid'
   AND pa.value = db_id();
WITH duplicate_queries AS
  FROM #myplans
DELETE duplicate_queries
 WHERE n > 1;
UPDATE #myplans
   SET query_plan = qp.query_plan
  FROM #myplans mp
 CROSS APPLY sys.dm_exec_query_plan(mp.plan_handle) qp;
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'),
my_cte AS 
    SELECT q.query_hash,
           obj.value('(@Schema)[1]', 'sysname') AS [schema_name],
           obj.value('(@Table)[1]', 'sysname') AS table_name
      FROM #myplans q
     CROSS APPLY q.query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') as nodes(stmt)
     CROSS APPLY stmt.nodes('.//IndexScan/Object') AS index_object(obj)
SELECT query_hash, [schema_name], table_name
  INTO #myExecutions
  FROM my_cte
 WHERE [schema_name] IS NOT NULL
   AND OBJECT_ID([schema_name] + '.' + table_name) IN (SELECT object_id FROM sys.tables)
 GROUP BY query_hash, [schema_name], table_name;
WITH multi_table_queries AS
    SELECT query_hash
      FROM #myExecutions
     GROUP BY query_hash
    HAVING COUNT(*) > 1
lonely_tables as
    SELECT [schema_name], table_name
      FROM #myExecutions
    SELECT [schema_name], table_name
      FROM #myexecutions WHERE query_hash IN (SELECT query_hash FROM multi_table_queries)
SELECT l.*, ps.row_count
  FROM lonely_tables l
  JOIN sys.dm_db_partition_stats ps
       ON OBJECT_ID(l.[schema_name] + '.' + l.table_name) = ps.object_id
 WHERE ps.index_id in (0,1)
 ORDER BY ps.row_count DESC;


So many caveats.
There are so many things that take away from the accuracy and utility of this script that I hesitated to even publish it.
Here’s the way I used the script. The list of tables was something that helped me begin an investigation. For me, I didn’t use it to give answers, but to generate questions. For example, taking each table in the list, I asked: “How hard would it be to get rid of table X and what would that save us?” I found it useful to consider those questions. Your mileage of course will vary.

October 26, 2018

Uncovering Hidden Complexity

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:15 pm

The other day, Erin Stellato asked a question on twitter about the value of nested SPs. Here’s how I weighed in:

Hidden complexity has given me many problems in the past. SQL Server really really likes things simple and so it’s nice to be able to uncover that complexity. Andy Yun has tackled this problem for nested views with his sp_helpexpandview.

Here’s what I came up with for nested anything. It helps unravel a tree of dependencies based on information found in sys.triggers and sys.dm_sql_referenced_entities. With it, you can see what’s involved when interacting with objects. Here’s what things look like for Sales.SalesOrderDetail in AdventureWorks2014. A lot of the resulting rows can be ignored, but there can be surprises in there too.

A lot in there

DECLARE @object_name SYSNAME = 'Sales.SalesOrderDetail';
WITH dependencies AS
    SELECT @object_name AS [object_name],
             QUOTENAME(OBJECT_SCHEMA_NAME(OBJECT_ID(@object_name))) + '.' + 
             as sysname) as [escaped_name],
           object_id(@object_name) AS [object_id],
           1 AS is_updated,
           CAST('/' + CAST(object_id(@object_name) % 10000 as VARCHAR(30)) + '/' AS hierarchyid) as tree,
           0 as trigger_parent_id
      FROM sys.objects 
     WHERE object_id = object_id(@object_name)
    SELECT CAST(OBJECT_SCHEMA_NAME(o.[object_id]) + '.' + OBJECT_NAME(o.[object_id]) as sysname),
           CAST(QUOTENAME(OBJECT_SCHEMA_NAME(o.[object_id])) + '.' + QUOTENAME(OBJECT_NAME(o.[object_id])) as sysname),
           CASE o.[type] when 'U' then re.is_updated else 1 end,
           CAST(d.tree.ToString() + CAST(o.[object_id] % 10000 as VARCHAR(30)) + '/' AS hierarchyid),
           0 as trigger_parent_id
      FROM dependencies d
     CROSS APPLY sys.dm_sql_referenced_entities(d.[escaped_name], default) re
      JOIN sys.objects o
           ON o.object_id = isnull(re.referenced_id, object_id(ISNULL(re.referenced_schema_name,'dbo') + '.' + re.referenced_entity_name))
     WHERE tree.GetLevel() < 10
       AND re.referenced_minor_id = 0
       AND o.[object_id] <> d.trigger_parent_id
       AND CAST(d.tree.ToString() as varchar(1000)) not like '%' + CAST(o.[object_id] % 10000 as varchar(1000)) + '%'
     SELECT CAST(OBJECT_SCHEMA_NAME(t.[object_id]) + '.' + OBJECT_NAME(t.[object_id]) as sysname),
            CAST(QUOTENAME(OBJECT_SCHEMA_NAME(t.[object_id])) + '.' + QUOTENAME(OBJECT_NAME(t.[object_id])) as sysname),
            0 AS is_updated,
            CAST(d.tree.ToString() + CAST(t.object_id % 10000 as VARCHAR(30)) + '/' AS hierarchyid),
            t.parent_id as trigger_parent_id
       FROM dependencies d
       JOIN sys.triggers t
            ON d.[object_id] = t.parent_id
      WHERE d.is_updated = 1
        AND tree.GetLevel() < 10
        AND CAST(d.tree.ToString() as varchar(1000)) not like '%' + cast(t.[object_id] % 10000 as varchar(1000)) + '%'
SELECT replicate('—', tree.GetLevel() - 1) + ' ' + [object_name], 
       [type_desc] as [type],
       tree.ToString() as dependencies       
  FROM dependencies
 ORDER BY tree

April 6, 2018

Are There Any System Generated Constraint Names Lurking In Your Database?

Names for constraints are optional meaning that if you don’t provide a name when it’s created or cannot afford one, one will be appointed to you by the system.
These system provided names are messy things and I don’t think I have to discourage you from using them. Kenneth Fisher has already done that in Constraint names, Say NO to the default.

But how do you know whether you have any?

Here’s How You Check

SELECT SCHEMA_NAME(schema_id) AS [schema name],
       OBJECT_NAME(object_id) AS [system generated object name],
       OBJECT_NAME(parent_object_id) AS [parent object name],
       type_desc AS [object type]
  FROM sys.objects
         type + '\_\_' + LEFT(OBJECT_NAME(parent_object_id),8) + '\_\_%' ESCAPE '\'
       OBJECT_NAME(object_id) LIKE 
          REPLACE(sys.fn_varbintohexstr(CAST(object_id AS VARBINARY(MAX))), '0x', '%\_\_') ESCAPE '\'

This will find all your messy system-named constraints.
For example, a table defined like this:

create table MY_TABLE
  CHECK (id >= 0)

Will give results like this:

Happy hunting.

Update: April 9, 2018
We can get this info from the system views a little easier as Rob Volk pointed out. I’ve also included the parent object’s type.

       OBJECT_NAME(constid) AS [system generated constraint name],
       (select type_desc from sys.objects where object_id = constid) as [constraint type],
       OBJECT_NAME(id) AS [parent object name],
       (select type_desc from sys.objects where object_id = id) as [parent object type]
  FROM sys.sysconstraints
 WHERE status & 0x20000 > 0
   AND OBJECT_NAME(id) NOT IN (N'__RefactorLog', N'sysdiagrams')
 ORDER BY [parent object type], [parent object name], [system generated constraint name];

September 11, 2017

Remove SQL Junk (Brackets and Other Clutter)

Filed under: Miscelleaneous SQL,SQL Scripts,Technical Articles — Michael J. Swart @ 10:42 am

“Above all else, show the data” says Edwarde Tufte. He’s the data visualization expert who promotes a high data-ink ratio in data visualizations. He describes data-ink as “the non-erasable core of a graphic”. In other words, avoid chartjunk.

SQL Junk

I buy that. Less is More. And we can apply that idea to SQL. If SQL is going to be maintained by a human, it’s best to use a style that is easy on the reader. There’s a lot of syntax in SQL that is redundant. By keeping only the non-erasable syntax in SQL statements, the SQL gets easier to understand and maintain.

Square Brackets

I know I’m not the only one who thinks so. I was watching Kendra Little’s presentation SSMS Shortcuts & Secrets. When someone asked whether there was a shortcut for removing brackets. There isn’t. Most people search and replace “[” with “” and then do the same thing with “]”. Kendra wondered if there was a regular expression that allowed a user to do both. There is, but it’s awkward because brackets need to be escaped: [\[\]] (which is only slightly easier to remember than ¯\_(ツ)_/¯).

SQL Junk in SSMS Generated Scripts

SSMS’s scripting engine is great at taking an object like a table and giving you a CREATE statement that will perfectly recreate what you need. But it’s a huge contributor of SQL Junk. I’ve seen so much SSMS-generated SQL get checked in to repositories that the style is assumed to be a best practice.

Here’s something that shows just how much SQL Junk can be removed. I think the resulting SQL is so much more clear and so much easier to maintain.

April 18, 2017

Find Your Dark Queries

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 11:36 am

My friend Josh came up with the term “dark queries”. Just like “dark matter”, dark queries can’t be detected directly, but their effect can still be observed. He’s talking about queries that aren’t stored in cache. If your monitoring solution relies heavily on the statistics of cached queries, then you may not be capturing all the activity on your server.

Obi-Wan never told you what happened to those queries

Some of my favorite monitoring solutions rely on the cached queries:

but some queries will fall out of cache or don’t ever make it into cache. Those are the dark queries I’m interested in today. Today let’s look at query recompiles to shed light on some of those dark queries that maybe we’re not measuring.

By the way, if you’re using SQL Server 2016’s query store then this post isn’t for you because Query Store is awesome. Query Store doesn’t rely on the cache. It captures all activity and stores queries separately – Truth in advertising!

High Recompile Rate?

If you work with a high-frequency transactional workload like I do, then you can’t afford the CPU required for frequent recompiles. If you have sustained recompiles larger than a few hundred per second, that’s probably too much. It’s easy to check. Use the performance monitor to take a look at the SQL Re-Compilations/sec counter which is found in SQLServer:SQL Statistics/sec.

Way too many recompiles

Drill Into Recompile Causes
You can drill into this a little further with an extended event session stored to a histogram like this:

  ADD EVENT sqlserver.sql_statement_recompile
  ADD TARGET package0.histogram (
      SET filtering_event_name=N'sqlserver.sql_statement_recompile',
          source_type=(0) );

Query it like this

SELECT sv.subclass_name as recompile_cause, 
  FROM sys.dm_xe_session_targets AS xet  
  JOIN sys.dm_xe_sessions AS xe  
       ON (xe.address = xet.event_session_address)  
 CROSS APPLY ( SELECT CAST(xet.target_data as xml) ) as target_data_xml ([xml])
 CROSS APPLY target_data_xml.[xml].nodes('/HistogramTarget/Slot') AS nodes (slot_data)
         SELECT nodes.slot_data.value('(value)[1]', 'int') AS recompile_cause,
                nodes.slot_data.value('(@count)[1]', 'int') AS recompile_count
       ) as shredded
  JOIN sys.trace_subclass_values AS sv
       ON shredded.recompile_cause = sv.subclass_value
 WHERE xe.name = 'Recompile_Histogram' 
   AND sv.trace_event_id = 37 -- SP:Recompile

To get results like this:

Infrequent Recompiles?

Even if you don’t have frequent recompiles, it’s important to know what’s going on in the server. On your OLTP database, I bet you’re reluctant to let those BI folks run their analytical queries whenever they want. They may try to hide their shenanigans with an OPTION(RECOMPILE) hint. This extended events captures those query recompiles. Be sure to only capture a small sample by having the session run for a small amount of time. Or you can use the histogram above to make sure that the frequency is low.

  ADD EVENT sqlserver.sql_statement_recompile(
    WHERE ([recompile_cause]=(11))) -- Option (RECOMPILE) Requested
  ADD TARGET package0.event_file(SET filename=N'DarkQueries');

Take a look at the dark queries by executing this:

SELECT DarkQueryData.eventDate,
       DB_NAME(DarkQueryData.database_id) as DatabaseName,
                OBJECT_NAME(DarkQueryData.object_id, DarkQueryData.database_id)) command,
  FROM sys.fn_xe_file_target_read_file ( 'DarkQueries*xel', null, null, null) event_file_value
 CROSS APPLY ( SELECT CAST(event_file_value.[event_data] as xml) ) event_file_value_xml ([xml])
         SELECT event_file_value_xml.[xml].value('(event/@timestamp)[1]', 'datetime') as eventDate,
                event_file_value_xml.[xml].value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') as sql_text,
                event_file_value_xml.[xml].value('(event/data[@name="object_type"]/text)[1]', 'nvarchar(100)') as object_type,
                event_file_value_xml.[xml].value('(event/data[@name="object_id"]/value)[1]', 'bigint') as object_id,
                event_file_value_xml.[xml].value('(event/data[@name="source_database_id"]/value)[1]', 'bigint') as database_id,
                event_file_value_xml.[xml].value('(event/data[@name="recompile_cause"]/text)[1]', 'nvarchar(100)') as recompile_cause
       ) as DarkQueryData
 ORDER BY eventDate DESC

This gives you results that look something like the following contrived example:

November 21, 2016

Do You Have Check Constraints On Your Start and End Dates?

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:13 am

Adam Machanic tweeted this advice last week:

Are you missing any of these check constraints? Run this query to check.
This query looks for any columns in the same table that begin with “Start” and “End”. It then looks for check constraints that reference both these columns. If it doesn’t find them, it suggests a check constraint.

WITH StartColumnNames AS
    SELECT object_id,
           name AS column_name
      FROM sys.columns
     WHERE name like 'start%'
EndColumnNames AS
    SELECT object_id,
           name AS column_name
      FROM sys.columns
     WHERE name like 'end%'
SELECT t.object_id,            
       OBJECT_SCHEMA_NAME(t.object_id) AS [schema_name], 
       t.[name] AS table_name,
       s.column_name AS start_column,
       e.column_name AS end_column,
       N'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(t.object_id)) + N'.' + QUOTENAME(t.name) + 
           N' ADD CONSTRAINT ' + 
           QUOTENAME(N'CK_' + t.name + N'_' + s.column_name + N'_' + e.column_name) + 
           N' CHECK (' + QUOTENAME(s.column_name) + N' <= ' + QUOTENAME(e.column_name) + N');' as check_suggestion
  FROM StartColumnNames s
  JOIN EndColumnNames e
       ON s.object_id = e.object_id
       AND s.column_id <> e.column_id
       AND REPLACE(s.column_name, 'start', 'end') = e.column_name        
  JOIN sys.tables t
       ON t.object_id = s.object_id
           SELECT *
             FROM sys.check_constraints c
             JOIN sys.sql_expression_dependencies start_dependency
                  ON start_dependency.referencing_id = c.object_id
                  AND start_dependency.referenced_id = t.object_id
                  AND start_dependency.referenced_minor_id = s.column_id
             JOIN sys.sql_expression_dependencies end_dependency
                  ON end_dependency.referencing_id = c.object_id
                  AND end_dependency.referenced_id = t.object_id
                  AND end_dependency.referenced_minor_id = e.column_id
            WHERE c.parent_object_id = t.object_id


Don’t blindly run scripts that you got from some random guy’s blog. Even if that someone is me. That’s terribly irresponsible.

But this query may be useful if you do want to look for a very specific, simple kind of constraint that may match your business specs. These constraints are just suggestions and may not match your business rules. For example, when I run this query on Adventureworks, I get one “missing” check constraint for HumanResources.Shift(StartTime, EndTime) and when I look at the contents of the Shift table, I get this data:
Notice that I can’t create a constraint on this table because of the night shift. The constraint doesn’t make sense here.

Creating constraints on existing tables may take time if the table is huge. Locks may be held on that table for an uncomfortably long time.

Of course if your table has data that would violate the constraint, you can’t create it. But now you have to make some other choices. You can correct or delete the offending data or you can add the constraint with NOCHECK.

September 23, 2016

Build Your Own Tools

There are so many ways to look inside SQL Server. New extended events and dynamic management views are introduced every version. But if you want to collect something that’s unavailable, with a little bit of creativity, you can create your own tools.

Inspired by Richie Rump’s latest project, I’m going to describe a tool I wrote and the process I took to get there.

The Problem To Tackle

I had a deadlock graph I wanted to tackle, but I was having trouble reproducing it. I needed to know more about the queries involved. But the query plans were no longer in cache. So here’s the problem

Can I collect the execution plans that were used for the queries involved in a deadlock graph?

I want to use that information to reproduce – and ultimately fix – the deadlock.

The Right Tool For the Job

If you don’t have enough data to get to the root cause of an issue, put something in place for next time.

Can I Use Out-Of-The-Box Extended Events?
I’m getting used to extended events and so my first thought was “Is there a query plan field I can collect with the deadlock extended event? There is not. Which isn’t too surprising. Deadlock detection is independent of any single query.

So How Do I Get To The Execution Plans?
So when I look at a deadlock graph, I can see there are sql_handles. Given that, I can grab the plan_handle and then the query plan from the cache, but I’m going to need to collect it automatically at the time the deadlock is generated. So I’m going to need

  • XML shredding skills
  • Ability to navigate DMVs to get at the cached query plans
  • A way to programatically respond to deadlock graph events (like an XE handler or a trigger)

Responding Automatically to Extended Events
This is when I turned to #sqlhelp. And sure enough, six minutes later, Dave Mason helped me out:

I had never heard of Event Notifications, so after some googling, I discovered two things. The first thing is that I can only use Event Notifications with DDL or SQLTrace events rather than the larger set of extended events. Luckily deadlock graphs are available in both. The second thing is that Event Notifications aren’t quite notifications the way alerts are. They’re a way to push event info into a Service Broker queue. If I want automatic actions taken on Service Broker messages, I have to define and configure an activation procedure to process each message. In pictures, here’s my plan so far:

My Plan

Will It Work?
I think so, I’ve had a lot of success creating my own tools in the past such as

Also, looking at the different pieces, they’re probably robust enough to do what I need them to do. One surprise involved security concerns for activated procedures that need to access system views

The Deadlock Logging Script

Here’s the tool!

Create the Database

USE master;
IF (DB_ID('DeadlockLogging') IS NOT NULL)
    DROP DATABASE DeadlockLogging;

Create the Service Broker Objects
I’ve never used Service Broker before, so a lot of this comes from examples found in Books Online.

use DeadlockLogging;
CREATE QUEUE dbo.LogDeadlocksQueue;
CREATE SERVICE LogDeadlocksService
    ON QUEUE dbo.LogDeadlocksQueue 
CREATE ROUTE LogDeadlocksRoute
    WITH SERVICE_NAME = 'LogDeadlocksService',
-- add server level notification
IF EXISTS (SELECT * FROM sys.server_event_notifications WHERE [name] = 'LogDeadlocks')
    FOR deadlock_graph -- name of SQLTrace event type
    TO SERVICE ''LogDeadlocksService'', ''' + CAST(service_broker_guid as nvarchar(max))+ ''';'
FROM sys.databases 
WHERE [name] = DB_NAME();
EXEC sp_executesql @SQL;

The dynamic SQL is used to fetch the database guid of the newly created database.

a Place to Store Deadlocks

-- Create a place to store the deadlock graphs along with query plan information
CREATE SEQUENCE dbo.DeadlockIdentity START WITH 1;
CREATE TABLE dbo.ExtendedDeadlocks 
    DeadlockId bigint not null,
    DeadlockTime datetime not null,
    SqlHandle varbinary(64),
    StatementStart int,
    [Statement] nvarchar(max) null,
    Deadlock XML not null,
    FirstQueryPlan XML
    ON dbo.ExtendedDeadlocks(DeadlockTime, DeadlockId);

The Procedure That Processes Queue Messages

CREATE PROCEDURE dbo.ProcessDeadlockMessage
    WAITFOR ( 
        RECEIVE TOP(1)
            @RecvMsgTime = message_enqueue_time,
            @RecvMsg = message_body
        FROM dbo.LogDeadlocksQueue
    ), TIMEOUT 5000;
    IF (@@ROWCOUNT = 0)
    DECLARE @DeadlockId BIGINT = NEXT VALUE FOR dbo.DeadlockIdentity;
    DECLARE @RecsvMsgXML XML = CAST(@RecvMsg AS XML);
    DECLARE @DeadlockGraph XML = @RecsvMsgXML.query('/EVENT_INSTANCE/TextData/deadlock-list/deadlock');
    WITH DistinctSqlHandles AS
        SELECT DISTINCT node.value('@sqlhandle', 'varchar(max)') as SqlHandle
        FROM @RecsvMsgXML.nodes('//frame') AS frames(node)            
    INSERT ExtendedDeadlocks (DeadlockId, DeadlockTime, SqlHandle, StatementStart, [Statement], Deadlock, FirstQueryPlan)
    SELECT @DeadlockId,
    FROM DistinctSqlHandles s
    LEFT JOIN sys.dm_exec_query_stats qs 
        on qs.sql_handle = CONVERT(VARBINARY(64), SqlHandle, 1) 
    OUTER APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
    OUTER APPLY sys.dm_exec_sql_text (CONVERT(VARBINARY(64), SqlHandle, 1)) st
      SELECT SUBSTRING(st.[text],(qs.statement_start_offset + 2) / 2,
             WHEN qs.statement_end_offset = -1  THEN LEN(CONVERT(NVARCHAR(MAX), st.text)) * 2
             ELSE qs.statement_end_offset + 2
             END - qs.statement_start_offset) / 2)) as sqlStatement([statement]);
    -- clean up old deadlocks
    SELECT DISTINCT TOP (500) @limit = DeadlockId 
    FROM ExtendedDeadlocks 
    ORDER BY DeadlockId DESC;
    DELETE ExtendedDeadlocks 
    WHERE DeadlockId < @limit;

Activating the Procedure

ALTER QUEUE dbo.LogDeadlocksQueue
    ( STATUS = ON,
      PROCEDURE_NAME = dbo.ProcessDeadlockMessage,

Clean Up
And when you’re all done, this code will clean up this whole experiment.

use master;
if (db_id('DeadlockLogging') is not null)
    alter database DeadlockLogging set single_user with rollback immediate 
    drop database DeadlockLogging
if exists (select * from sys.server_event_notifications where name = 'DeadlockLogging')

April 20, 2016

Are You Programming In The Database?

Typically, T-SQL is not the best platform for programming (understatement). If you have many procedures that call other procedures, that’s a signal that you might be programming in the database.

Find out using this query:

    OBJECT_SCHEMA_NAME(p.object_id) as schemaName, 
    OBJECT_NAME(p.object_id) as procedureName,
    count(*) as [calls to other procedures]	
from sys.procedures p
cross apply sys.dm_sql_referenced_entities(schema_name(p.schema_id) + '.' + p.name, 'OBJECT') re
where re.referenced_entity_name in (select name from sys.procedures)
group by p.object_id
order by count(*) desc;

in Adventureworks, we see this result:

To drill down into those results, use this query:

select distinct
    QUOTENAME(OBJECT_SCHEMA_NAME(p.object_id)) + '.' 
        + QUOTENAME(OBJECT_NAME(p.object_id)) [This procedure...], 
    QUOTENAME(OBJECT_SCHEMA_NAME(p_ref.object_id)) + '.' 
        + QUOTENAME(OBJECT_NAME(p_ref.object_id)) [... calls this procedure]
from sys.procedures p
cross apply sys.dm_sql_referenced_entities(schema_name(p.schema_id) + '.' + p.name, 'OBJECT') re
join sys.procedures p_ref
	on re.referenced_entity_name = p_ref.name
order by 1,2

which gives results like this:

Adventureworks seems just fine to me. Only four instances of procedures calling procedures. I looked at the database I work with most. Hundreds of procedures (representing 15% of the procedures) call other procedures. On the other end of the spectrum is Stackoverflow. I understand that they don’t use stored procedures at all.

April 11, 2016

Tackle WRITELOG Waits Using the Transaction Log and Extended Events

Takeaway: WRITELOG waits are associated with a busy or slow transaction log. To tackle these waits, we need to measure transaction log activity. I describe a lightweight way to examine transaction log usage for busy OLTP systems.


Start with Microsoft’s Advice: I’m not going to introduce the topic of transaction log performance. Microsoft’s SQL Customer Advisory Team already provides a great introduction with Diagnosing Transaction Log Performance Issues and Limits of the Log Manager. Their advice includes watching the “Log Bytes Flushed/sec” performance counter found in the “SQL Server:Databases” object.

Reactive Efforts: If you’re curious about transaction log activity for open transactions, Paul Randal has a script at Script: open transactions with text and plans.

Spiky Activity: It’s not too difficult to find infrequent activities that write a lot of data to the transaction log; activities like data warehouse ETLs, or index rebuilds. Use a trace or extended events to look for statements with large values for “writes”.

Scalability of OLTP Workloads

WRITELOG waits are a scalability challenge for OLTP workloads under load. Chris Adkin has a lot of experience tuning SQL Server for high-volume OLTP workloads. So I’m going to follow his advice when he writes we should minimize the amount logging generated. And because I can’t improve something if I can’t measure it, I wonder what’s generating the most logging? OLTP workloads are characterized by frequent tiny transactions so I want to measure that activity without filters, but I want to have as little impact to the system as I can. That’s my challenge.

Getting #SQLHelp

So I asked twitter. And I got some great advice from Erin Stellato:
Erin also pointed out that the UI warns you that it’s a very high volume event.

Combining fn_dblog With Extended Events

So to avoid that kind of volume, I got the idea to read straight from the transaction log and combine that with a lighter extended events session to get the SQL text. The transaction_id captured by the extended events session corresponds to the XAct ID column in fn_dblog.

Here’s how that went:

The Script
The details for this script are kind of fussy, but it all comes together in a solution that won’t drag a server down. Care is still recommended; start with 10 seconds and go from there.

declare @Duration varchar(10) = '00:00:10';
declare @FileSize varchar(10) = '5'; -- in megabytes
-- create session
DECLARE @CreateSessionSQL nvarchar(max) = N'
    ADD EVENT sqlserver.sp_statement_completed ( 
        SET collect_statement=(0)
        ACTION(sqlserver.transaction_id, sqlserver.database_name)
        WHERE sqlserver.transaction_id > 0
          AND sqlserver.database_name = ''' + DB_NAME() + N''')
    ADD TARGET package0.asynchronous_file_target(
      SET filename = N''query_writes.xel'',
          max_file_size = ' + @FileSize + N',
          max_rollover_files = 1)
    WITH (
exec sp_executesql @CreateSessionSQL;
-- get the latest lsn for current DB
declare @xact_seqno binary(10);
declare @xact_seqno_string_begin varchar(50);
exec sp_replincrementlsn @xact_seqno OUTPUT;
set @xact_seqno_string_begin = '0x' + CONVERT(varchar(50), @xact_seqno, 2);
set @xact_seqno_string_begin = stuff(@xact_seqno_string_begin, 11, 0, ':')
set @xact_seqno_string_begin = stuff(@xact_seqno_string_begin, 20, 0, ':');
-- wait a minute
waitfor delay @Duration;
-- get the latest lsn for current DB
declare @xact_seqno_string_end varchar(50);
exec sp_replincrementlsn @xact_seqno OUTPUT;
set @xact_seqno_string_end = '0x' + CONVERT(varchar(50), @xact_seqno, 2);
set @xact_seqno_string_end = stuff(@xact_seqno_string_end, 11, 0, ':')
set @xact_seqno_string_end = stuff(@xact_seqno_string_end, 20, 0, ':');
-- Stop the session
-- read from transaction log
    max([Xact ID]) as transactionId,
    max([Transaction Name]) as transactionName, 
    sum([Log Record Length]) as logSize,
    count(*) as [logRowCount]
into #TLOGS
from fn_dblog(@xact_seqno_string_begin, @xact_seqno_string_end) f
group by [Transaction Id]
-- read from session data
CREATE TABLE #SessionData (
    id int identity primary key,
    XEXml xml NOT NULL    
INSERT #SessionData(XEXml)
SELECT CAST(fileData.[event_data] as xml)
FROM sys.fn_xe_file_target_read_file ( 'query_writes*xel', null, null, null) fileData;
-- find minimum transactionId captured by xes 
-- (almost always the first one, depending on luck here)
declare @minTXFromSession bigint;
select TOP (1) @minTXFromSession = S.XEXml.value(
    '(/event/action[(@name=''transaction_id'')]/value)[1]', 'bigint')
from #SessionData S;
            '(/event/action[(@name=''transaction_id'')]/value)[1]', 'bigint') as transactionId,
            '(/event/data[(@name=''object_id'')]/value)[1]', 'bigint') as objectId
    FROM #SessionData S
    ISNULL(T.transactionName, 'Unknown') as transactionTypeName, 
    OBJECT_NAME(S.objectid) as ObjectName,
    SUM(T.logsize) as totalLogSizeBytes,
    SUM(T.logRowCount) as totalLogRowCount,
    COUNT(*) as executions
    ON T.transactionId = S.transactionId
WHERE T.transactionId >= @minTXFromSession
GROUP BY T.transactionName, S.objectId
-- clean up
DROP TABLE #SessionData

Sample Results

Here’s an example of what the results would look like. It’s an aggregated view of all transaction log activity in a database for 10 seconds.
Example Results


  • Notice that the session is database specific. That’s because transaction logs are database specific. To help focus on the right database, use the “Log Bytes Flushed/sec” performance counter found in the “SQL Server:Databases” object.
  • Also notice that I’m tracking ObjectIds. That’s because we use procedures quite heavily. You may want to adapt this code to use query_hash instead. In both cases, collecting the statement text is not recommended.
  • The sample of data is limited by the size of the extended events target file or the duration variable, whichever is smaller.
  • @sqL_handLe pointed out to me that reading the log using fn_dblog will prevent the transaction log from truncating. Reading from the transaction log can be very tricky to do efficiently. Luckily we can use the sp_replincrementlsn trick to get LSN parameter values for fn_dblog.
Older Posts »

Powered by WordPress