Michael J. Swart

April 18, 2017

Find Your Dark Queries

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 11:36 am

My friend Josh came up with the term “dark queries”. Just like “dark matter”, dark queries can’t be detected directly, but their effect can still be observed. He’s talking about queries that aren’t stored in cache. If your monitoring solution relies heavily on the statistics of cached queries, then you may not be capturing all the activity on your server.

Obi-Wan never told you what happened to those queries

Some of my favorite monitoring solutions rely on the cached queries:

but some queries will fall out of cache or don’t ever make it into cache. Those are the dark queries I’m interested in today. Today let’s look at query recompiles to shed light on some of those dark queries that maybe we’re not measuring.

By the way, if you’re using SQL Server 2016’s query store then this post isn’t for you because Query Store is awesome. Query Store doesn’t rely on the cache. It captures all activity and stores queries separately – Truth in advertising!

High Recompile Rate?

If you work with a high-frequency transactional workload like I do, then you can’t afford the CPU required for frequent recompiles. If you have sustained recompiles larger than a few hundred per second, that’s probably too much. It’s easy to check. Use the performance monitor to take a look at the SQL Re-Compilations/sec counter which is found in SQLServer:SQL Statistics/sec.

Way too many recompiles

Drill Into Recompile Causes
You can drill into this a little further with an extended event session stored to a histogram like this:

CREATE EVENT SESSION Recompile_Histogram ON SERVER 
  ADD EVENT sqlserver.sql_statement_recompile
  ADD TARGET package0.histogram (
      SET filtering_event_name=N'sqlserver.sql_statement_recompile',
          source=N'recompile_cause',
          source_type=(0) );
 
ALTER EVENT SESSION Recompile_Histogram ON SERVER STATE = START;

Query it like this

SELECT sv.subclass_name as recompile_cause, 
       shredded.recompile_count
  FROM sys.dm_xe_session_targets AS xet  
  JOIN sys.dm_xe_sessions AS xe  
       ON (xe.address = xet.event_session_address)  
 CROSS APPLY ( SELECT CAST(xet.target_data as xml) ) as target_data_xml ([xml])
 CROSS APPLY target_data_xml.[xml].nodes('/HistogramTarget/Slot') AS nodes (slot_data)
 CROSS APPLY (
         SELECT nodes.slot_data.value('(value)[1]', 'int') AS recompile_cause,
                nodes.slot_data.value('(@count)[1]', 'int') AS recompile_count
       ) as shredded
  JOIN sys.trace_subclass_values AS sv
       ON shredded.recompile_cause = sv.subclass_value
 WHERE xe.name = 'Recompile_Histogram' 
   AND sv.trace_event_id = 37 -- SP:Recompile

To get results like this:
recompile_reasons

Infrequent Recompiles?

Even if you don’t have frequent recompiles, it’s important to know what’s going on in the server. On your OLTP database, I bet you’re reluctant to let those BI folks run their analytical queries whenever they want. They may try to hide their shenanigans with an OPTION(RECOMPILE) hint. This extended events captures those query recompiles. Be sure to only capture a small sample by having the session run for a small amount of time. Or you can use the histogram above to make sure that the frequency is low.

CREATE EVENT SESSION [DarkQueries] ON SERVER
  ADD EVENT sqlserver.sql_statement_recompile(
    ACTION(sqlserver.database_id,sqlserver.sql_text)
    WHERE ([recompile_cause]=(11))) -- Option (RECOMPILE) Requested
  ADD TARGET package0.event_file(SET filename=N'DarkQueries');
ALTER EVENT SESSION [DarkQueries] ON SERVER STATE = START;
GO

Take a look at the dark queries by executing this:

SELECT DarkQueryData.eventDate,
       DB_NAME(DarkQueryData.database_id) as DatabaseName,
       DarkQueryData.object_type,
       COALESCE(DarkQueryData.sql_text, 
                OBJECT_NAME(DarkQueryData.object_id, DarkQueryData.database_id)) command,
       DarkQueryData.recompile_cause
  FROM sys.fn_xe_file_target_read_file ( 'DarkQueries*xel', null, null, null) event_file_value
 CROSS APPLY ( SELECT CAST(event_file_value.[event_data] as xml) ) event_file_value_xml ([xml])
 CROSS APPLY (
         SELECT event_file_value_xml.[xml].value('(event/@timestamp)[1]', 'datetime') as eventDate,
                event_file_value_xml.[xml].value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') as sql_text,
                event_file_value_xml.[xml].value('(event/data[@name="object_type"]/text)[1]', 'nvarchar(100)') as object_type,
                event_file_value_xml.[xml].value('(event/data[@name="object_id"]/value)[1]', 'bigint') as object_id,
                event_file_value_xml.[xml].value('(event/data[@name="source_database_id"]/value)[1]', 'bigint') as database_id,
                event_file_value_xml.[xml].value('(event/data[@name="recompile_cause"]/text)[1]', 'nvarchar(100)') as recompile_cause
       ) as DarkQueryData
 ORDER BY eventDate DESC

This gives you results that look something like the following contrived example:
darkqueries

November 21, 2016

Do You Have Check Constraints On Your Start and End Dates?

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:13 am

Adam Machanic tweeted this advice last week:

Are you missing any of these check constraints? Run this query to check.
This query looks for any columns in the same table that begin with “Start” and “End”. It then looks for check constraints that reference both these columns. If it doesn’t find them, it suggests a check constraint.

WITH StartColumnNames AS
(
    SELECT object_id,
           column_id,
           name AS column_name
      FROM sys.columns
     WHERE name like 'start%'
),
EndColumnNames AS
(
    SELECT object_id,
           column_id,
           name AS column_name
      FROM sys.columns
     WHERE name like 'end%'
)
SELECT t.object_id,            
       OBJECT_SCHEMA_NAME(t.object_id) AS [schema_name], 
       t.[name] AS table_name,
       s.column_name AS start_column,
       e.column_name AS end_column,
       N'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(t.object_id)) + N'.' + QUOTENAME(t.name) + 
           N' ADD CONSTRAINT ' + 
           QUOTENAME(N'CK_' + t.name + N'_' + s.column_name + N'_' + e.column_name) + 
           N' CHECK (' + QUOTENAME(s.column_name) + N' <= ' + QUOTENAME(e.column_name) + N');' as check_suggestion
  FROM StartColumnNames s
  JOIN EndColumnNames e
       ON s.object_id = e.object_id
       AND s.column_id <> e.column_id
       AND REPLACE(s.column_name, 'start', 'end') = e.column_name        
  JOIN sys.tables t
       ON t.object_id = s.object_id
 WHERE NOT EXISTS 
       (
           SELECT *
             FROM sys.check_constraints c
             JOIN sys.sql_expression_dependencies start_dependency
                  ON start_dependency.referencing_id = c.object_id
                  AND start_dependency.referenced_id = t.object_id
                  AND start_dependency.referenced_minor_id = s.column_id
             JOIN sys.sql_expression_dependencies end_dependency
                  ON end_dependency.referencing_id = c.object_id
                  AND end_dependency.referenced_id = t.object_id
                  AND end_dependency.referenced_minor_id = e.column_id
            WHERE c.parent_object_id = t.object_id
       )

Caveats

Don’t blindly run scripts that you got from some random guy’s blog. Even if that someone is me. That’s terribly irresponsible.

But this query may be useful if you do want to look for a very specific, simple kind of constraint that may match your business specs. These constraints are just suggestions and may not match your business rules. For example, when I run this query on Adventureworks, I get one “missing” check constraint for HumanResources.Shift(StartTime, EndTime) and when I look at the contents of the Shift table, I get this data:
shifttimes
Notice that I can’t create a constraint on this table because of the night shift. The constraint doesn’t make sense here.

Creating constraints on existing tables may take time if the table is huge. Locks may be held on that table for an uncomfortably long time.

Of course if your table has data that would violate the constraint, you can’t create it. But now you have to make some other choices. You can correct or delete the offending data or you can add the constraint with NOCHECK.

September 23, 2016

Build Your Own Tools

There are so many ways to look inside SQL Server. New extended events and dynamic management views are introduced every version. But if you want to collect something that’s unavailable, with a little bit of creativity, you can create your own tools.

Inspired by Richie Rump’s latest project, I’m going to describe a tool I wrote and the process I took to get there.

The Problem To Tackle

I had a deadlock graph I wanted to tackle, but I was having trouble reproducing it. I needed to know more about the queries involved. But the query plans were no longer in cache. So here’s the problem

Can I collect the execution plans that were used for the queries involved in a deadlock graph?


I want to use that information to reproduce – and ultimately fix – the deadlock.

The Right Tool For the Job

If you don’t have enough data to get to the root cause of an issue, put something in place for next time.

Can I Use Out-Of-The-Box Extended Events?
I’m getting used to extended events and so my first thought was “Is there a query plan field I can collect with the deadlock extended event? There is not. Which isn’t too surprising. Deadlock detection is independent of any single query.

So How Do I Get To The Execution Plans?
So when I look at a deadlock graph, I can see there are sql_handles. Given that, I can grab the plan_handle and then the query plan from the cache, but I’m going to need to collect it automatically at the time the deadlock is generated. So I’m going to need

  • XML shredding skills
  • Ability to navigate DMVs to get at the cached query plans
  • A way to programatically respond to deadlock graph events (like an XE handler or a trigger)

Responding Automatically to Extended Events
This is when I turned to #sqlhelp. And sure enough, six minutes later, Dave Mason helped me out:

I had never heard of Event Notifications, so after some googling, I discovered two things. The first thing is that I can only use Event Notifications with DDL or SQLTrace events rather than the larger set of extended events. Luckily deadlock graphs are available in both. The second thing is that Event Notifications aren’t quite notifications the way alerts are. They’re a way to push event info into a Service Broker queue. If I want automatic actions taken on Service Broker messages, I have to define and configure an activation procedure to process each message. In pictures, here’s my plan so far:

My Plan

Will It Work?
I think so, I’ve had a lot of success creating my own tools in the past such as

Also, looking at the different pieces, they’re probably robust enough to do what I need them to do. One surprise involved security concerns for activated procedures that need to access system views

The Deadlock Logging Script

Here’s the tool!

Create the Database

USE master;
 
IF (DB_ID('DeadlockLogging') IS NOT NULL)
BEGIN
    ALTER DATABASE DeadlockLogging SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE DeadlockLogging;
END
 
CREATE DATABASE DeadlockLogging WITH TRUSTWORTHY ON;
GO

Create the Service Broker Objects
I’ve never used Service Broker before, so a lot of this comes from examples found in Books Online.

use DeadlockLogging;
 
CREATE QUEUE dbo.LogDeadlocksQueue;
 
CREATE SERVICE LogDeadlocksService
    ON QUEUE dbo.LogDeadlocksQueue 
    ([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);
 
CREATE ROUTE LogDeadlocksRoute
    WITH SERVICE_NAME = 'LogDeadlocksService',
    ADDRESS = 'LOCAL';
 
-- add server level notification
IF EXISTS (SELECT * FROM sys.server_event_notifications WHERE [name] = 'LogDeadlocks')
    DROP EVENT NOTIFICATION LogDeadlocks ON SERVER;
 
DECLARE @SQL NVARCHAR(MAX);
SELECT @SQL = N'
    CREATE EVENT NOTIFICATION LogDeadlocks 
    ON SERVER 
    FOR deadlock_graph -- name of SQLTrace event type
    TO SERVICE ''LogDeadlocksService'', ''' + CAST(service_broker_guid as nvarchar(max))+ ''';'
FROM sys.databases 
WHERE [name] = DB_NAME();
EXEC sp_executesql @SQL;
GO

The dynamic SQL is used to fetch the database guid of the newly created database.

a Place to Store Deadlocks

-- Create a place to store the deadlock graphs along with query plan information
CREATE SEQUENCE dbo.DeadlockIdentity START WITH 1;
 
CREATE TABLE dbo.ExtendedDeadlocks 
(
    DeadlockId bigint not null,
    DeadlockTime datetime not null,
    SqlHandle varbinary(64),
    StatementStart int,
    [Statement] nvarchar(max) null,
    Deadlock XML not null,
    FirstQueryPlan XML
);
 
CREATE CLUSTERED INDEX IX_ExtendedDeadlocks 
    ON dbo.ExtendedDeadlocks(DeadlockTime, DeadlockId);
GO

The Procedure That Processes Queue Messages

CREATE PROCEDURE dbo.ProcessDeadlockMessage
AS
  DECLARE @RecvMsg NVARCHAR(MAX);
  DECLARE @RecvMsgTime DATETIME;
  SET XACT_ABORT ON;
  BEGIN TRANSACTION;
 
    WAITFOR ( 
        RECEIVE TOP(1)
            @RecvMsgTime = message_enqueue_time,
            @RecvMsg = message_body
        FROM dbo.LogDeadlocksQueue
    ), TIMEOUT 5000;
 
    IF (@@ROWCOUNT = 0)
    BEGIN
      ROLLBACK TRANSACTION;
      RETURN;
    END
 
    DECLARE @DeadlockId BIGINT = NEXT VALUE FOR dbo.DeadlockIdentity;
    DECLARE @RecsvMsgXML XML = CAST(@RecvMsg AS XML);
    DECLARE @DeadlockGraph XML = @RecsvMsgXML.query('/EVENT_INSTANCE/TextData/deadlock-list/deadlock');
 
    WITH DistinctSqlHandles AS
    (
        SELECT DISTINCT node.value('@sqlhandle', 'varchar(max)') as SqlHandle
        FROM @RecsvMsgXML.nodes('//frame') AS frames(node)            
    )
    INSERT ExtendedDeadlocks (DeadlockId, DeadlockTime, SqlHandle, StatementStart, [Statement], Deadlock, FirstQueryPlan)
    SELECT @DeadlockId,
        @RecvMsgTime,
        qs.sql_handle,
        qs.statement_start_offset,
        [statement],
        @DeadlockGraph, 
        qp.query_plan
    FROM DistinctSqlHandles s
    LEFT JOIN sys.dm_exec_query_stats qs 
        on qs.sql_handle = CONVERT(VARBINARY(64), SqlHandle, 1) 
    OUTER APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
    OUTER APPLY sys.dm_exec_sql_text (CONVERT(VARBINARY(64), SqlHandle, 1)) st
    OUTER APPLY ( 
      SELECT SUBSTRING(st.[text],(qs.statement_start_offset + 2) / 2,
          (CASE 
             WHEN qs.statement_end_offset = -1  THEN LEN(CONVERT(NVARCHAR(MAX), st.text)) * 2
             ELSE qs.statement_end_offset + 2
             END - qs.statement_start_offset) / 2)) as sqlStatement([statement]);
 
    -- clean up old deadlocks
    DECLARE @limit BIGINT
    SELECT DISTINCT TOP (500) @limit = DeadlockId 
    FROM ExtendedDeadlocks 
    ORDER BY DeadlockId DESC;
    DELETE ExtendedDeadlocks 
    WHERE DeadlockId < @limit;
 
  COMMIT
 
GO

Activating the Procedure

ALTER QUEUE dbo.LogDeadlocksQueue
    WITH ACTIVATION
    ( STATUS = ON,
      PROCEDURE_NAME = dbo.ProcessDeadlockMessage,
      MAX_QUEUE_READERS = 1,
      EXECUTE AS SELF
    );
GO

Clean Up
And when you’re all done, this code will clean up this whole experiment.

use master;
 
if (db_id('DeadlockLogging') is not null)
begin
    alter database DeadlockLogging set single_user with rollback immediate 
    drop database DeadlockLogging
end
 
if exists (select * from sys.server_event_notifications where name = 'DeadlockLogging')
    DROP EVENT NOTIFICATION LogDeadlocks ON SERVER;

April 20, 2016

Are You Programming In The Database?

Typically, T-SQL is not the best platform for programming (understatement). If you have many procedures that call other procedures, that’s a signal that you might be programming in the database.

Find out using this query:

select 
    OBJECT_SCHEMA_NAME(p.object_id) as schemaName, 
    OBJECT_NAME(p.object_id) as procedureName,
    count(*) as [calls to other procedures]	
from sys.procedures p
cross apply sys.dm_sql_referenced_entities(schema_name(p.schema_id) + '.' + p.name, 'OBJECT') re
where re.referenced_entity_name in (select name from sys.procedures)
group by p.object_id
order by count(*) desc;

in Adventureworks, we see this result:
ProcsCallingProcs1

To drill down into those results, use this query:

select distinct
    QUOTENAME(OBJECT_SCHEMA_NAME(p.object_id)) + '.' 
        + QUOTENAME(OBJECT_NAME(p.object_id)) [This procedure...], 
    QUOTENAME(OBJECT_SCHEMA_NAME(p_ref.object_id)) + '.' 
        + QUOTENAME(OBJECT_NAME(p_ref.object_id)) [... calls this procedure]
from sys.procedures p
cross apply sys.dm_sql_referenced_entities(schema_name(p.schema_id) + '.' + p.name, 'OBJECT') re
join sys.procedures p_ref
	on re.referenced_entity_name = p_ref.name
order by 1,2

which gives results like this:
ProcsCallingProcs2

Adventureworks seems just fine to me. Only four instances of procedures calling procedures. I looked at the database I work with most. Hundreds of procedures (representing 15% of the procedures) call other procedures. On the other end of the spectrum is Stackoverflow. I understand that they don’t use stored procedures at all.

April 11, 2016

Tackle WRITELOG Waits Using the Transaction Log and Extended Events

Takeaway: WRITELOG waits are associated with a busy or slow transaction log. To tackle these waits, we need to measure transaction log activity. I describe a lightweight way to examine transaction log usage for busy OLTP systems.

Tackle WRITELOG

Start with Microsoft’s Advice: I’m not going to introduce the topic of transaction log performance. Microsoft’s SQL Customer Advisory Team already provides a great introduction with Diagnosing Transaction Log Performance Issues and Limits of the Log Manager. Their advice includes watching the “Log Bytes Flushed/sec” performance counter found in the “SQL Server:Databases” object.

Reactive Efforts: If you’re curious about transaction log activity for open transactions, Paul Randal has a script at Script: open transactions with text and plans.

Spiky Activity: It’s not too difficult to find infrequent activities that write a lot of data to the transaction log; activities like data warehouse ETLs, or index rebuilds. Use a trace or extended events to look for statements with large values for “writes”.

Scalability of OLTP Workloads

WRITELOG waits are a scalability challenge for OLTP workloads under load. Chris Adkin has a lot of experience tuning SQL Server for high-volume OLTP workloads. So I’m going to follow his advice when he writes we should minimize the amount logging generated. And because I can’t improve something if I can’t measure it, I wonder what’s generating the most logging? OLTP workloads are characterized by frequent tiny transactions so I want to measure that activity without filters, but I want to have as little impact to the system as I can. That’s my challenge.

Getting #SQLHelp

So I asked twitter. And I got some great advice from Erin Stellato:
sqlhelp
Erin also pointed out that the UI warns you that it’s a very high volume event.

Combining fn_dblog With Extended Events

So to avoid that kind of volume, I got the idea to read straight from the transaction log and combine that with a lighter extended events session to get the SQL text. The transaction_id captured by the extended events session corresponds to the XAct ID column in fn_dblog.

Here’s how that went:

The Script
The details for this script are kind of fussy, but it all comes together in a solution that won’t drag a server down. Care is still recommended; start with 10 seconds and go from there.

declare @Duration varchar(10) = '00:00:10';
declare @FileSize varchar(10) = '5'; -- in megabytes
 
-- create session
DECLARE @CreateSessionSQL nvarchar(max) = N'
    CREATE EVENT SESSION query_writes ON SERVER 
    ADD EVENT sqlserver.sp_statement_completed ( 
        SET collect_statement=(0)
        ACTION(sqlserver.transaction_id, sqlserver.database_name)
        WHERE sqlserver.transaction_id > 0
          AND sqlserver.database_name = ''' + DB_NAME() + N''')
    ADD TARGET package0.asynchronous_file_target(
      SET filename = N''query_writes.xel'',
          max_file_size = ' + @FileSize + N',
          max_rollover_files = 1)
    WITH (
        STARTUP_STATE=ON,
        EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
        TRACK_CAUSALITY=OFF)';
exec sp_executesql @CreateSessionSQL;
 
ALTER EVENT SESSION query_writes ON SERVER
    STATE=START;
 
-- get the latest lsn for current DB
declare @xact_seqno binary(10);
declare @xact_seqno_string_begin varchar(50);
exec sp_replincrementlsn @xact_seqno OUTPUT;
set @xact_seqno_string_begin = '0x' + CONVERT(varchar(50), @xact_seqno, 2);
set @xact_seqno_string_begin = stuff(@xact_seqno_string_begin, 11, 0, ':')
set @xact_seqno_string_begin = stuff(@xact_seqno_string_begin, 20, 0, ':');
 
-- wait a minute
waitfor delay @Duration;
 
-- get the latest lsn for current DB
declare @xact_seqno_string_end varchar(50);
exec sp_replincrementlsn @xact_seqno OUTPUT;
set @xact_seqno_string_end = '0x' + CONVERT(varchar(50), @xact_seqno, 2);
set @xact_seqno_string_end = stuff(@xact_seqno_string_end, 11, 0, ':')
set @xact_seqno_string_end = stuff(@xact_seqno_string_end, 20, 0, ':');
 
-- Stop the session
ALTER EVENT SESSION query_writes ON SERVER
    STATE=STOP;
 
-- read from transaction log
select 
    max([Xact ID]) as transactionId,
    max([Transaction Name]) as transactionName, 
    sum([Log Record Length]) as logSize,
    count(*) as [logRowCount]
into #TLOGS
from fn_dblog(@xact_seqno_string_begin, @xact_seqno_string_end) f
group by [Transaction Id]
 
-- read from session data
CREATE TABLE #SessionData (
    id int identity primary key,
    XEXml xml NOT NULL    
)
 
INSERT #SessionData(XEXml)
SELECT CAST(fileData.[event_data] as xml)
FROM sys.fn_xe_file_target_read_file ( 'query_writes*xel', null, null, null) fileData;
 
-- find minimum transactionId captured by xes 
-- (almost always the first one, depending on luck here)
declare @minTXFromSession bigint;
select TOP (1) @minTXFromSession = S.XEXml.value(
    '(/event/action[(@name=''transaction_id'')]/value)[1]', 'bigint')
from #SessionData S;
 
WITH SD AS
(
    SELECT 
        S.XEXml.value(
            '(/event/action[(@name=''transaction_id'')]/value)[1]', 'bigint') as transactionId,
        S.XEXml.value(
            '(/event/data[(@name=''object_id'')]/value)[1]', 'bigint') as objectId
    FROM #SessionData S
)
SELECT 
    ISNULL(T.transactionName, 'Unknown') as transactionTypeName, 
    OBJECT_NAME(S.objectid) as ObjectName,
    SUM(T.logsize) as totalLogSizeBytes,
    SUM(T.logRowCount) as totalLogRowCount,
    COUNT(*) as executions
FROM #TLOGS T
LEFT JOIN (SELECT DISTINCT * FROM SD) S
    ON T.transactionId = S.transactionId
WHERE T.transactionId >= @minTXFromSession
GROUP BY T.transactionName, S.objectId
ORDER BY SUM(T.logsize) DESC
 
-- clean up
DROP EVENT SESSION query_writes ON SERVER;
DROP TABLE #TLOGS
DROP TABLE #SessionData

Sample Results

Here’s an example of what the results would look like. It’s an aggregated view of all transaction log activity in a database for 10 seconds.
Example Results

Notes

  • Notice that the session is database specific. That’s because transaction logs are database specific. To help focus on the right database, use the “Log Bytes Flushed/sec” performance counter found in the “SQL Server:Databases” object.
  • Also notice that I’m tracking ObjectIds. That’s because we use procedures quite heavily. You may want to adapt this code to use query_hash instead. In both cases, collecting the statement text is not recommended.
  • The sample of data is limited by the size of the extended events target file or the duration variable, whichever is smaller.
  • @sqL_handLe pointed out to me that reading the log using fn_dblog will prevent the transaction log from truncating. Reading from the transaction log can be very tricky to do efficiently. Luckily we can use the sp_replincrementlsn trick to get LSN parameter values for fn_dblog.

January 24, 2016

Two Scripts That Help You Monitor Deadlocks with Extended Events

I want to use extended events to store recent deadlock graphs. And I want to create an asynchronous file target instead of using the existing system_health session. I don’t like the system_health target for a couple reasons. It’s too slow to query and it rolls over too quickly and it disappears after server restarts.

So I searched the web for a solution and when I couldn’t find one, I wrote my own solution, I tested it and I decided to blog about it.

Guess what? Apparently I “reinvented the wheel”. The extended events session I created is equivalent to one that Jeremiah Peschka wrote two years ago in Finding Blocked Processes and Deadlocks using SQL Server Extended Events. The embarrassing thing is that in Jeremiah’s article, he references a tool I wrote. And the first comment was written by yours truly.

So go read Jeremiah’s article, it’s really well written. What follows is my solution. The only difference is that mine only focuses on deadlocks. Jeremiah’s focuses on both deadlocks and blocked processes.

Create The Session

Here’s the session that I use. It

  • has five rollover files so that a couple server restarts don’t lose any recent deadlock graphs
  • uses an asynchronous_file_target which I prefer over the ring buffer,
  • and it cleans itself up over time. I don’t need a maintenance job to remove ancient data
DECLARE @ExtendedEventsTargetPath sysname = 'Change this string to something like "D:\XEvents\Traces"';
 
DECLARE @SQL nvarchar(max) = N'
CREATE EVENT SESSION [capture_deadlocks] ON SERVER 
ADD EVENT sqlserver.xml_deadlock_report( ACTION(sqlserver.database_name) ) 
ADD TARGET package0.asynchronous_file_target(
  SET filename = ''' + @ExtendedEventsTargetPath + N'\capture_deadlocks.xel'',
      max_file_size = 10,
      max_rollover_files = 5)
WITH (
    STARTUP_STATE=ON,
    EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY=15 SECONDS,
    TRACK_CAUSALITY=OFF
    )
 
ALTER EVENT SESSION [capture_deadlocks] ON SERVER
    STATE=START';
 
exec sp_executesql @SQL;

Query the Results

Oh great. Now I’ve got to dig through several files. That’s a lot of work.
… but not if you have this query:

declare @filenamePattern sysname;
 
SELECT @filenamePattern = REPLACE( CAST(field.value AS sysname), '.xel', '*xel' )
FROM sys.server_event_sessions AS [session]
JOIN sys.server_event_session_targets AS [target]
  ON [session].event_session_id = [target].event_session_id
JOIN sys.server_event_session_fields AS field 
  ON field.event_session_id = [target].event_session_id
  AND field.object_id = [target].target_id	
WHERE
    field.name = 'filename'
    and [session].name= N'capture_deadlocks'
 
SELECT deadlockData.*
FROM sys.fn_xe_file_target_read_file ( @filenamePattern, null, null, null) 
    as event_file_value
CROSS APPLY ( SELECT CAST(event_file_value.[event_data] as xml) ) 
    as event_file_value_xml ([xml])
CROSS APPLY (
    SELECT 
        event_file_value_xml.[xml].value('(event/@name)[1]', 'varchar(100)') as eventName,
        event_file_value_xml.[xml].value('(event/@timestamp)[1]', 'datetime') as eventDate,
        event_file_value_xml.[xml].query('//event/data/value/deadlock') as deadlock	
  ) as deadlockData
WHERE deadlockData.eventName = 'xml_deadlock_report'
ORDER BY eventDate DESC

October 6, 2015

Don’t Abandon Your Transactions

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 11:59 am

DGQuote
About eight years ago, Dan Guzman wrote a post called Use Caution with Explicit Transactions in Stored Procedures. In it, he talks about error handling and transactions, specifically with respect to the XACT_ABORT setting.

XACT_ABORT

Microsoft’s docs for XACT_ABORT are pretty clear. The setting determines whether “SQL Server automatically rolls back the current transaction when a statement raises an error”.

And in nearly every scenario I can think of that uses a transaction, this automatic rollback is the desired behavior. The problem is that it’s not the default behavior. And this leads to Dan Guzman’s advice where he strongly recommends that SET XACT_ABORT ON be included “in all stored procedures with explicit transactions unless you have a specific reason to do otherwise.”

What Could Go Wrong?

When a statement inside a transaction fails (for whatever reason) and XACT_ABORT is set to off, then…

  • That transaction is abandoned.
  • Any locks taken during that transaction are still held.
  • Even if you close the connection from the application, .NET’s connection pooling will keep that connection alive and the transaction on SQL Server stays open.
  • Fortunately, if someone reuses the same database connection from the connection pool, the old transaction will be rolled back.
  • Unfortunately developers can’t count on that happening immediately.
  • Abandoned transactions can cause excessive blocking leading to a concurrency traffic jam.
  • Also, abandoned transactions can interfere with downstream solutions. Specifically ones that depend on the transaction log. Transaction logs can grow indefinitely. Replication solutions can suffer. If RCSI is enabled, the version store can get out of hand.

Some (or all) of those things happened to us last week.

Steps To Take

Here are some things you can do:

Do you have abandoned transactions right now?
It’s not too hard to identify these abandoned transactions:

-- do you have abandoned transactions?
select p.spid, s.text as last_sql
from sys.sysprocesses p
cross apply sys.dm_exec_sql_text(p.sql_handle) s
where p.status = 'sleeping'
and p.open_tran > 0

Also if you use sp_whoisactive, you can identify these processes as those with a sleeping status and at least one open transaction. But there’s a trick I use to identify these quickly. The sql_text value in the output of sp_whoisactive will typically begin with CREATE PROCEDURE. When I see that, I know it’s time to check whether this connection is sleeping or not.

SET XACT_ABORT ON
Follow Dan Guzman’s advice to include SET XACT_ABORT ON in all stored procedures with explicit transactions.
You can actually find the procedures in your database that need a closer look

-- find procedures that could suffer from abandoned transactions
SELECT * 
FROM sys.procedures 
where OBJECT_DEFINITION(object_id) like '%BEGIN TRAN%'
and OBJECT_DEFINITION(object_id) not like '%XACT_ABORT%'
order by name

Set XACT_ABORT ON server-wide
If you choose, you can decide to set the default value for all connections to your server. You can do that using Management Studio:
ServerProperties

Or via a script:

-- turn the server's xact_abort default on
declare @user_options_value bigint;
select @user_options_value = cast(value as bigint)
from sys.configurations 
where name = 'user options';
set @user_options_value = @user_options_value | 0x4000; 
exec sp_configure N'user options', @user_options_value;
RECONFIGURE WITH OVERRIDE;
 
-- (if necessary) turn the server's xact_abort default off
declare @user_options_value bigint;
select @user_options_value = cast(value as bigint)
from sys.configurations 
where name = 'user options';
set @user_options_value = @user_options_value &amp; 0x3fff; 
exec sp_configure N'user options', @user_options_value;
RECONFIGURE WITH OVERRIDE;

Code Review

I love code reviews. They’re more than just a tool for improving quality. They’re learning opportunities and teaching opportunities for all involved.

Last week, I invited readers to have a look at a procedure in a post called Code Review This Procedure. I was looking for anyone to suggest turning on XACT_ABORT as a best practice. It’s a best practice where I work, but things like this slip through. We should have caught this not just during testing, but during development. It’s obvious with hindsight. But I wanted to determine how obvious it was without that hindsight. I guess it was pretty subtle, the XACT_ABORT was not mentioned once. That’s either because the setting is not often used by most developers, or because it is easily overlooked.

But here are some other thoughts that readers had:

Concurrency
Many people pointed at concurrency and transaction isolation levels as a problem. It turns out that concurrency is very hard to do right and nearly impossible to verify by inspection. In fact one of my favorite blog posts is about getting concurrency right. It’s called Mythbusting: Concurrent Update/Insert Solutions. The lesson here is just try it.

Cody Konior (blog) submitted my favorite comment. Cody writes “I often can’t disentangle what the actual impact of various isolation levels would be so I go a different route; which is to create a quick and dirty load test”. I can’t determine concurrency solely by inspection either, which is why I never try. Cody determined that after hammering this procedure, it never failed.

He’s entirely right. Concurrency is done correctly here. Ironically, most of the fixes suggested in other people’s code reviews actually introduced concurrency issues like deadlocks or primary key violations.

People also suggested that blocking would become excessive. It turns out that throughput does not suffer either. My testing framework still managed to process 25,000 batches per second on my desktop without error.

Validating inputs
Some people pointed out that if NULL values or other incorrect values were passed in, then a foreign key violation could be thrown. And they suggested that the procedure should validate the inputs. But what then? If there’s a problem, then there are two choices. Choice one, raise no error and exit quietly which is not ideal. Or choice 2, raise a new error which is not a significant improvement over the existing implementation.

Avoiding the transaction altogether
It is possible to rewrite this procedure without using an explicit transaction. Without the explicit transaction, there’s no chance of abandoning it. And no chance of encountering the trouble that goes with abandoned transactions. But it’s still necessary to worry about concurrency. Solutions that use single statements like MERGE or INSERT...WHERE NOT EXISTS still need SERIALIZABLE and UPDLOCK.

Error handling
I think Aaron Mathison (blog) nailed it: I’m just going to quote his review entirely:

Since your EVENT_TICKETS table has required foreign keys (evidenced by NOT NULL on all columns with foreign key references) the proc should be validating that the input parameter values exist in the foreign key tables before trying to insert into EVENT_TICKETS. If it doesn’t find any one of them it should throw an error and gracefully rollback the transaction and return from the proc.

The way it’s designed currently I think you could get an error on inserting to EVENT_TICKETS that would fail the proc and leave the transaction open.

April 6, 2015

Finding Scalar Aggregate Indexed Views in Your Database

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:35 am

Fellow MVP Paul White recently blogged about a SQL Server bug. His blog post is called An Indexed View Bug with Scalar Aggregates

It’s a really well-written article (as always). After reading it, it’s natural to wonder whether your databases have any such indexed views?

Chances are that you don’t have any. These indexed views aren’t too common, but if you do have indexed views that use scalar aggregates, here’s a query that can help you find them.
The following query finds indexed views without GROUP BY that have exactly one row in any partition.

with IndexedViewIds as
(
  SELECT [object_id] from sys.indexes
  INTERSECT
  SELECT [object_id] from sys.views
), 
IndexedViewInfo as 
(
  SELECT 
    [object_id],
    OBJECT_SCHEMA_NAME([object_id]) as SchemaName,
    OBJECT_NAME([object_id]) as ViewName,
    OBJECT_DEFINITION([object_id]) as [Definition]
  FROM IndexedViewIds
)
SELECT 
    v.[object_id],
    v.SchemaName,
    v.ViewName,
    v.[Definition]
FROM IndexedViewInfo v
WHERE NOT EXISTS
  (
    SELECT * 
    FROM sys.partitions
    WHERE [object_id] = v.[object_id]
    AND [rows] <> 1
  )
  AND v.[definition] NOT LIKE '%GROUP BY%'

Notes

The query isn’t perfect. It’s actually possible (but rare) to get false positives here. This query doesn’t look for any aggregate keywords. So look for them in the SELECT list. Also make sure no GROUP BY clause exists.

It’s also possible (but rare) to miss some indexed views when GROUP BY gets mentioned, but not used. For example, if an indexed view definition contains the phrase GROUP BY in a comment, it won’t show up in this list.

(For my curious co-workers, none of our indexed views use scalar aggregates)

April 2, 2015

Look at Blocking By Index

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:33 am

This post is for me. It’s a script I find useful so I’m putting it in a place where I know to go look for it, my blog. You may find it useful too.

The script below extends the DMV sys.dm_db_index_operational_stats by focusing on lock waits and supplying index and table names. If you want to know about blocking by index, these queries can help.

If you want something more comprehensive, I’d suggest Kendra Little’s http://www.brentozar.com/blitzindex/

Blocking Wait Stats

-- Get index blocking wait stats
select 
  t.name as tableName, 
  i.name as indexName, 
  ios.row_lock_wait_count, 
  ios.row_lock_wait_in_ms, 
  ios.page_lock_wait_count,
  ios.page_lock_wait_in_ms
from sys.dm_db_index_operational_stats(db_id(), null, null, null) ios
join sys.indexes i
  on i.object_id = ios.object_id
  and i.index_id = ios.index_id
join sys.tables t
  on ios.object_id = t.object_id
where ios.row_lock_wait_in_ms + ios.page_lock_wait_in_ms > 0
order by ios.row_lock_wait_in_ms + ios.page_lock_wait_in_ms desc

Create Snapshot Of Stats

begin try
    drop table #IndexBlockingWaitStats
end try
begin catch
-- swallow error
end catch
 
select 
  [object_id], 
  index_id, 
  row_lock_wait_count, 
  row_lock_wait_in_ms, 
  page_lock_wait_count,
  page_lock_wait_in_ms
into #IndexBlockingWaitStats
from sys.dm_db_index_operational_stats(db_id(), null, null, null)

Get Waits Since Last Snapshot

-- Get delta results
select 
  t.name as tableName, 
  i.name as indexName, 
  ios.row_lock_wait_count  - iossnapshot.row_lock_wait_count    as row_lock_wait_count, 
  ios.row_lock_wait_in_ms  - iossnapshot.row_lock_wait_in_ms    as row_lock_wait_in_ms, 
  ios.page_lock_wait_count - iossnapshot.page_lock_wait_count   as page_lock_wait_count,
  ios.page_lock_wait_in_ms -  iossnapshot.page_lock_wait_in_ms  as page_lock_wait_in_ms
from sys.dm_db_index_operational_stats(db_id(), null, null, null) ios
join #IndexBlockingWaitStats iossnapshot
  on iossnapshot.[object_id] = ios.[object_id]
  and iossnapshot.index_id = ios.index_id
join sys.indexes i
  on i.[object_id] = ios.[object_id]
  and i.index_id = ios.index_id
join sys.tables t
  on ios.[object_id] = t.[object_id]
cross apply ( 
  select 
  ( ios.row_lock_wait_in_ms + ios.page_lock_wait_in_ms ) -
  ( iossnapshot.row_lock_wait_in_ms + iossnapshot.page_lock_wait_in_ms )
) as calc(totalwaittime)
where totalwaittime > 0
order by totalwaittime desc

Notes

  • There are many kinds of lock waits, this script focuses on waits on pages or rows. Other kinds of waits not shown here include objects (i.e. locks on tables), latches and IO latches.
  • This is only one small focused tool in a troubleshooting tool belt. Don’t depend on it too much
  • If you’re keen, you’ll notice I didn’t give info on schemas or on partitions, sounds like a fun exercise doesn’t it?
  • No illustration? Nope, or at least not yet. If I continue to find this script useful, then I plan on adding an illustration, because I use Browse By Illustration as my main navigation tool

October 3, 2014

Watch Out for Misleading Behaviour From SQL Server

Takeaway: To get consistent behaviour from SQL Server, I share a set of statements I like to run when performing tuning experiments.

Inconsistent Behaviour From SQL Server?

I often have conversations where a colleague wants to understand why SQL Server performs faster in some cases and slower in other cases.

The conversation usually starts “Why does SQL Server perform faster when I…” (fill in the blank):

  1. … changed the join order of the query
  2. … added a transaction
  3. … updated statistics
  4. … added a comment
  5. … crossed my fingers
  6. … simply ran it again

What’s Going On?

It can actually seem like SQL Server performs differently based on its mood. Here are some reasons that can affect the duration of queries like the ones above

  • You changed something insignificant in the query. What you may be doing is comparing the performance of a cached plan with a newly compiled plan. Examples 1 – 4 might fall under this scenario. If that’s the case, then you took a broken thing and gave it a good thump. This percussive maintenance may be good for broken jukeboxes, but maybe not for SQL Server.
  • What about those last two? Say you hit F5 to execute a query in Management Studio, and wait a minute for your results. You immediately hit F5 again and watched the same query take fifteen seconds. Then I like to point out that maybe all that data is cached in memory.

In order to do tune queries effectively, we need consistent behaviour from SQL Server, if only to test theories and be able to rely on the results. SQL Server doesn’t seem to want to give us consistent behaviour…

So Is It Possible To Get Straight Answers?

Best line from all Star Wars

But maybe we can get straight answers from SQL Server. Here’s a test framework that I like to use before all experiments when I want consistent behaviour:

-- Only do this on dev sql servers!
CHECKPOINT 
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
SET STATISTICS IO, TIME ON
-- Ctrl+M in Management Studio to include actual query plan

The first two statements are meant to clear SQL Server’s cache of data. Because of write ahead logging, SQL Server will write data changes to disk immediately, but may take its time writing data changes to disk. Executing CHECKPOINT makes SQL Server do that immediately. After the checkpoint there should be no dirty buffers. That’s why DBCC DROPCLEANBUFFERS will succeed in dropping all data from memory.

The DBCC FREEPROCCACHE command will remove all cached query plans.

These commands give SQL Server a fresh starting point. It makes it easier to compare behaviour of one query with the behaviour of another.

The SET STATISTICS IO, TIME ON and the Ctrl+M are there in order to retrieve better information about the performance of the query. Often CPU time, Logical IO, and the actual query plan are more useful when tuning queries than elapsed time.

Older Posts »

Powered by WordPress