Michael J. Swart

April 27, 2016

You Can’t Force Query Plans If They Use TVPs With PKs

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:24 pm

Have you ever played “Fortunately/Unfortunately”? It’s a game where players alternately give good news and bad news. It goes something like this:

Databases such as SQL Server make it easy to retrieve sets of data.
Unfortunately, it’s kind of awkward to send sets of data to SQL Server.
Fortunately, table-valued parameters (TVPs) make this easier.
Unfortunately, queries that use TVPs often suffer from non-optimal query plans.
Fortunately, creating primary key or unique key constraints gives us a way to index table types.
Unfortunately, those constraints prevent any kind of plan forcing.
Fortunately, SQL Server 2014 lets us create named indexes for table types which lets us force plans if we need to.


Let’s break this down:

Sending Sets of Data to SQL Server is Awkward

It always has been. Originally, developers were forced to send a CSV string to SQL Server and write a do-it-yourself function to split the string into a set of values.

  • In 2005, Microsoft introduced XML and CLR which let developers shred or split strings in new ways,
  • In 2008, Microsoft introduced table-valued parameters,
  • In 2014, they introduced In-Memory TVPs,
  • In 2016, there’s the new STRING_SPLIT() function

So there are more options now than there ever have been, and they each have their own issues.
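For example, the 2016 option can be as simple as this (a quick sketch; STRING_SPLIT returns a single column named value and requires database compatibility level 130):

declare @csv varchar(100) = '1,3,5,7';

select try_cast(value as int) as id
from string_split(@csv, ',');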

Aaron Bertrand explores some of those performance issues in STRING_SPLIT() in SQL Server 2016. It’s a specific use case where he focuses on duration. In our case, we focus on aggregated system load like worker time or reads, so we don’t necessarily value parallel query plans. But I love his methods. He gives us tools that let us evaluate our own situation based on our own criteria.

I’m going to focus on TVPs, which are the most natural method of sending sets of data to SQL Server from a syntax point of view.
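To show what I mean by natural syntax, here’s a minimal sketch (the type, the procedure, and the dbo.Widgets table are all made up for illustration):

create type dbo.IdSet as table ( id int );
go
create procedure dbo.s_GetWidgets
    @Ids dbo.IdSet readonly
as
    select w.*
    from dbo.Widgets w
    join @Ids i
        on i.id = w.WidgetId;
go
declare @ids dbo.IdSet;
insert @ids (id) values (1), (2), (3);
exec dbo.s_GetWidgets @Ids = @ids;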

Indexes on Table Types

Table-valued parameters are implemented using table types. Before SQL Server 2014, the only way to index a table type was to define a primary key or a unique key on it like this:

create type dbo.TypeWithPK 
    as table ( id int primary key );

The syntax for CREATE TYPE prevents us from naming our primary key, and this turns out to be important. Every time I define and use a table variable, SQL Server dynamically generates a name for the primary key. So when I look at the plan for

declare @ids dbo.TypeWithPK;
select * from @ids

I see that it has a primary key named [@ids].[PK__#A079849__3213E83FDB6D7A43].

As I’ll show later, this dynamically generated name prevents any kind of query plan forcing. But as of SQL Server 2014, we can include indexes in our table type definitions. More importantly, we can name those indexes:

create type dbo.TypeWithIndex 
    as table ( id int index IX_TypeWithIndex );
go
declare @ids dbo.TypeWithIndex;
select * from @ids;

This table variable has an index named [@ids].[IX_TypeWithIndex], which is what we expect.

Plan Forcing is Not Allowed For TVPs with PKs

Where does plan forcing fit in your tool belt? For me, I’ve never used plan forcing as a permanent solution to a problem, but when I see a query that often suffers from suboptimal query plan choices, I look to plan guides to give me some stability while I work at fixing and deploying a permanent solution.

Plan forcing in SQL Server involves specifying a plan for a particular query. But the primary key name for a table variable is always different, so the specified query plan is never going to match. In other words, SQL Server is never going to use your query plan because your plan includes index [@ids].[PK__#A079849__3213E83FDB6D7A43], but the query it’s compiling has a differently named index like [@ids].[PK__#AA02EED__3213E83FAF123E51].

If you try, this is what that failure looks like:

USE PLAN
If you try to use the USE PLAN query hint, you’ll get error 8712:

Msg 8712, Level 16, State 0, Line 15
Index '@ids.PK__#B305046__3213E83F57A32F24', specified in the USE PLAN hint, does not exist. Specify an existing index, or create an index with the specified name.

Plan Guides
If you try to force the plan by creating a plan guide, you’ll also see message 8712:

select pg.name, v.message
from sys.plan_guides pg
cross apply sys.fn_validate_plan_guide(pg.plan_guide_id) v;
-- Index '@ids.PK__#BA711C0__3213E83F44A3F2C8', specified in the USE PLAN hint, does not exist. Specify an existing index, or create an index with the specified name.
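For context, the plan guide itself would be created with something like this sketch (the name is hypothetical, the plan XML is just a placeholder, and the statement text has to match exactly):

exec sp_create_plan_guide
    @name = N'ForceTVPPlan',                       -- hypothetical name
    @stmt = N'select * from @ids',                 -- must match the statement text exactly
    @type = N'SQL',
    @module_or_batch = NULL,
    @params = N'@ids dbo.TypeWithPK READONLY',
    @hints = N'OPTION (USE PLAN N''<ShowPlanXML ...captured plan goes here...>'')';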

Query Store
And if you try to force a plan using SQL Server 2016’s Query Store, you’ll see this:

select plan_id, last_force_failure_reason_desc
from sys.query_store_plan
where is_forced_plan = 1
-- last_force_failure_reason_desc = 'NO_INDEX'
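For reference, forcing a plan through Query Store looks something like this (a sketch; the query_id and plan_id values are illustrative and come from the Query Store catalog views):

-- find the query and its plan (the ids below are illustrative)
select q.query_id, p.plan_id, p.last_force_failure_reason_desc
from sys.query_store_query q
join sys.query_store_plan p
    on p.query_id = q.query_id;

exec sp_query_store_force_plan @query_id = 42, @plan_id = 7;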

Summary

When defining table variables, avoid primary key or unique key constraints. Opt instead for named indexes if you’re using SQL Server 2014 or later. Otherwise, be aware that plan forcing is limited to queries that don’t use these table variables.

April 20, 2016

Are You Programming In The Database?

Typically, T-SQL is not the best platform for programming (understatement). If you have many procedures that call other procedures, that’s a signal that you might be programming in the database.

Find out using this query:

select 
    OBJECT_SCHEMA_NAME(p.object_id) as schemaName, 
    OBJECT_NAME(p.object_id) as procedureName,
    count(*) as [calls to other procedures]	
from sys.procedures p
cross apply sys.dm_sql_referenced_entities(schema_name(p.schema_id) + '.' + p.name, 'OBJECT') re
where re.referenced_entity_name in (select name from sys.procedures)
group by p.object_id
order by count(*) desc;

In AdventureWorks, we see a result showing only a handful of procedures that call other procedures.

To drill down into those results, use this query:

select distinct
    QUOTENAME(OBJECT_SCHEMA_NAME(p.object_id)) + '.' 
        + QUOTENAME(OBJECT_NAME(p.object_id)) [This procedure...], 
    QUOTENAME(OBJECT_SCHEMA_NAME(p_ref.object_id)) + '.' 
        + QUOTENAME(OBJECT_NAME(p_ref.object_id)) [... calls this procedure]
from sys.procedures p
cross apply sys.dm_sql_referenced_entities(schema_name(p.schema_id) + '.' + p.name, 'OBJECT') re
join sys.procedures p_ref
	on re.referenced_entity_name = p_ref.name
order by 1,2

which gives results pairing each calling procedure with the procedure it calls.

Adventureworks seems just fine to me. Only four instances of procedures calling procedures. I looked at the database I work with most. Hundreds of procedures (representing 15% of the procedures) call other procedures. On the other end of the spectrum is Stackoverflow. I understand that they don’t use stored procedures at all.

Field and Record vs. Column and Table

Filed under: SQLServerPedia Syndication,Tongue In Cheek — Michael J. Swart @ 8:50 am

Erik Darling wrote me last week with this idea:

There’s a sort of recurring company chat joke about people who flip out over rows and columns vs. fields and records, etc. And of course, there’s this outdoorsman magazine called Field & Stream.

And he suggested a parody of Field & Stream called Field & Record. That led directly to this:

[Image: the Field & Record parody cover]

Thanks to the team at Brent Ozar Unlimited for the suggestion and the article ideas. You guys are hilarious!

My Two Cents on the Debate

I’m definitely a descriptivist. Language is always changing and if a word or phrase gets adopted widely enough, it is no longer “wrong” (whatever that means).

So when I hear “Field” and “Record” they’re acceptable to me. But if I’m explaining something, I don’t want to distract from the thing I’m saying. And from that point of view, I try to use “Row” and “Column” because I don’t know anyone who blinks at those terms. In other words:

  • When speaking, I use “row” and “column”
  • When listening, I do not correct “field” and “record”.

This also means I never use the word “whom” which is a word that has the strange quality of being distracting and correct.

Exceptions

I can think of a couple exceptions

I dare someone to tell me I used the word ironically wrong.

April 11, 2016

Tackle WRITELOG Waits Using the Transaction Log and Extended Events

Takeaway: WRITELOG waits are associated with a busy or slow transaction log. To tackle these waits, we need to measure transaction log activity. I describe a lightweight way to examine transaction log usage for busy OLTP systems.


Start with Microsoft’s Advice: I’m not going to introduce the topic of transaction log performance. Microsoft’s SQL Customer Advisory Team already provides a great introduction with Diagnosing Transaction Log Performance Issues and Limits of the Log Manager. Their advice includes watching the “Log Bytes Flushed/sec” performance counter found in the “SQL Server:Databases” object.
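If you prefer T-SQL to perfmon, you can read the same counter from sys.dm_os_performance_counters (a sketch; per-second counters expose cumulative values, so sample twice and take the difference to get a rate):

select rtrim(pc.instance_name) as databaseName,
       pc.cntr_value as logBytesFlushed  -- cumulative value; diff two samples for a rate
from sys.dm_os_performance_counters pc
where pc.counter_name like 'Log Bytes Flushed/sec%'
order by pc.cntr_value desc;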

Reactive Efforts: If you’re curious about transaction log activity for open transactions, Paul Randal has a script at Script: open transactions with text and plans.

Spiky Activity: It’s not too difficult to find infrequent activities that write a lot of data to the transaction log; activities like data warehouse ETLs, or index rebuilds. Use a trace or extended events to look for statements with large values for “writes”.
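For example, a session along these lines can catch those spiky statements (a sketch only; the session name and the writes threshold are arbitrary):

CREATE EVENT SESSION big_writers ON SERVER
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.sql_text, sqlserver.database_name)
    WHERE writes > 10000 )  -- statements that wrote a lot
ADD TARGET package0.event_file ( SET filename = N'big_writers.xel' )
WITH (STARTUP_STATE = OFF);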

Scalability of OLTP Workloads

WRITELOG waits are a scalability challenge for OLTP workloads under load. Chris Adkin has a lot of experience tuning SQL Server for high-volume OLTP workloads. So I’m going to follow his advice when he writes that we should minimize the amount of logging generated. And because I can’t improve something if I can’t measure it, I wonder: what’s generating the most logging? OLTP workloads are characterized by frequent tiny transactions, so I want to measure that activity without filters, but I want to have as little impact on the system as I can. That’s my challenge.

Getting #SQLHelp

So I asked Twitter with the #sqlhelp hashtag and got some great advice from Erin Stellato, who suggested an extended event that captures transaction log activity. Erin also pointed out that the UI warns you that it’s a very high volume event.

Combining fn_dblog With Extended Events

So to avoid that kind of volume, I got the idea to read straight from the transaction log and combine that with a lighter extended events session to identify the statements involved. The transaction_id captured by the extended events session corresponds to the Xact ID column in fn_dblog.

Here’s how that went:

The Script
The details for this script are kind of fussy, but it all comes together in a solution that won’t drag a server down. Care is still recommended; start with 10 seconds and go from there.

declare @Duration varchar(10) = '00:00:10';
declare @FileSize varchar(10) = '5'; -- in megabytes
 
-- create session
DECLARE @CreateSessionSQL nvarchar(max) = N'
    CREATE EVENT SESSION query_writes ON SERVER 
    ADD EVENT sqlserver.sp_statement_completed ( 
        SET collect_statement=(0)
        ACTION(sqlserver.transaction_id, sqlserver.database_name)
        WHERE sqlserver.transaction_id > 0
          AND sqlserver.database_name = ''' + DB_NAME() + N''')
    ADD TARGET package0.asynchronous_file_target(
      SET filename = N''query_writes.xel'',
          max_file_size = ' + @FileSize + N',
          max_rollover_files = 1)
    WITH (
        STARTUP_STATE=ON,
        EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
        TRACK_CAUSALITY=OFF)';
exec sp_executesql @CreateSessionSQL;
 
ALTER EVENT SESSION query_writes ON SERVER
    STATE=START;
 
-- get the latest lsn for current DB
declare @xact_seqno binary(10);
declare @xact_seqno_string_begin varchar(50);
exec sp_replincrementlsn @xact_seqno OUTPUT;
set @xact_seqno_string_begin = '0x' + CONVERT(varchar(50), @xact_seqno, 2);
set @xact_seqno_string_begin = stuff(@xact_seqno_string_begin, 11, 0, ':')
set @xact_seqno_string_begin = stuff(@xact_seqno_string_begin, 20, 0, ':');
 
-- wait for the specified duration
waitfor delay @Duration;
 
-- get the latest lsn for current DB
declare @xact_seqno_string_end varchar(50);
exec sp_replincrementlsn @xact_seqno OUTPUT;
set @xact_seqno_string_end = '0x' + CONVERT(varchar(50), @xact_seqno, 2);
set @xact_seqno_string_end = stuff(@xact_seqno_string_end, 11, 0, ':')
set @xact_seqno_string_end = stuff(@xact_seqno_string_end, 20, 0, ':');
 
-- Stop the session
ALTER EVENT SESSION query_writes ON SERVER
    STATE=STOP;
 
-- read from transaction log
select 
    max([Xact ID]) as transactionId,
    max([Transaction Name]) as transactionName, 
    sum([Log Record Length]) as logSize,
    count(*) as [logRowCount]
into #TLOGS
from fn_dblog(@xact_seqno_string_begin, @xact_seqno_string_end) f
group by [Transaction Id]
 
-- read from session data
CREATE TABLE #SessionData (
    id int identity primary key,
    XEXml xml NOT NULL    
)
 
INSERT #SessionData(XEXml)
SELECT CAST(fileData.[event_data] as xml)
FROM sys.fn_xe_file_target_read_file ( 'query_writes*xel', null, null, null) fileData;
 
-- find minimum transactionId captured by xes 
-- (almost always the first one, depending on luck here)
declare @minTXFromSession bigint;
select TOP (1) @minTXFromSession = S.XEXml.value(
    '(/event/action[(@name=''transaction_id'')]/value)[1]', 'bigint')
from #SessionData S;
 
WITH SD AS
(
    SELECT 
        S.XEXml.value(
            '(/event/action[(@name=''transaction_id'')]/value)[1]', 'bigint') as transactionId,
        S.XEXml.value(
            '(/event/data[(@name=''object_id'')]/value)[1]', 'bigint') as objectId
    FROM #SessionData S
)
SELECT 
    ISNULL(T.transactionName, 'Unknown') as transactionTypeName, 
    OBJECT_NAME(S.objectid) as ObjectName,
    SUM(T.logsize) as totalLogSizeBytes,
    SUM(T.logRowCount) as totalLogRowCount,
    COUNT(*) as executions
FROM #TLOGS T
LEFT JOIN (SELECT DISTINCT * FROM SD) S
    ON T.transactionId = S.transactionId
WHERE T.transactionId >= @minTXFromSession
GROUP BY T.transactionName, S.objectId
ORDER BY SUM(T.logsize) DESC
 
-- clean up
DROP EVENT SESSION query_writes ON SERVER;
DROP TABLE #TLOGS
DROP TABLE #SessionData

Sample Results

Here’s an example of what the results would look like. It’s an aggregated view of all transaction log activity in a database for 10 seconds.
[Table: transaction type name, object name, total log size in bytes, total log row count, and executions]

Notes

  • Notice that the session is database specific. That’s because transaction logs are database specific. To help focus on the right database, use the “Log Bytes Flushed/sec” performance counter found in the “SQL Server:Databases” object.
  • Also notice that I’m tracking ObjectIds. That’s because we use procedures quite heavily. You may want to adapt this code to use query_hash instead. In both cases, collecting the statement text is not recommended.
  • The sample of data is limited by the size of the extended events target file or the duration variable, whichever is smaller.
  • @sqL_handLe pointed out to me that reading the log using fn_dblog will prevent the transaction log from truncating. Reading from the transaction log can be very tricky to do efficiently. Luckily we can use the sp_replincrementlsn trick to get LSN parameter values for fn_dblog.

April 1, 2016

Microsoft Dropped the Cover Charge on SQL Server

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 8:00 am

Takeaway: SQL Server Developer Edition is now free. This makes it very easy for newcomers to learn about SQL Server.

Some Skills Are Hard to Teach

Working with SQL Server and teaching SQL Server are very, very different skills. I sometimes get asked to teach others about how to tackle SQL Server problems in a teach-someone-to-fish kind of way. But I find it very difficult and I often don’t know what to say. What works for me may not work for others. For example, none of these activities are easy:

  • Develop a strong curiosity about SQL Server
  • Practice
  • Read as much as you can about SQL Server
  • More Practice
  • Find yourself in high-pressure situations where you have to tackle a difficult technical problem. Then when you give up, find yourself still facing the problem which hasn’t gone away.

… and iterate.

It’s challenging to fit those lessons into a session.

But Everyone Loves Free SQL Server Resources

So I’ve discovered that what is helpful are all the free resources available to me. And giving people a list of free resources is always well-received. To stretch the metaphor, maybe I can’t teach someone to fish, but here’s a free fishing rod.

For example, this list is more constructive and helpful than the last one:

  • Free tools like sp_whoisactive and sql sentry plan explorer
  • Free events like SQL Saturdays or local user group meetings
  • Free forums like Stackoverflow and Twitter

With those resources, it’s pretty easy to get started with SQL Server. In other words, there’s a very small barrier to entry to the world of SQL Server.

One More Free Resource

Well, yesterday Microsoft made it even easier by getting rid of the cover charge for using SQL Server. SQL Server (Developer Edition) is now free.


Microsoft made the announcement on their blog.

All editions have had a free trial period which allowed people to evaluate SQL Server for a limited time. This announcement removes even that restriction. If you have a computer and an internet connection, you can get started today by joining Visual Studio Dev Essentials.

March 23, 2016

Microsoft Fixed My Biggest SQL Server Pet Peeve (Two Years Ago)

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:00 am

Takeaway: The best way to avoid tempdb GAM and PFS contention caused by table-valued parameters (TVPs) is to use Memory-Optimized Table Variables.

This is my last post about tempdb. You can’t believe how wonderful it is to type that.*


SQL Server 2014 can eradicate tempdb contention and I didn’t even know it. Just yesterday I was saying that if I could have my one SQL Server wish, it would be to use table-valued parameters at high frequency without suffering from tempdb latch contention on GAM or PFS.

Then, I saw Jos de Bruijn’s article Improving temp table and table variable performance using memory optimization. He pointed out how In-Memory OLTP can alleviate excessive tempdb use. The piece that I was most interested in involves SQL Server 2014’s Memory-Optimized Table Parameters. Jos describes them as a great solution for “workloads that intensively use TVPs”.

So now, in addition to table types like this:

CREATE TYPE dbo.BIGINTSET AS TABLE
    ([Value] BIGINT NOT NULL INDEX IX_BIGINTSET);

I can add an additional memory optimized type like this:

CREATE TYPE dbo.BIGINTSET_OLTP AS TABLE
    ([Value] BIGINT NOT NULL INDEX IX_BIGINTSET_OLTP)
    WITH (MEMORY_OPTIMIZED=ON);

Any procedures that use the new table type BIGINTSET_OLTP will not touch tempdb.

It seems pretty easy, doesn’t it? In this example, the reason I don’t just replace the existing type is that I want to be very deliberate about choosing memory-optimized parameters. By adding a new table type, I make that choice in each procedure or query on a case-by-case basis.

To see whether this technique avoids hitting tempdb as advertised, I wrote this demo.

A Complete Demo

use master
GO
 
IF ( DB_ID('MyTVPTest') IS NOT NULL )
BEGIN
    ALTER DATABASE MyTVPTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE MyTVPTest;
END
GO
 
CREATE DATABASE MyTVPTest;
 
ALTER DATABASE MyTVPTest 
    ADD FILEGROUP MyTVPTest_MOD CONTAINS MEMORY_OPTIMIZED_DATA;
 
ALTER DATABASE MyTVPTest 
    ADD FILE (name='MyTVPTest_MOD1', filename='c:\sqldata\MyTVPTest_MOD1') 
    TO FILEGROUP MyTVPTest_MOD;
GO
 
use MyTVPTest;
GO
 
CREATE TYPE dbo.MyMOIntSet AS TABLE
    ([Value] INT NOT NULL INDEX IX_MyMOIntSet)
    WITH (MEMORY_OPTIMIZED=ON);
 
CREATE TYPE dbo.MyIntSet AS TABLE
    ([Value] INT NOT NULL INDEX IX_MyIntSet);
GO
 
CREATE TABLE dbo.MyTable
(
    id int not null primary key
);
 
INSERT dbo.MyTable (id)
SELECT top 1000 ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM sys.messages;
GO
 
CREATE PROCEDURE dbo.s_FetchMyTableRowsWithFilter
    @Filter MyIntSet READONLY
AS
 
    SELECT MT.id
    FROM dbo.MyTable MT
    JOIN @Filter FT 
        ON FT.[Value] = MT.id;
GO
 
CREATE PROCEDURE dbo.s_FetchMyTableRowsWithMOFilter
    @Filter MyMOIntSet READONLY
AS
 
    SELECT MT.id
    FROM dbo.MyTable MT
    JOIN @Filter FT 
        ON FT.[Value] = MT.id;
GO
 
-- Call procedure using Memory Optimized table type
DECLARE @F_MO MyMOIntSet;
INSERT @F_MO ([Value]) VALUES (1), (3), (5);
EXEC dbo.s_FetchMyTableRowsWithMOFilter @F_MO;
 
-- Call procedure using regular table type
DECLARE @F MyIntSet;
INSERT @F ([Value]) VALUES (1), (3), (5);
EXEC dbo.s_FetchMyTableRowsWithFilter @F;

Look at the last two queries. The first one invokes a procedure that uses a memory-optimized table variable. The other one invokes a different procedure that uses a standard table variable. Then look at what’s going on in tempdb’s transaction log (using techniques found here). With the standard table variable I see this sort of activity in tempdb’s log:
[Screenshot: log records appearing in tempdb’s transaction log for the standard table variable]

But the memory-optimized version avoids all tempdb activity. It’s not just the logging activity that’s avoided; all tempdb activity is avoided. Tempdb is simply not touched in this case.
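If you want a quick way to double-check this yourself, counting tempdb’s log records before and after the calls works in a pinch (a rough sketch, not the full technique linked above):

use tempdb;
checkpoint;  -- tempdb uses the simple recovery model, so this clears most of the log
go
-- run the two EXEC batches in another window, then:
select count(*) as logRecordCount,
       sum([Log Record Length]) as logBytes
from fn_dblog(null, null);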

Caveats

Notice a few things:

  1. This mainly addresses tempdb contention experienced by workloads with extremely frequent TVP use (thousands per second).
  2. In SQL Server 2014, memory-optimized table variables preclude the use of parallel queries.
  3. As the feature name suggests, these table variables live in memory (see the sketch after this list for one way to keep an eye on that).
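One coarse way to watch overall In-Memory OLTP memory use is the XTP memory clerk (my own sketch, not from the feature documentation):

select type, name, pages_kb
from sys.dm_os_memory_clerks
where type = 'MEMORYCLERK_XTP';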

Luckily the high frequency described by the first caveat means that the queries I use must be as lightweight as possible. And that means that I don’t mind the second or third caveat so much. Aaron Bertrand dives deeper on the performance of Memory-Optimized table variables at Hekaton with a twist: In-memory TVPs – Part 3.

In the past, my colleagues and I have been incredibly frustrated at this bottleneck. A bottleneck that could not be fixed, not even with dollars. And so I’m looking forward to this feature more than any other single feature introduced in 2014 or 2016. We vendors are often constrained to use only those features present in every SQL Server version we support. In my case, it will be a while before we de-support SQL Server 2012 but that day will be like Christmas.

* I’ve written before about TVPs and the different kinds of contention caused by frequent use of TVPs:

Table Valued Parameters, A Short, Complete Example
PAGELATCH_EX Contention on 2:1:103
Follow up on Ad Hoc TVP contention
Three More Tricky Tempdb Lessons
What’s Going On Inside Tempdb?
Avoid Frequent use of TVPs With Wide Rows
Troubleshooting Tempdb, a Case Study

February 25, 2016

Look at Blocked Process Reports Collected With Extended Events

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 9:25 am


I just met a friend at a SQL Saturday who let me know that he recognizes my name because it was attached to a project I wrote five years ago. The “Blocked Process Report Viewer”. I was impressed. I’m glad to know that it’s still used. So I decided to update it for 2016.

I’ve extended the Blocked Process Report Viewer to look at blocked process reports that were collected with an extended events session.


Where To Find It

The code for the procedure is where it always is, on CodePlex at https://sqlblockedprocesses.codeplex.com/.


How To Use It

The viewer can consume blocked process report events captured by any extended events session, as long as that session has a target of ring_buffer or event_file. For example, if you set up your extended events session the way Jeremiah Peschka did in Finding Blocked Processes and Deadlocks using SQL Server Extended Events, then you would use the viewer like this:

exec sp_blocked_process_report_viewer
    @Source = 'blocked_process', -- the name that Jeremiah gave to his xe session
    @Type = 'XESESSION';

which gives you a result set organized by blocking episode.

The value of the blocked process report viewer is that it organizes all the blocked processes into “episodes”. Each episode has a lead blocker which is the process in the front of the traffic jam. Look at that process closely.
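If you don’t already have a session like that, a minimal setup looks something like this (a sketch; the blocked process threshold and the event_file target are the important parts, and Jeremiah’s post has the full details):

exec sp_configure 'show advanced options', 1;
reconfigure;
exec sp_configure 'blocked process threshold', 5;  -- seconds; pick a threshold that suits you
reconfigure;
go
CREATE EVENT SESSION blocked_process ON SERVER
ADD EVENT sqlserver.blocked_process_report
ADD TARGET package0.event_file ( SET filename = N'blocked_process.xel' )
WITH (STARTUP_STATE = ON);
go
ALTER EVENT SESSION blocked_process ON SERVER STATE = START;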

Feedback

Let me know how it goes. Tell me if there are any errors or performance issues. I’m especially interested in the performance of the viewer when given extended events sessions that use ring_buffer targets.

February 17, 2016

Future Proofing for Concurrency; Blocked Process Reports Are Not Enough

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:00 am

Takeaway: When assessing concurrency, performance testing is the primary way to ensure that a system can handle enough load concurrently. Monitoring blocked process reports does not give enough warning to anticipate blocking issues.

Logical Contention

Capacity planning is difficult for DBAs who expect growth. Will there be enough CPU, Memory or I/O to serve the anticipated load? One category falls outside those three: logical contention.

Logical contention is a problem where excessive blocking causes throughput to suffer. It would be great to get advance warning. One essential strategy is to make use of the blocked process report. The problem is that blocked process reports are an alarm metric, not a gauge metric. In other words, the blocked process report can indicate when there is a problem, but it is poor at giving advance notice.

A Simple Concurrency Simulator

Check out this Concurrency Simulator.

It simulates how well a server can handle tasks that use pessimistic locking (tasks that cannot be run in parallel because they block each other). It’s amazing to see how nice things seem at 6000 tasks per hour and how quickly things can go wrong by 7000 tasks per hour. Unfortunately this scenario is too familiar. Episodes of excessive blocking tend to be either nonexistent or very painful with no in-between.

Graphically that looks like this:
[Chart: blocking stays negligible as load increases, then spikes sharply once the tipping point is reached]

You’ll notice that even at a five second threshold, the blocked process report gives us little warning. Blocked process reports don’t warn of approaching trouble, they indicate that trouble is here.

How Useful Are Blocked Process Reports Then?

Continue monitoring blocked processes. Knowing that you’ve got concurrency issues is preferable to not knowing. Just keep in mind that it can’t replace performance testing.


Other animations and interactive media by Michael J Swart

February 11, 2016

Ugly Pragmatism For The Win

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 11:37 am

Yesterday I had my mind changed about the best way to do concurrency. I describe several methods in Mythbusting: Concurrent Update/Insert Solutions. My preferred method is to increase the isolation level and fine tune locks.

At least that was my preference. I recently changed my approach to use a method that gbn suggested in the comments. He describes his method as the “TRY CATCH JFDI pattern”. Normally I avoid solutions like that. There’s a rule of thumb that says developers should not rely on catching errors or exceptions for control flow. But I broke that rule of thumb yesterday.

By the way, I love gbn’s description for the pattern, “JFDI”. It reminds me of Shia LaBeouf’s motivational video.

Okay, I’ll tell you the story.

The Original Defect

So there’s this table. It’s defined something like:

CREATE TABLE dbo.AccountDetails
(
  Email NVARCHAR(400) NOT NULL
    CONSTRAINT PK_AccountDetails PRIMARY KEY (Email),
  Created DATETIME NOT NULL
    CONSTRAINT DF_AccountDetails_Created DEFAULT GETUTCDATE(),
  Etc NVARCHAR(MAX)
)

And there’s a procedure defined something like:

CREATE PROCEDURE dbo.s_GetAccountDetails_CreateIfMissing
(
  @Email NVARCHAR(400),
  @Etc NVARCHAR(MAX)
)
AS
 
  DECLARE @Created DATETIME;
  DECLARE @EtcDetails NVARCHAR(MAX);
 
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
  SET XACT_ABORT ON
  BEGIN TRAN
 
    SELECT
      @Created = Created,
      @EtcDetails = Etc
    FROM dbo.AccountDetails
    WHERE Email = @Email
 
    IF @Created IS NULL
    BEGIN
      SET @Created = GETUTCDATE();
      SET @EtcDetails = @Etc;
      INSERT INTO dbo.AccountDetails (Email, Created, Etc)
      VALUES (@Email, @Created, @EtcDetails);
    END
 
  COMMIT
 
  SELECT @Email as Email, @Created as Created, @EtcDetails as Etc

Applications executing this procedure were deadlocking with each other. If you’re keen, try to figure out why before reading ahead. It’s pretty close to the problem described in the Mythbusting post. Specifically this was method 3: increased isolation level.

Initial Fix

So I decided to fine tune locks. I added an UPDLOCK hint:

CREATE PROCEDURE dbo.s_GetAccountDetails_CreateIfMissing
(
  @Email NVARCHAR(400),
  @Etc NVARCHAR(MAX)
)
AS
 
  DECLARE @Created DATETIME;
  DECLARE @EtcDetails NVARCHAR(MAX);
 
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
  SET XACT_ABORT ON
  BEGIN TRAN
 
    SELECT
      @Created = Created,
      @EtcDetails = Etc
    FROM dbo.AccountDetails WITH (UPDLOCK)
    WHERE Email = @Email
 
    IF @Created IS NULL
    BEGIN
      SET @Created = GETUTCDATE();
      SET @EtcDetails = @Etc;
      INSERT INTO dbo.AccountDetails (Email, Created, Etc)
      VALUES (@Email, @Created, @EtcDetails);
    END
 
  COMMIT
 
  SELECT @Email as Email, @Created as Created, @EtcDetails as Etc

Bail Early If Possible

Okay, so this solution works. It’s concurrent and it performs just fine. I realized though that I can improve this further by avoiding the transaction and locks. Basically select the row and if it exists, bail early:

CREATE PROCEDURE dbo.s_GetAccountDetails_CreateIfMissing
(
  @Email NVARCHAR(400),
  @Etc NVARCHAR(MAX)
)
AS
 
  DECLARE @Created DATETIME;
  DECLARE @EtcDetails NVARCHAR(MAX);
 
  SELECT
    @Created = Created,
    @EtcDetails = Etc
  FROM dbo.AccountDetails
  WHERE Email = @Email;
 
  IF (@Created IS NOT NULL)
  BEGIN
    SELECT @Email as Email, @Created as Created, @EtcDetails as Etc;
    RETURN;
  END
 
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
  SET XACT_ABORT ON
  BEGIN TRAN
 
    SELECT
      @Created = Created,
      @EtcDetails = Etc
    FROM dbo.AccountDetails WITH (UPDLOCK)
    WHERE Email = @Email;
 
    IF @Created IS NULL
    BEGIN
      SET @Created = GETUTCDATE();
      SET @EtcDetails = @Etc;
      INSERT INTO dbo.AccountDetails (Email, Created, Etc)
      VALUES (@Email, @Created, @EtcDetails);
    END
 
  COMMIT
 
  SELECT @Email as Email, @Created as Created, @EtcDetails as Etc;

Take A Step Back

Okay, this is getting out of hand. The query shouldn’t have to be this complicated.
Luckily I work with a guy named Chris. He’s amazing at what he does. He questions everything without being a nitpicker (there’s a difference). He read through the Mythbusters post and followed all the links in the comments. He asked whether gbn’s JFDI pattern wasn’t better here. So I implemented it just to see what that looked like:

CREATE PROCEDURE dbo.s_GetAccountDetails_CreateIfMissing
(
  @Email NVARCHAR(400),
  @Etc NVARCHAR(MAX)
)
AS
  BEGIN TRY
    INSERT INTO dbo.AccountDetails (Email, Etc)
    SELECT @Email, @Etc
    WHERE NOT EXISTS 
      ( SELECT * FROM dbo.AccountDetails WHERE Email = @Email );
  END TRY
  BEGIN CATCH
    -- ignore duplicate key errors (2601 for unique indexes, 2627 for primary key or unique constraints), throw the rest.
    IF ERROR_NUMBER() NOT IN (2601, 2627) THROW;
  END CATCH
 
  SELECT Email, Created, Etc
  FROM dbo.AccountDetails 
  WHERE Email = @Email;

Look at how much better that looks! No elevated transaction isolation levels. No query hints. The procedure itself is half as long as it used to be. The SQL is so much simpler and for that reason, I prefer this approach. I am happy in this case to use error handling for control flow.
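As a quick sanity check (my own sketch, not part of the original fix), calling the procedure twice with the same email should return the same row both times and never raise an error:

exec dbo.s_GetAccountDetails_CreateIfMissing
    @Email = N'someone@example.com', @Etc = N'first call';
exec dbo.s_GetAccountDetails_CreateIfMissing
    @Email = N'someone@example.com', @Etc = N'second call';
-- both calls return the row created by the first call; the second insert
-- is skipped by the NOT EXISTS check or swallowed as a duplicate key error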

So I checked in the change and updated my pull request. Chris’s last comment before he approved the pull request was “Looks good. Ugly pragmatism FTW.”

February 3, 2016

You’re Probably Taking Sort Order For Granted Somewhere

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:00 am

As many people will point out, you can’t depend on the order of results without using the ORDER BY clause. So it’s easy to say “Simple! Don’t write code that expects unsorted data to be sorted”. But it’s very difficult to be careful everywhere all the time.

Remember that this is an application problem and is not a SQL problem. We only get into trouble when applications (or people) expect results to be sorted when they’re not. So unless you have a tiny application, or a huge amount of discipline, it’s likely that there is some part of your application that assumes sorted results when it shouldn’t.

Here’s a method I used that attempts to identify such areas, exposing those assumptions. It involves reversing indexes.

If you don’t ask for results to be ordered, they may still appear to be ordered. SQL Server will return results in a way that’s convenient, and this is often in some index order. So if the indexes are reversed, then the idea is that what’s convenient for SQL Server will be reversed too.

Which results are ordered on purpose and which are ordered by luck? It’s impossible to tell. But after the indexes are reversed, it’s now apparent.

Reverse the Indexes On A Dev Box

First, use this PowerShell script to generate some SQL. It’s adapted from a Stackoverflow answer by Ben Thul, “How to script primary key constraints”:

[string]$dbname = "Adventureworks2012";
[string]$server = ".";
 
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMO") | out-null
$SMOserver = New-Object ('Microsoft.SqlServer.Management.Smo.Server') -argumentlist $server
$db = $SMOserver.databases[$dbname]
 
"" > drop_indexes.sql
"" > create_indexes.sql
"" > drop_foreign_keys.sql
"" > create_foreign_keys.sql
 
$option_drop = new-object Microsoft.SqlServer.Management.Smo.ScriptingOptions;
$option_drop.ScriptDrops = $true;
foreach ($table in $db.Tables) {
	foreach ($index in $table.Indexes) {
		$index.Script( $option_drop ) >> drop_indexes.sql
		$index.Script() >> create_indexes.sql
	}
	foreach ($foreign_key in $table.ForeignKeys) {
		$foreign_key.Script( $option_drop ) >> drop_foreign_keys.sql
		$foreign_key.Script() >> create_foreign_keys.sql
	}
}

Now follow these steps. Use the generated SQL to:

  1. Drop the foreign keys
  2. Drop the indexes
  3. Open up the create_indexes.sql script in an editor and swap ASC for DESC and vice versa
  4. Create the reversed indexes
  5. Recreate the foreign keys

That’s it! Now unsorted results will be returned in whatever order is convenient for SQL Server, which should be the opposite of the original order.
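For example, an index definition in create_indexes.sql that was scripted like the first statement below gets recreated as the second (the index here is just an illustration):

-- as scripted by the powershell script (index name is illustrative)
CREATE NONCLUSTERED INDEX IX_Person_LastName ON Person.Person (LastName ASC);

-- after swapping ASC and DESC in step 3
CREATE NONCLUSTERED INDEX IX_Person_LastName ON Person.Person (LastName DESC);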

Test

Remember how these ORDER BY assumptions are human or application problems? It’s time to bring them into this process. Test your applications/reports manually, or if you’re fortunate enough to have them, run your automated tests.

I’m one of the fortunate ones. I have access to a suite of automated tests that includes thousands of integration tests. In my case, roughly one percent of them failed after this experiment. Colleagues’ reactions were varied. They ranged from “That many?” to “That few?”

This experiment cannot identify all ordering assumptions, but it has a good chance at identifying many.

Fix

First let me give some advice on how not to fix this. Don’t begin by indiscriminately sprinkling ORDER BY clauses throughout all your queries. I found the best approach is to handle each problem on a case-by-case basis.

Here are some approaches:

  • Fix the test: For automated tests, sometimes the test itself assumed an order. This is an easy case to deal with.
  • Order results in the app: If you’re using C#, try using Linq’s Enumerable.OrderBy. And if you’re using some other language or reporting framework, you should be able to sort there too.
  • Order in SQL: If necessary, order your results using SQL with the ORDER BY clause.

Happy ordering!

