Michael J. Swart

August 31, 2015

Avoid Frequent use of TVPs With Wide Rows

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:00 am
Tackling A Hairy Problem
This series includes a number of stand-alone posts which can fit together to tell a bigger story

Let’s say there’s a procedure that uses a table-valued parameter (TVP), and that TVP type has a large maximum width. Something like:

-- Create a message table type
CREATE TYPE MessageTableType AS TABLE 
  ( Message VARCHAR(MAX) );

Then each time the procedure is executed, SQL Server will pre-allocate six extents (48 pages) of space in tempdb whether or not it’s needed.
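To make this concrete, here's a hedged sketch of the kind of procedure and call I have in mind (the procedure name and its body are hypothetical; only the table type comes from above):

-- A hypothetical procedure that accepts the wide table type as a TVP
CREATE PROCEDURE dbo.s_ProcessMessages
  @Messages MessageTableType READONLY
AS
BEGIN
  SET NOCOUNT ON;
  -- do something trivial with the rows
  SELECT Message FROM @Messages;
END
GO

-- Every execution like this one triggers the tempdb pre-allocation described above
DECLARE @m MessageTableType;
INSERT @m (Message) VALUES ('Hello, world');
EXEC dbo.s_ProcessMessages @Messages = @m;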

Frodo Baggins Quote

Measuring tempdb Activity

So what’s the problem? Well, if these procedures are called frequently (like hundreds or thousands of times per second), then the extra allocations for wide TVPs can be excessive. These allocations can cause tempdb latch contention.

I found that SQL Server will allocate tempdb space based on the maximum width of a TVP's table type. If a TVP uses a table type that has an NVARCHAR(10) column, then SQL Server doesn't pre-allocate any space at all. But if the TVP table type has an NVARCHAR(4000) column, then SQL Server will allocate six extents of space. To measure the tempdb activity exactly, I wrote a program which let me generate this:

PFS Operations

This chart takes some explaining. But first notice that a single query can cause up to 56 tempdb PFS operations!

Here’s how my program helped me create this chart. It starts by creating a table type. The table type has one column of type NVARCHAR(X). Next, the program executes a simple query that uses a table-valued parameter of that table type. The program measures the number of tempdb allocations and deallocations for various TVP widths (X ranges from 1 to 4000). Also notice that

  • I’m focusing on PFS operations here, but GAM activity shows a similar pattern
  • The x axis corresponds to the maximum width of the column in the table type
  • The y axis corresponds to the number of PFS operations found in tempdb. An extent is allocated with one PFS operation but deallocated one page at a time, which results in nine PFS operations per extent.
  • The steps correspond to four, five and six extents.
  • There is some overhead associated with creating and dropping tempdb objects. But that overhead is avoided with temp table caching. And those operations have been removed from this chart.
  • Table types that can be wider than 8000 bytes, such as those with multiple columns or with NVARCHAR(MAX) columns, are treated the same as table types with an NVARCHAR(4000) column; they’re given six extents of space.
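For concreteness, the two ends of that spectrum look something like this (a hedged sketch; the type names are mine):

-- A "skinny" table type: no tempdb pre-allocation observed
CREATE TYPE SkinnyMessageType AS TABLE 
  ( Message NVARCHAR(10) );
GO

-- A "wide" table type: six extents get pre-allocated every time a TVP of this type is used
CREATE TYPE WideMessageType AS TABLE 
  ( Message NVARCHAR(4000) );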

What I Think

I’m still making an assumption that transaction log operations on PFS and GAM pages share a one-to-one relationship with latches on those pages. But experimentally, I have seen that skinny TVPs do in fact enjoy much higher throughput than wider TVPs. In general, this is really hard to measure. I’m frustrated that there are no good metrics to help measure this kind of thing.

Six extents for wide TVPs is really excessive and causes unnecessary pressure on tempdb latches when used frequently. I now begin to worry about the width of table types and the frequency that they’re used as table-valued parameters. But I don’t think we should have to. Stuff like this should be the concern of the database engine, not the database administrator.

Thanks

While writing this post, I really appreciated the help of Paul White and his post Temporary Table Caching Explained.

I also appreciate the huge amount of information on SQL Server internals posted by Paul Randal on his blog. Especially posts like Understanding data vs log usage for spills in tempdb.

August 12, 2015

What’s Going On Inside Tempdb?

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 11:56 am
Tackling A Hairy Problem
This series includes a number of stand-alone posts which can fit together to tell a bigger story

I discovered that digging into tempdb’s transaction log using the undocumented fn_dblog is a surprisingly useful technique for finding data to help battle tempdb contention. But it’s not easy. The transaction log is cryptic and sorting through the information can be difficult.

I wanted more tempdb internals info. Be careful what you wish for.

Like many, many others, I’ve found it difficult to avoid or diagnose tempdb allocation contention issues. Typical advice includes

  • Use multiple equally-sized data files for tempdb
  • Use trace flag 1118 in order to allocate full extents instead of mixed extents
  • Look at your applications to reduce queries that use tempdb

It’s that last point that is so tricky. I need to find out what parts of my application are using tempdb but it’s used for so many different things. Robert Davis lists eighteen different ways that SQL Server uses tempdb in his whitepaper Demystifying tempdb Performance and Manageability. So in order to tackle tempdb allocation contention, there are two things I want to do. I want to (A) measure tempdb allocations and (B) dig into tempdb allocations.

The Trouble (a Quick Recap)

Remember, tempdb is a resource shared by everyone using the database server. When a process wants to use space in tempdb, it allocates pages. Part of that allocation involves calling dibs for those pages by updating the Page Free Space (PFS) page and the Global Allocation Map (GAM) page. But updating those pages involves taking a short-lived latch on them. So with enough activity, we can see latch contention on these pages. Other people have done a better job explaining that than I did just now. You can probably find many of those explanations by googling tempdb latch contention.

Measuring Tempdb Allocations

So if latches on tempdb pages are a limited resource, then it would be wonderful to measure how often latches are taken there. That way developers could experiment to determine what activities use tempdb and to what extent. Right now, that can’t be measured. Maybe we’re spoiled. We have a wide variety of extended events and a rich set of dynamic management views that let us dig really deep into SQL Server internals. But when it comes to measuring tempdb latches, there’s no suitable metric.

Here’s how Microsoft comes close:

Metric, and why it’s not what I want:

  • Performance counter Database transactions/sec (the tempdb instance specifically): This is a gauge metric and it would be useful if every tempdb transaction allocated a small number of pages. But at least one kind of activity (tempdb spills) will allocate many pages in a single tempdb transaction, and that activity gets hidden.
  • sys.dm_db_task_space_usage and sys.dm_db_session_space_usage: These are useful when you want to categorize tempdb allocations. They report the number of tempdb allocations held at any given point in time. But allocations taken by long-running transactions can be over-reported and short-running transactions tend to be missed. This makes it difficult to tell whether the data indicates typical tempdb allocations or whether it’s just showing long-held allocations.
  • Performance counter Page latch waits in Wait Statistics (specifically Waits started per second): This is an alarm metric and not specific to tempdb.
  • Performance counter category Latches (any counter): Again, these are all alarm metrics and not specific to tempdb.
  • Performance counter Pages Allocated/sec or Page Deallocations/sec in Access Methods: I really like this one. It’s not tempdb specific, but it’s a gauge metric and it’s getting close to what I want (sampled in the sketch below).
  • Performance counter Tempdb latches/sec with instances on GAM, SGAM and PFS: Oooh, perfect. Too bad this performance counter is fictional. It doesn’t exist. I would love this metric if it existed because it would be a gauge metric that most directly corresponds to tempdb contention problems.
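For what it’s worth, those Access Methods counters can be sampled from T-SQL. Here’s a hedged sketch (the object_name prefix is ‘SQLServer:’ on a default instance; named instances use ‘MSSQL$InstanceName:’):

-- Sample the Access Methods allocation counters
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%:Access Methods%'
  AND counter_name IN ('Pages Allocated/sec', 'Page Deallocations/sec');

-- These "per second" counters are cumulative; sample twice and divide the difference by the interval.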

Digging into Tempdb Allocations

Putting metrics aside for the moment, I’ve also tried a number of ways to dig into events and logs in order to tie tempdb use with user activity. I’ve had limited success with these two methods:

  • I found that polling the DMV sys.dm_os_waiting_tasks (an alarm metric) doesn’t always give me the sql text I want.
  • There are new extended events called sqlserver.mixed_extent_allocation and sqlserver.mixed_extent_deallocation introduced in SQL Server 2014, but these are focused on mixed extents (which I already avoid by enabling trace flag 1118).

But it turns out that tempdb’s transaction log is rich with information (even though a lot of it is cryptic).

Digging into Tempdb’s Transaction Log

The easiest way to look at tempdb’s transaction log is by using fn_dblog:

use tempdb;
select * from fn_dblog(null, null);

But it can be a heavy query. Here’s a light-weight query that gets the most recent 5 seconds of activity or 10,000 rows, whichever is less:

-- Collect tempdb log activity
use tempdb;
 
-- get the latest lsn for tempdb
declare @xact_seqno binary(10);
declare @xact_seqno_string varchar(50);
exec sp_replincrementlsn @xact_seqno OUTPUT;
set @xact_seqno_string = '0x' + CONVERT(varchar(50), @xact_seqno, 2);
set @xact_seqno_string = stuff(@xact_seqno_string, 11, 0, ':')
set @xact_seqno_string = stuff(@xact_seqno_string, 20, 0, ':');
 
-- wait for five seconds of activity:
waitfor delay '00:00:05';
 
select top 10000 * 
from fn_dblog(@xact_seqno_string, null);
go

This actually lets me answer a lot of questions:

What activity is there against PFS or GAM pages?
Each operation (allocation or deallocation) is likely one page latch. So we have:

select *
from fn_dblog(@xact_seqno_string, null)
where Context in ('LCX_PFS', 'LCX_GAM', 'LCX_SGAM')

or aggregated we have:

select count(*), Context
from fn_dblog(@xact_seqno_string, null)
where Context in ('LCX_PFS', 'LCX_GAM', 'LCX_SGAM')
group by Context
order by count(*) desc;

What kinds of tempdb transactions are there?

select count(*), [Transaction Name]
from fn_dblog(@xact_seqno_string, null)
where Operation = 'LOP_BEGIN_XACT'
group by [Transaction Name]
order by count(*) desc;

Transaction names are cryptic, but they are organized by activity and we can use some educated guesses to figure out what’s going on. For example, queries that use table-valued parameters are called TVQuery and are often associated with transactions called AllocFirstPage, FCheckAndCleanupCachedTempTable and, if the TVP’s table type has a primary key defined, offline index build.

PFS or GAM activity by tempdb transaction type
Putting it all together:

with recentTempdbLogs as
(
    select top 10000 * 
    from fn_dblog(@xact_seqno_string, null)
),
TransactionNames as
(
    select [Transaction Id], [Transaction Name]
    from recentTempdbLogs
    where Operation = 'LOP_BEGIN_XACT'
)
select tn.[Transaction Name], rtl.Context, COUNT(*) as Operations
from recentTempdbLogs rtl
join TransactionNames tn
    on tn.[Transaction Id] = rtl.[Transaction ID]
where Context in ('LCX_PFS', 'LCX_GAM', 'LCX_SGAM')
group by tn.[Transaction Name], Context
order by COUNT(*) desc

How Is This Actionable?

Remus Rusanu has a post titled How to read and interpret the SQL Server log. It gives an introduction to some of the internals involved with what you see in the transaction log. It’s not always obvious why understanding SQL Server internals helps in real world scenarios. I’ve found that it’s most useful for troubleshooting by focusing avenues of investigation (whether it’s TVPs, tempdb spills, version store stuff, etc…).

In my case, I used this data to narrow down the type of activity. Specifically I used

  • the Transaction SID column (giving me the user),
  • the SPID which gives me the session id and helps me tie it to data collected elsewhere
  • the AllocUnitName which I used to tie to particular tempdb objects (see the sketch below)
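Here’s roughly how those columns can be pulled together (a hedged sketch; the column names come straight from fn_dblog, and SUSER_SNAME is just one way to turn the SID into a login name):

select 
  [Transaction ID],
  [Transaction Name],
  SUSER_SNAME([Transaction SID]) as LoginName,
  SPID,
  AllocUnitName,
  Operation,
  Context
from fn_dblog(@xact_seqno_string, null);
-- Note: Transaction Name, Transaction SID and SPID are typically only populated on the
-- LOP_BEGIN_XACT rows; AllocUnitName shows up on the rows that touch an allocation unit.
-- Join on [Transaction ID] to combine them.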

More on what I found next week.

August 5, 2015

Types of Performance Metrics

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 9:00 am
Tackling A Hairy Problem
This series includes a number of stand-alone posts which can fit together to tell a bigger story

There seem to be two main kinds of performance metrics, ones that measure trouble and ones that measure resources. I’ll call the first kind “alarm” metrics and the other kind “gauge” metrics. Alarm metrics are important, but I value gauge metrics more. Both are essential to an effective monitoring and alerting strategy.

In the land of Mordor where calls get dropped

Alarms

Alarms are great for troubleshooting; they indicate that it’s time to react to something. They have names containing words like timeouts, alerts and errors, but also words like waits or queue length. And they tend to be spiky. For example, think about a common alarm metric: SQL Server’s Blocked Process Report (BPR). The report provides (actionable) information, but only after a concurrency issue is detected. Trouble can strike quickly and SQL Server can go from generating zero BPR events per second to dozens or hundreds. Alarm metrics look like this:

Alarm Metric

Gauges

Now contrast that with a gauge metric. Gauge metrics often change value gradually and allow earlier interventions because they provide a larger window of opportunity to make corrections.
If you pick a decent threshold value, then all gauges can generate alerts (just like alarms do!). As they approach trouble, gauges can look like this:

Gauge Metric

And the best kind of gauge metrics are the kind that have their own natural threshold. Think about measuring the amount of free disk space or available memory. Trouble occurs when those values hit zero, and those gauges look like this:

Decreasing Gauge Metric

Examples

I compare different gauges and alarms to further explain what I mean.

Alarms vs. Gauges:

  • Avg. Disk Read Queue Length vs. Disk Reads/sec
  • Processor Queue Length vs. % Processor Time
  • Buffer Cache Hit Ratio vs. Page lookups/sec
  • “You are running low on disk space” vs. “10.3 GB free of 119 GB”
  • Number of shoppers waiting at checkout vs. number of shoppers arriving per hour
  • Number of cars travelling slower than the speed limit vs. number of cars per hour
  • Number of rings of power tossed into Mount Doom vs. ring distance to Mount Doom

Hat tip to Daryl McMillan. Our conversations led directly to this post.

June 24, 2015

How to Create Indexed Views Online

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:00 am

One of the limitations of indexed views is that their initial clustered indexes cannot be created online. Try it and you’ll get this error:

Msg 1967, Level 16, State 1, Line 34
Cannot create a new clustered index on a view online.

In Guidelines for Online Index Operations, Microsoft explicitly excludes the “Initial unique clustered index on a view” as an online operation. Challenge accepted.

Solution Overview

I’ll be honest. This solution is not for everyone. Especially if you don’t like to get your hands dirty.

The idea is to create a bit column called IsMigrated in one of the base tables, initially 0. Add an extra where clause (IsMigrated = 1) to the definition of the view so that the view is initially empty. Create the index and then gradually update the IsMigrated values to 1.

What follows is an example of what I mean.

The Setup

Consider these two tables I made up:
OnlineIndexViewSchema
In my scenario, queries will often join these tables together and sometimes concatenate the columns BaseURL and URL. To facilitate URL lookups, I want to create and index the following view.

CREATE VIEW dbo.LINK_FULLURLS WITH SCHEMABINDING AS
  SELECT 
    L.LinkId,
    CASE 
      WHEN L.IsExternal = 1 THEN L.URL
      ELSE C.BaseURL + L.URL
    END AS FullURL,
    CHECKSUM ( 
      CASE 
        WHEN L.IsExternal = 1 THEN L.URL
        ELSE C.BaseURL + L.URL
      END ) AS FullURLChecksum
  FROM dbo.LINKS L 
  INNER JOIN dbo.CLIENTS C
    ON C.ClientId = L.ClientId
  WHERE L.DateDeleted IS NULL;

And the base tables can get big. In my example I’ll use 50,000 clients and 200 links per client (giving 10 million rows in LINKS). So now when the initial unique clustered index is created, it can take minutes to complete and the base tables are unavailable for the whole duration.
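For reference, data of that size might be generated with something like the following hedged sketch (the column lists are guesses based on the view definition, and both key columns are assumed to be identity columns, so adjust for the real schema):

-- 50,000 clients
INSERT dbo.CLIENTS (BaseURL)
SELECT TOP (50000)
  'http://client' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)) + '.example.com'
FROM sys.all_columns a CROSS JOIN sys.all_columns b;

-- 200 links per client (10 million rows in LINKS)
INSERT dbo.LINKS (ClientId, URL, IsExternal, DateDeleted)
SELECT C.ClientId, '/page/' + CAST(N.n AS VARCHAR(10)), 0, NULL
FROM dbo.CLIENTS C
CROSS JOIN (
  SELECT TOP (200) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
  FROM sys.all_columns
) AS N;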

Offline Method

For comparison purposes, here is one offline method of creating an indexed view.

  • Create the view (0 seconds)
  • Scan a base table to warm up the cache (20 seconds)
  • Create the initial clustered index on the view (1 minute 50 seconds offline)
  • Create an additional nonclustered index on the view (2 minute 30 seconds)

Depending on the circumstances, that offline step could take longer and could be a problem. The next method attempts to get around that.

Online Method

Follow these steps

  • Add a new bit column IsMigrated (default 0) to one of the base tables. In my case I use the table LINKS and because I’m using SQL Server 2012, this step is instantaneous (0 seconds)
  • Create the view (0 seconds)
  • Warm up the cache by scanning the LINKS base table (20 seconds)
  • Create the initial clustered index on the view. With no writes and minimal reads this is very quick. Check it out.

    /*————————
    CREATE UNIQUE CLUSTERED INDEX IX_LINK_FULLURLS
      ON dbo.LINK_FULLURLS(LinkId);
    ————————*/
     
     Query executed successfully.
     Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, …
     Table 'LINKS'. Scan count 3, logical reads 511159, physical reads 0, read-ahead reads 0, …
     
     SQL Server Execution Times:
       CPU time = 1295 ms, elapsed time = 833 ms.

  • Create the additional nonclustered index which is also initially empty (0 seconds)
  • Change the default of IsMigrated to 1 (0 seconds)
  • Update the value of IsMigrated to 1 in batches (8 minutes 4 seconds, but online). A sketch of these steps follows this list.
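Here’s a minimal sketch of those steps (assuming the LINKS table from the setup above; the batch size, the constraint name and the exact view definition are illustrative):

-- Step 1: add the IsMigrated column (instantaneous on SQL Server 2012)
ALTER TABLE dbo.LINKS
  ADD IsMigrated BIT NOT NULL CONSTRAINT DF_LINKS_IsMigrated DEFAULT (0);
GO

-- Step 2: create the view with the extra IsMigrated = 1 clause so that it starts out empty
CREATE VIEW dbo.LINK_FULLURLS WITH SCHEMABINDING AS
  SELECT 
    L.LinkId,
    CASE WHEN L.IsExternal = 1 THEN L.URL ELSE C.BaseURL + L.URL END AS FullURL,
    CHECKSUM ( CASE WHEN L.IsExternal = 1 THEN L.URL ELSE C.BaseURL + L.URL END ) AS FullURLChecksum
  FROM dbo.LINKS L 
  INNER JOIN dbo.CLIENTS C
    ON C.ClientId = L.ClientId
  WHERE L.DateDeleted IS NULL
    AND L.IsMigrated = 1;
GO

-- Steps 3 and 4: warm the cache with a scan of dbo.LINKS, then create the initial
-- clustered index (quick because the view is empty); step 5's nonclustered index follows similarly
CREATE UNIQUE CLUSTERED INDEX IX_LINK_FULLURLS
  ON dbo.LINK_FULLURLS(LinkId);
GO

-- Step 6: new rows should be born already migrated
ALTER TABLE dbo.LINKS DROP CONSTRAINT DF_LINKS_IsMigrated;
ALTER TABLE dbo.LINKS ADD CONSTRAINT DF_LINKS_IsMigrated DEFAULT (1) FOR IsMigrated;
GO

-- Step 7: migrate the existing rows in batches
DECLARE @rowcount INT = 1;
WHILE @rowcount > 0
BEGIN
  UPDATE TOP (10000) dbo.LINKS
  SET IsMigrated = 1
  WHERE IsMigrated = 0;
  SET @rowcount = @@ROWCOUNT;
END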

I don’t know if you caught it, but there is no step that removes the column IsMigrated. It stays behind messing up the data model. To remove it would require that we modify the definition of the view and that would unravel everything we did. That’s the catch.

Comparison

Here are the times that I saw:

Indexed View Creation Timing

With my method, I’m not eliminating the offline step, I’m reducing its duration. It normally takes a long time to create the clustered index. I measured about half a minute of reading and two minutes of writing. My method’s offline step avoids all of the writes with the IsMigrated column and avoids most (or all) of the physical reads by warming up the cache.

Caveats and Other Notes

  • The demo scripts I provided are not a recipe but an example of how to implement this strategy. If you feel like using this strategy yourself, you’re going to have to write and test your own scripts.
  • This indexed view is not quite as useful as your typical indexed view. SQL Server can sometimes choose to use an indexed view when executing queries that don’t even mention the view, but the extra IsMigrated = 1 clause in this view prevents SQL Server from doing so.
  • Other databases like Oracle and Postgres have things called Materialized Views. Those can sometimes store data that’s a bit stale, but they don’t suffer from the same online troubles that SQL Server’s Indexed View does. Sometimes the grass is greener on the other side of the fence.
  • I wish I didn’t need the workaround. If you’d like to add your voice to the feedback I gave to Microsoft, then vote up this connect item Create Clustered Indexes on Views WITH (ONLINE=ON).
  • I tried creating a temporary index (filtered, covering and narrow) on the base tables to help the view creation. It didn’t seem to help too much because my largest bottleneck is on writes, not reads.
  • I tried creating a check constraint on IsMigrated = 0 before I created the index hoping that SQL Server could use it to reduce the reads to zero. It didn’t help.

Use Case

Like I mentioned, this method will be useful when you have no opportunity to apply offline changes to live databases and you don’t mind messing up your data model. I doubt that many will use this method. I think it’s rare to belong to an organization that can’t tolerate downtime but can tolerate this data modeling sloppiness.

If you have a large amount of development time and resources, a better way to handle this situation might be to build in some wiggle room for offline maintenance like this. A slightly more graceful alternative is to build reduced functionality into your application like the ability to disable a single feature or operate in read-only mode. This is easier said than done of course, but it accommodates a much wider variety of solutions to deployment challenges like this.

May 20, 2015

It’s Hard To Destroy Data

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:00 am

Had I been born later, I probably would have gone with Wreck-It Ralph

It is surprisingly difficult to delete data permanently. SQL Server has sophisticated features that prevent all kinds of data loss, but the opposite is not true. SQL Server has very few features that help you destroy data.

But why would anyone need to destroy data? Lots of reasons.

  • Some laws and data retention policies require that data older than a certain age gets destroyed.
  • Developers often ask for production data, or a subset. A DBA could decide to give them a backup having first dropped certain tables or columns that contain personally identifiable information.

There are countless other reasons why you might want to destroy data and I’m going to focus on three general use cases: destroying tables, destroying columns and destroying rows.

Straight Up Deleting Is Not Enough

I’m going to demonstrate that deleting data is not equivalent to destroying data. Deleting data is just hiding it.

First create a table, populate it and see what it looks like on disk:

CREATE DATABASE DeletedDataDemo;
GO
 
USE DeletedDataDemo;
 
SELECT col1, col2, col3 
INTO dbo.Table1
FROM 
( 
  VALUES (1, 'aaaaaaaaaaaa', 'bbbbbbbbbbbb'),
         (2, 'cccccccccccc', 'dddddddddddd'),
         (3, 'eeeeeeeeeeee', 'ffffffffffff')
) data (col1, col2, col3)
 
DBCC TRACEON (3604);
DBCC IND(DeletedDataDemo, 'dbo.Table1', 1) -- file 1, page 216 for example
 
--update the following command to look at that page
DBCC PAGE (DeletedDataDemo, 1, 216, 2);

Notice that the last command DBCC PAGE shows the contents of the table as it appears on disk.

Data dump

That data is still there

And now drop the table that was just created:

DROP TABLE dbo.Table1;
 
-- or
-- ALTER TABLE dbo.Table1 DROP COLUMN col3;
 
-- or
-- DELETE dbo.Table1 WHERE col1 IN (2,3);
 
DBCC PAGE (DeletedDataDemo, 1, 216, 2);

Run the DBCC PAGE command again and notice that the data values are still there! The data cannot be queried, but it remains on disk. Even though it sits in deallocated space, it can survive backups and restores (even after checkpoints).

Repeat the demo from the beginning. But this time, instead of dropping the table, try deleting rows or dropping a column. Just like the dropped table the data remains on disk.

Try to Destroy Data by Cleaning Pages

So SQL Server doesn’t actually delete data, it hides it and deallocates the space. A ghost cleanup job can eventually clean up this data but you can’t depend on how frequently it runs. That leads to stackoverflow questions like this one: How can I securely destroy some data using sql server 2008?

One of the answers there mentions that sp_clean_db_free_space can force the cleanup to run early. Cleaning pages is exactly what’s required to get rid of that data in deallocated space. But cleaning pages consumes a lot of I/O and maybe that’s why SQL Server 2008 introduced a lot of control over the granularity of cleaning pages:

  • Use sp_clean_db_free_space for a whole database.
  • Use sp_clean_db_file_free_space for a single data file.
  • Use DBCC CLEANPAGE(@dbid, @fileid, @pageid) after a CHECKPOINT for a single page.

These commands force SQL Server to write zeros to any deallocated space.
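To see the effect, you can repeat the DBCC PAGE check from the demo after cleaning (a hedged sketch; page 216 was just the page my copy of the demo happened to use):

EXEC sp_clean_db_free_space 'DeletedDataDemo';

-- With trace flag 3604 still enabled, the deallocated rows should now come back zeroed out
DBCC PAGE (DeletedDataDemo, 1, 216, 2);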

Mostly…

Dropped-Column Data Sticks Around

Columns are a special case. SQL Server doesn’t reclaim the space that is freed by dropped columns. And this dropped column data is not touched by any of the page cleaning methods. In order to destroy this data, it’s necessary to add a few manual steps. These steps can destroy column data by either overwriting column values or by rebuilding its table’s indexes.

How To Destroy Data

Here’s the syntax to use when trying to destroy data.

When destroying tables:

DROP TABLE dbo.Table1;
EXEC sp_clean_db_free_space 'DeletedDataDemo';

When destroying rows:

DELETE dbo.Table1 WHERE col1 = @col1;
EXEC sp_clean_db_free_space 'DeletedDataDemo';

When destroying columns (except columns in heaps):

ALTER TABLE dbo.Table1 DROP COLUMN col3;
ALTER INDEX ALL ON dbo.Table1 REBUILD;
EXEC sp_clean_db_free_space 'DeletedDataDemo';

When destroying columns (alternative syntax):

UPDATE dbo.Table1 SET col3 = N'';
ALTER TABLE dbo.Table1 DROP COLUMN col3;
EXEC sp_clean_db_free_space 'DeletedDataDemo';

When destroying other data structures (Columnstore, Full-text, XML indexes, Service Broker queues, etc.): not evaluated. I don’t know how long this data sticks around.

The Lessons

Microsoft warns you that these data cleaning procedures are very I/O intensive so I would be reluctant to suggest these methods when alternatives exist.

The clumsiness of the dropped column scenario makes me wonder about other scenarios where SQL Server deletes data but doesn’t destroy it.

And this leads directly to this bit of advice:

Don’t distribute backups of databases that have contained sensitive information.

April 27, 2015

The Appeal Of Lightning Talks

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 11:32 am

Lightning talks are quick presentations, usually five to ten minutes long, and I really enjoy them.

But it’s not (only) about indulging short-attention-span habits. Lightning talks tend to be really dense with interesting information. The speaker is forced to say one thing and nothing else. Speakers need to choose between important content and not-so-important content. It’s very difficult for the speaker, but it’s great for us in the audience.
It’s possible to get dense information into a hour long session, but those sessions are more prone to contain filler.

The best lightning talk I’ve ever watched is this one, Fighting Dirty In Scrabble by Mehal Shah.

He just crushes that talk. I use him as a model for a really good content-driven lightning talk.

That’s why I’m excited about attending the third annual member presentations at Toronto PASS.

Lightning Talks About SQL in Toronto

It’s basically like an open mic night. If you’re going to attend one user group meeting in Toronto, this is the one. I plan on giving a talk about my favorite nemesis, tempdb.

The Toronto PASS user group meeting is tomorrow (April 28, 2015) at 5:30 pm at the Northern District Library (near Eglinton and Yonge).
By the way, if any local friends from KW want to catch a ride to Toronto, I’ll be leaving KW in the afternoon.

April 6, 2015

Finding Scalar Aggregate Indexed Views in Your Database

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:35 am

Fellow MVP Paul White recently blogged about a SQL Server bug. His blog post is called An Indexed View Bug with Scalar Aggregates

It’s a really well-written article (as always). After reading it, it’s natural to wonder whether your own databases have any such indexed views.

Chances are that you don’t have any. These indexed views aren’t too common, but if you do have indexed views that use scalar aggregates, here’s a query that can help you find them.
The following query finds indexed views without GROUP BY that have exactly one row in any partition.

with IndexedViewIds as
(
  SELECT [object_id] from sys.indexes
  INTERSECT
  SELECT [object_id] from sys.views
), 
IndexedViewInfo as 
(
  SELECT 
    [object_id],
    OBJECT_SCHEMA_NAME([object_id]) as SchemaName,
    OBJECT_NAME([object_id]) as ViewName,
    OBJECT_DEFINITION([object_id]) as [Definition]
  FROM IndexedViewIds
)
SELECT 
    v.[object_id],
    v.SchemaName,
    v.ViewName,
    v.[Definition]
FROM IndexedViewInfo v
WHERE NOT EXISTS
  (
    SELECT * 
    FROM sys.partitions
    WHERE [object_id] = v.[object_id]
    AND [rows] <> 1
  )
  AND v.[definition] NOT LIKE '%GROUP BY%'

Notes

The query isn’t perfect. It’s actually possible (but rare) to get false positives here because the query doesn’t look for any aggregate keywords. So look for them in the SELECT list of each view returned, and make sure no GROUP BY clause exists.
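One hedged refinement (still crude string matching, so eyeball the results): replace the final SELECT of the query above with a version that also looks for the aggregates that indexed views allow, SUM and COUNT_BIG.

SELECT 
    v.[object_id],
    v.SchemaName,
    v.ViewName,
    v.[Definition]
FROM IndexedViewInfo v
WHERE NOT EXISTS
  (
    SELECT * 
    FROM sys.partitions
    WHERE [object_id] = v.[object_id]
    AND [rows] <> 1
  )
  AND v.[Definition] NOT LIKE '%GROUP BY%'
  AND ( v.[Definition] LIKE '%COUNT_BIG%' OR v.[Definition] LIKE '%SUM(%' );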

It’s also possible (but rare) to miss some indexed views when GROUP BY gets mentioned, but not used. For example, if an indexed view definition contains the phrase GROUP BY in a comment, it won’t show up in this list.

(For my curious co-workers, none of our indexed views use scalar aggregates)

April 2, 2015

Look at Blocking By Index

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:33 am

This post is for me. It’s a script I find useful so I’m putting it in a place where I know to go look for it, my blog. You may find it useful too.

The script below extends the DMV sys.dm_db_index_operational_stats by focusing on lock waits and supplying index and table names. If you want to know about blocking by index, these queries can help.

If you want something more comprehensive, I’d suggest Kendra Little’s http://www.brentozar.com/blitzindex/

Blocking Wait Stats

-- Get index blocking wait stats
select 
  t.name as tableName, 
  i.name as indexName, 
  ios.row_lock_wait_count, 
  ios.row_lock_wait_in_ms, 
  ios.page_lock_wait_count,
  ios.page_lock_wait_in_ms
from sys.dm_db_index_operational_stats(db_id(), null, null, null) ios
join sys.indexes i
  on i.object_id = ios.object_id
  and i.index_id = ios.index_id
join sys.tables t
  on ios.object_id = t.object_id
where ios.row_lock_wait_in_ms + ios.page_lock_wait_in_ms > 0
order by ios.row_lock_wait_in_ms + ios.page_lock_wait_in_ms desc

Create Snapshot Of Stats

begin try
    drop table #IndexBlockingWaitStats
end try
begin catch
-- swallow error
end catch
 
select 
  [object_id], 
  index_id, 
  row_lock_wait_count, 
  row_lock_wait_in_ms, 
  page_lock_wait_count,
  page_lock_wait_in_ms
into #IndexBlockingWaitStats
from sys.dm_db_index_operational_stats(db_id(), null, null, null)

Get Waits Since Last Snapshot

-- Get delta results
select 
  t.name as tableName, 
  i.name as indexName, 
  ios.row_lock_wait_count  - iossnapshot.row_lock_wait_count    as row_lock_wait_count, 
  ios.row_lock_wait_in_ms  - iossnapshot.row_lock_wait_in_ms    as row_lock_wait_in_ms, 
  ios.page_lock_wait_count - iossnapshot.page_lock_wait_count   as page_lock_wait_count,
  ios.page_lock_wait_in_ms -  iossnapshot.page_lock_wait_in_ms  as page_lock_wait_in_ms
from sys.dm_db_index_operational_stats(db_id(), null, null, null) ios
join #IndexBlockingWaitStats iossnapshot
  on iossnapshot.[object_id] = ios.[object_id]
  and iossnapshot.index_id = ios.index_id
join sys.indexes i
  on i.[object_id] = ios.[object_id]
  and i.index_id = ios.index_id
join sys.tables t
  on ios.[object_id] = t.[object_id]
cross apply ( 
  select 
  ( ios.row_lock_wait_in_ms + ios.page_lock_wait_in_ms ) -
  ( iossnapshot.row_lock_wait_in_ms + iossnapshot.page_lock_wait_in_ms )
) as calc(totalwaittime)
where totalwaittime > 0
order by totalwaittime desc

Notes

  • There are many kinds of lock waits; this script focuses on waits on pages or rows. Other kinds of waits not shown here include objects (i.e. locks on tables), latches and IO latches.
  • This is only one small, focused tool in a troubleshooting tool belt. Don’t depend on it too much.
  • If you’re keen, you’ll notice I didn’t give info on schemas or on partitions. Sounds like a fun exercise, doesn’t it? A version with schema names is sketched after this list.
  • No illustration? Nope, or at least not yet. If I continue to find this script useful, then I plan on adding an illustration, because I use Browse By Illustration as my main navigation tool.
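Here’s a hedged take on that exercise, extending the first query with the schema name and the partition number:

-- Get index blocking wait stats with schema names and partition numbers
select 
  s.name as schemaName,
  t.name as tableName, 
  i.name as indexName, 
  ios.partition_number,
  ios.row_lock_wait_count, 
  ios.row_lock_wait_in_ms, 
  ios.page_lock_wait_count,
  ios.page_lock_wait_in_ms
from sys.dm_db_index_operational_stats(db_id(), null, null, null) ios
join sys.indexes i
  on i.object_id = ios.object_id
  and i.index_id = ios.index_id
join sys.tables t
  on ios.object_id = t.object_id
join sys.schemas s
  on s.schema_id = t.schema_id
where ios.row_lock_wait_in_ms + ios.page_lock_wait_in_ms > 0
order by ios.row_lock_wait_in_ms + ios.page_lock_wait_in_ms desc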

April 1, 2015

Some Tweets I Drew

Filed under: SQLServerPedia Syndication,Tongue In Cheek — Michael J. Swart @ 8:00 am

Last week I asked for people to give me illustration ideas on twitter using the hashtag #SwartDrawsTweets. Thanks to everyone who gave me suggestions. Here are a few of them.

The Stig

Brent Ozar tweeted:


(Sorry Richie. Use the hashtag!)
What will he do next. The world wants to know

The Drive-thru

Then Aaron Bertrand tweeted:


@SQLBrit and other left-side drivers are probably wondering about that sign.

M. C. Escher

Ken Fisher had this idea


He probably had a different idea in mind, but I’ve always loved Escher’s Bond of Union.
Maybe not impossible, but linked servers are rarely part of a good solution

Bonus Regional Illustration

And of course I saved the best for last which all Canadians will get:
Regional filtering is a bizarre YouTube feature. It's even more bizarre that people use it.

Using These Images

By the way, I really had fun doing these images, so share away. I waive any copyright I have on these three images. Copy them, modify them without attribution wherever you like. Profit if you can. Go nuts.

February 26, 2015

When Parameter Sniffing Caused Deadlocks

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:06 am

Last week I was asked to troubleshoot some deadlocks in production. I was surprised to find out that parameter sniffing was one of the causes. I describe the investigation below.

Parameter Sniffing

SQL Server does this neat trick when you give it a query with parameters. The query optimizer will take the parameter values into account when making cardinality estimates. It finds the best query plan it can for these values. This is called parameter sniffing.

But parameter sniffing combined with query plan caching means that SQL Server seems to only care about parameter values the first time it sees a query. So ideally the first parameter values should be typical parameter values.

When people talk about parameter sniffing problems, it’s usually because this assumption doesn’t hold. Either the query was compiled with atypical values, or maybe the data has an uneven distribution (meaning that “typical” parameter values don’t exist).

But remember that…

Does this smell funny to you?

The Problem

The problem I saw in production involved some fairly typical looking tables. They looked something like this:

CollectionERD

This is how they were defined.

create table dbo.Collections
(
  CollectionId bigint identity primary key, 
  Name nvarchar(20) default ('--unnamed--') not null,
  Extrastuff char(100) not null default ('')
);
 
create table dbo.CollectionItems
(
  CollectionItemId bigint identity primary key,
  CollectionId bigint not null references dbo.Collections(CollectionId),
  Name nvarchar(20) default ('--unnamed--') not null,
  ExtraStuff char(100) not null default ('')
);
 
create index ix_CollectionItems
on dbo.CollectionItems(CollectionId);

The errors we were getting were deadlock errors and the deadlock graphs we collected were always the same. They looked something like this:

CollectionsDeadlockGraph

See that procedure called s_CopyCollection? It was defined like this:

create procedure dbo.s_CopyCollection 
  @CollectionId bigint
as
 
  set nocount on;
 
  declare @NewCollectionId bigint;
 
  if @CollectionId = 0
     return;
 
  if not exists (select 1 from dbo.Collections where CollectionId = @CollectionId)
     return;
 
  set xact_abort on;
  begin tran;
 
    insert dbo.Collections (Name, Extrastuff)
    select Name, ExtraStuff
    from dbo.Collections
    where CollectionId = @CollectionId;
 
    set @NewCollectionId = SCOPE_IDENTITY();
 
    insert dbo.CollectionItems (CollectionId, Name, ExtraStuff)
    select @NewCollectionId, Name, ExtraStuff
    from dbo.CollectionItems
    where CollectionId = @CollectionId;
 
  commit;

It’s a pretty standard copy procedure, right? Notice that this procedure exits early if @CollectionId = 0. That’s because 0 is used to indicate that the collection is in the “recycle bin”. And in practice, there can be many recycled collections.

Some Digging

I began by reproducing the problem on my local machine. I used this method to generate concurrent activity. But I couldn’t reproduce it! The procedure performed well and had no concurrency issues at all.

This meant more digging. I looked at the procedure’s behavior in production and saw that it was performing abysmally. So I grabbed the query plan from prod and here’s what that second insert statement looked like:

CollectionsBadPlan

This statement inserts into CollectionItems but it was scanning Collections. That was a little confusing. I knew that the insert needed to check for the existence of a row in Collections in order to enforce the foreign key, but I didn’t think it had to scan the whole table. Compare that to what I was seeing on my local database:

CollectionGoodPlan

I looked at the compilation parameters (SQL Sentry Plan Explorer makes this easy) of the plan seen in production and saw that the plan was compiled with @CollectionId = 0. In this case, the assumption about parameter sniffing I mentioned earlier (that the compilation parameters should be typical parameters) did not hold.
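If you don’t have Plan Explorer handy, the compiled parameter values are also embedded in the cached plan XML. Here’s a hedged sketch that pulls them out (it assumes the plan for dbo.s_CopyCollection is still in cache):

-- Compiled ("sniffed") parameter values for a cached procedure plan
;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT 
    OBJECT_NAME(st.objectid, st.dbid) AS ProcedureName,
    pr.n.value('@Column', 'nvarchar(128)') AS ParameterName,
    pr.n.value('@ParameterCompiledValue', 'nvarchar(100)') AS CompiledValue
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY qp.query_plan.nodes('//ParameterList/ColumnReference') pr(n)
WHERE st.objectid = OBJECT_ID('dbo.s_CopyCollection');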

This procedure was performing poorly in production (increasing the likelihood of overlapping execution times), but also, each execution was taking shared locks on the whole Collections table right after having inserted into it. The whole procedure uses an explicit transaction and that’s a recipe for deadlocks.

Doing Something About It

Here are things I considered and some actions I took. My main goal was to avoid the bad plan shown above.

  • Never call the procedure with @CollectionId = 0. The early-exit in the procedure was not enough to avoid bad query plans. If the procedure never gets called with @CollectionId = 0, then SQL Server can never sniff the value 0.
  • I began to consider query hints. Normally I avoid them because I don’t like telling SQL Server “I know better than you”. But in this case I did. So I began to consider hints like: OPTIMIZE FOR (@CollectionId UNKNOWN).
  • I asked some experts. I know Paul White and Aaron Bertrand like to hang out at SQLPerformance.com. So I asked my question there. It’s a good site which is slightly better than dba.stackexchange when you want to ask about query plans.
  • Aaron Bertrand recommended OPTION (RECOMPILE). A fine option. I didn’t really mind the impact of the recompiles, but I like keeping query plans in cache when I can, just for reporting purposes (I can’t wait for the upcoming Query Store feature)
  • Paul White recommended a LOOP JOIN hint on the insert query. That makes the INSERT query look like this:
     insert dbo.CollectionItems (CollectionId, Name, ExtraStuff)
      select @NewCollectionId, Name, ExtraStuff
      from dbo.CollectionItems
      where CollectionId = @CollectionId
      option (loop join);

    That was something new for me. I thought LOOP JOIN hints were only join hints, not query hints.

  • Paul White also mentioned some other options, a FAST 1 hint or a plan guide and he also suggested OPTION (RECOMPILE).

So I stopped calling the procedure with @CollectionId = 0 and I used a query hint to guarantee the better plan shown above. The performance improved and the procedure was no longer vulnerable to inconsistent performance due to parameter sniffing.

In general, there seem to be only two ways to avoid deadlocks. The first way minimizes the chance that two queries are executing at the same time. The second way carefully coordinates the locks that are taken and the order they’re taken in. Most deadlock solutions boil down to one of these methods. I was happy with this solution because it did both.

