Michael J. Swart

April 6, 2015

Finding Scalar Aggregate Indexed Views in Your Database

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:35 am

Fellow MVP Paul White recently blogged about a SQL Server bug. His blog post is called An Indexed View Bug with Scalar Aggregates

It’s a really well-written article (as always). After reading it, it’s natural to wonder whether your databases have any such indexed views?

Chances are that you don’t have any. These indexed views aren’t too common, but if you do have indexed views that use scalar aggregates, here’s a query that can help you find them.
The following query finds indexed views without GROUP BY that have exactly one row in any partition.

with IndexedViewIds as
(
  SELECT [object_id] from sys.indexes
  INTERSECT
  SELECT [object_id] from sys.views
), 
IndexedViewInfo as 
(
  SELECT 
    [object_id],
    OBJECT_SCHEMA_NAME([object_id]) as SchemaName,
    OBJECT_NAME([object_id]) as ViewName,
    OBJECT_DEFINITION([object_id]) as [Definition]
  FROM IndexedViewIds
)
SELECT 
    v.[object_id],
    v.SchemaName,
    v.ViewName,
    v.[Definition]
FROM IndexedViewInfo v
WHERE NOT EXISTS
  (
    SELECT * 
    FROM sys.partitions
    WHERE [object_id] = v.[object_id]
    AND [rows] <> 1
  )
  AND v.[definition] NOT LIKE '%GROUP BY%'

Notes

The query isn’t perfect. It’s actually possible (but rare) to get false positives here. This query doesn’t look for any aggregate keywords. So look for them in the SELECT list. Also make sure no GROUP BY clause exists.

It’s also possible (but rare) to miss some indexed views when GROUP BY gets mentioned, but not used. For example, if an indexed view definition contains the phrase GROUP BY in a comment, it won’t show up in this list.

(For my curious co-workers, none of our indexed views use scalar aggregates)

April 2, 2015

Look at Blocking By Index

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:33 am

This post is for me. It’s a script I find useful so I’m putting it in a place where I know to go look for it, my blog. You may find it useful too.

The script below extends the DMV sys.dm_db_index_operational_stats by focusing on lock waits and supplying index and table names. If you want to know about blocking by index, these queries can help.

If you want something more comprehensive, I’d suggest Kendra Little’s http://www.brentozar.com/blitzindex/

Blocking Wait Stats

-- Get index blocking wait stats
select 
  t.name as tableName, 
  i.name as indexName, 
  ios.row_lock_wait_count, 
  ios.row_lock_wait_in_ms, 
  ios.page_lock_wait_count,
  ios.page_lock_wait_in_ms
from sys.dm_db_index_operational_stats(db_id(), null, null, null) ios
join sys.indexes i
  on i.object_id = ios.object_id
  and i.index_id = ios.index_id
join sys.tables t
  on ios.object_id = t.object_id
where ios.row_lock_wait_in_ms + ios.page_lock_wait_in_ms > 0
order by ios.row_lock_wait_in_ms + ios.page_lock_wait_in_ms desc

Create Snapshot Of Stats

begin try
    drop table #IndexBlockingWaitStats
end try
begin catch
-- swallow error
end catch
 
select 
  [object_id], 
  index_id, 
  row_lock_wait_count, 
  row_lock_wait_in_ms, 
  page_lock_wait_count,
  page_lock_wait_in_ms
into #IndexBlockingWaitStats
from sys.dm_db_index_operational_stats(db_id(), null, null, null)

Get Waits Since Last Snapshot

-- Get delta results
select 
  t.name as tableName, 
  i.name as indexName, 
  ios.row_lock_wait_count  - iossnapshot.row_lock_wait_count    as row_lock_wait_count, 
  ios.row_lock_wait_in_ms  - iossnapshot.row_lock_wait_in_ms    as row_lock_wait_in_ms, 
  ios.page_lock_wait_count - iossnapshot.page_lock_wait_count   as page_lock_wait_count,
  ios.page_lock_wait_in_ms -  iossnapshot.page_lock_wait_in_ms  as page_lock_wait_in_ms
from sys.dm_db_index_operational_stats(db_id(), null, null, null) ios
join #IndexBlockingWaitStats iossnapshot
  on iossnapshot.[object_id] = ios.[object_id]
  and iossnapshot.index_id = ios.index_id
join sys.indexes i
  on i.[object_id] = ios.[object_id]
  and i.index_id = ios.index_id
join sys.tables t
  on ios.[object_id] = t.[object_id]
cross apply ( 
  select 
  ( ios.row_lock_wait_in_ms + ios.page_lock_wait_in_ms ) -
  ( iossnapshot.row_lock_wait_in_ms + iossnapshot.page_lock_wait_in_ms )
) as calc(totalwaittime)
where totalwaittime > 0
order by totalwaittime desc

Notes

  • There are many kinds of lock waits, this script focuses on waits on pages or rows. Other kinds of waits not shown here include objects (i.e. locks on tables), latches and IO latches.
  • This is only one small focused tool in a troubleshooting tool belt. Don’t depend on it too much
  • If you’re keen, you’ll notice I didn’t give info on schemas or on partitions, sounds like a fun exercise doesn’t it?
  • No illustration? Nope, or at least not yet. If I continue to find this script useful, then I plan on adding an illustration, because I use Browse By Illustration as my main navigation tool

February 26, 2015

When Parameter Sniffing Caused Deadlocks

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:06 am

Last week I was asked to troubleshoot some deadlocks in production. I was surprised to find out that parameter sniffing was one of the causes. I describe the investigation below.

Parameter Sniffing

SQL Server does this neat trick when you give it a query with parameters. The query optimizer will take the parameter values into account when making cardinality estimates. It finds the best query plan it can for these values. This is called parameter sniffing.

But parameter sniffing combined with query plan caching means that SQL Server seems to only care about parameter values the first time it sees a query. So ideally the first parameter values should be typical parameter values.

When people talk about parameter sniffing problems, it’s usually because this assumption doesn’t hold. Either the query was compiled with atypical values, or maybe the data has an uneven distribution (meaning that “typical” parameter values don’t exist).

But remember that…

Does this smell funny to you?

The Problem

The problem I saw in production involved some fairly typical looking tables. They looked something like this:

CollectionERD

This is how they were defined.

create table dbo.Collections
(
  CollectionId bigint identity primary key, 
  Name nvarchar(20) default ('--unnamed--') not null,
  Extrastuff char(100) not null default ('')
);
 
create table dbo.CollectionItems
(
  CollectionItemId bigint identity primary key,
  CollectionId bigint not null references dbo.Collections(CollectionId),
  Name nvarchar(20) default ('--unnamed--') not null,
  ExtraStuff char(100) not null default ('')
);
 
create index ix_CollectionItems
on dbo.CollectionItems(CollectionId);

The errors we were getting were deadlock errors and the deadlock graphs we collected were always the same. They looked something like this:

CollectionsDeadlockGraph

See that procedure called s_CopyCollection? It was defined like this:

create procedure dbo.s_CopyCollection 
  @CollectionId bigint
as
 
  set nocount on;
 
  declare @NewCollectionId bigint;
 
  if @CollectionId = 0
     return;
 
  if not exists (select 1 from dbo.Collections where CollectionId = @CollectionId)
     return;
 
  set xact_abort on;
  begin tran;
 
    insert dbo.Collections (Name, Extrastuff)
    select Name, ExtraStuff
    from dbo.Collections
    where CollectionId = @CollectionId;
 
    set @NewCollectionId = SCOPE_IDENTITY();
 
    insert dbo.CollectionItems (CollectionId, Name, ExtraStuff)
    select @NewCollectionId, Name, ExtraStuff
    from dbo.CollectionItems
    where CollectionId = @CollectionId;
 
  commit;

It’s a pretty standard copy procedure right? Notice that this procedure exits early if @CollectionId = 0. That’s because 0 is used to indicate that the collection is in the “recycle bin”. And in practice, there can be many recycled collections.

Some Digging

I began by reproducing the problem on my local machine. I used this method to generate concurrent activity. But I couldn’t reproduce it! The procedure performed well and had no concurrency issues at all.

This meant more digging. I looked at the procedures behavior in production and saw that they were performing abysmally. So I grabbed the query plan from prod and here’s what that second insert statement looked like:

CollectionsBadPlan

This statement inserts into CollectionItems but it was scanning Collections. That was a little confusing. I knew that the insert needed to check for the existence of a row Collections in order to enforce the foreign key, but I didn’t think it had to scan the whole table. Compare that to what I was seeing on my local database:

CollectionGoodPlan

I looked at the compilation parameters (SQL Sentry Plan Explorer makes this easy) of the plan seen in production and saw that the plan was compiled with @CollectionId = 0. In this case, the assumption about parameter sniffing I mentioned earlier (that the compilation parameters should be typical parameters) did not hold.

This procedure was performing poorly in production (increasing the likelihood of overlapping executions times) but also, each one was taking shared locks on the whole Collections table right after having inserted into it. The whole procedure uses an explicit transaction and that’s a recipe for deadlocks.

Doing Something About It

Here are things I considered and some actions I took. My main goal was to avoid the bad plan shown above.

  • Never call the procedure with @CollectionId = 0. The early-exit in the procedure was not enough to avoid bad query plans. If the procedure never gets called with @CollectionId = 0, then SQL Server can never sniff the value 0.
  • I began to consider query hints. Normally I avoid them because I don’t like telling SQL Server “I know better than you”. But in this case I did. So I began to consider hints like: OPTIMIZE FOR (@CollectionId UNKNOWN).
  • I asked some experts. I know Paul White and Aaron Bertrand like to hang out at SQLPerformance.com. So I asked my question there. It’s a good site which is slightly better than dba.stackexchange when you want to ask about query plans.
  • Aaron Bertrand recommended OPTION (RECOMPILE). A fine option. I didn’t really mind the impact of the recompiles, but I like keeping query plans in cache when I can, just for reporting purposes (I can’t wait for the upcoming Query Store feature)
  • Paul White recommended a LOOP JOIN hint on the insert query. That makes the INSERT query look like this:
     insert dbo.CollectionItems (CollectionId, Name, ExtraStuff)
      select @NewCollectionId, Name, ExtraStuff
      from dbo.CollectionItems
      where CollectionId = @CollectionId
      option (loop join);

    That was something new for me. I thought LOOP JOIN hints were only join hints, not query hints.

  • Paul White also mentioned some other options, a FAST 1 hint or a plan guide and he also suggested OPTION (RECOMPILE).

So I stopped calling the procedure with @CollectionId = 0 and I used a query hint to guarantee the better plan shown above. The performance improved and the procedure was no longer vulnerable to inconsistent performance due to parameter sniffing.

In general, there seem to be only two ways to avoid deadlocks. The first way minimizes the chance that two queries are executing at the same time. The second way carefully coordinates the locks that are taken and the order they’re taken in. Most deadlock solutions boil down to one of these methods. I was happy with this solution because it did both.

January 23, 2015

Designing Indexed Views for OLTP Workloads

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:00 am

When I look at indexed views defined on OLTP databases, I’m encouraged when their join diagrams resemble snowflake schemas.

It must be nice in Vermont this time of the year.

When you create an indexed view, SQL Server will enforce a number of restrictions. These restrictions ensure that your views are deterministic and easy to maintain. The restrictions are more than a recommendation, SQL Server simply won’t let you create the index if your view doesn’t meet those criteria.

Indexed Views Can Sometimes Cause Poor Performance

I once thought that if I followed Microsoft’s prerequisites for indexed views, then the maintenance of those indexed views was guaranteed to always be safe. I thought the restrictions would guarantee performance comparable to the maintenance of a regular index. But I was wrong, sometimes it can be much worse. Let’s look an example I invented for this post. Check out the following UPDATE statement. SQL Server reports eighteen logical reads:

use AdventureWorks2012
 
SET STATISTICS IO ON
 
-- An update of the Product table (no indexed views defined)
UPDATE Production.Product
SET Color = 'Midnight' 
WHERE Color = 'Black';
 
-- Table 'Product'. Scan count 1, logical reads 18

But when I create this indexed view:

CREATE VIEW dbo.v_AggregateQuantityByColor WITH SCHEMABINDING
AS 
 
SELECT 
  p.Color,
  SUM(th.Quantity) AS [total quantity],
  COUNT_BIG(*) AS [transaction count]
FROM Production.TransactionHistory th
JOIN Production.Product p
  ON th.ProductID = p.ProductID
GROUP BY p.Color;
 
GO
 
CREATE UNIQUE CLUSTERED INDEX ix_v_AggregateQuantityByColor
  ON dbo.v_AggregateQuantityByColor (Color)
 
GO

Then the same UPDATE statement becomes significantly more expensive requiring over 1000 reads:

use AdventureWorks2012
 
SET STATISTICS IO ON
 
-- An update of the Product table (maintenance of an indexed view is required)
UPDATE Production.Product
SET Color = 'Midnight' 
WHERE Color = 'Black';
 
/*
Table 'v_AggregateQuantityByColor'. Scan count 1, logical reads 6, 
Table 'Workfile'.                   Scan count 0, logical reads 0, 
Table 'Worktable'.                  Scan count 2, logical reads 377, 
Table 'TransactionHistory'.         Scan count 1, logical reads 797, 
Table 'Product'.                    Scan count 1, logical reads 18, 
*/

You can see the extra work caused by the indexed view in the query plan:

Indexed View Maintenance

click to embiggen

The maintenance cost for this UPDATE statement got significantly worse. If statements like this are executed frequently it could be disastrous. That’s one of the reasons that Microsoft promotes indexed views as ideal for read-heavy scenarios such as those seen in data warehousing.

But I think that indexed views still have a place in OLTP systems. It’s just that extra care must be taken so that no SQL statement causes indexed view maintenance to be significantly worse than the regular table index maintenance. I want to talk about some things I look for when I evaluate views meant for OLTP databases.

Look For A Join Diagram Like a Snowflake Schema

Look at your view’s select statement. Specifically focus on the tables in the FROM clause and draw a “join diagram” for yourself. I’ve got a shortcut for that work. I start by running a query like this:

SELECT * FROM dbo.v_AggregateQuantityByColor OPTION (EXPAND VIEWS);

This gets me the query plan for the statement. I open the query plan in SQL Sentry’s Plan Explorer which has a handy dandy Join Diagram tab.

Using your join diagram, ask yourself these questions:

  • Does the join diagram look like a snowflake schema with one “fact” table?
  • Do the joins correspond to defined indexed foreign keys?
  • Are the columns included in the “dimension” tables modified infrequently?

Whenever I’ve dealt with poor performance caused by indexed views, these views have always given a “no” to at least one of these questions.

Example

Let’s apply these questions to dbo.v_AggregateQuantityByColor from my example. Here’s the join diagram:

Join Diagram

This diagram does in fact look like a snowflake schema with the TransactionHistory table acting as the fact table and the Product table acting as the dimension table. The one join follows an actual foreign key FK_TransactionHistory_Product_ProductID. And this foreign key is indexed (IX_TransactionHistory_ProductID).

Now lets answer the last question “Are the columns included in the dimension tables modified infrequently?”. In the context of this question, that’s the Color column in the Product table. Now it’s impossible to actually tell how frequently colors get updated in the Product table because this is a hypothetical example. But it’s unlikely that any OLTP workload would update product colors that often. So lets give the answer: “infrequently updated”.

So according to my criteria, this indexed view gets the green light. Even though it seems like it could be expensive to maintain, I don’t have any automatic objections with it because product colors are rarely updated.

FAQ

Q: Is it possible to ignore these rules and still create effective indexed views?
A: Yes!

Q: Is it possible to follow these rules and still create indexed views that cause performance problems?
A: Yes!

Q: If I follow these rules, can I skip any performance testing steps?
A: No!

Q: So why the heck am I reading this post?
A: Lots of reasons. At the very least, it is useful when you want to identify indexed views and testing scenarios that deserve extra scrutiny.

December 3, 2014

Materialized Views in SQL Server

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 9:28 am

What’s the difference between Oracle’s “materialized views” and SQL Server’s “indexed views”? They both persist the results of a query, but how are they different? Sometimes it’s difficult to tell.

I'm on the left (or am I?)

I’m on the left (or am I?)

One difference is that SQL Server’s indexed views are always kept up to date. In SQL Server, if a view’s base tables are modified, then the view’s indexes are also kept up to date in the same atomic transaction.

Let’s take a look at Oracle now. Oracle provides something similar called a materialized view. If Oracle’s materialized views are created without the REFRESH FAST ON COMMIT option, then the materialized view is not modified when its base tables are. So that’s one major difference. While SQL Server’s indexed views are always kept current, Oracle’s materialized views can be static.

Static Materialized Views In SQL Server?

Yeah, we just call that a table. You can use a SELECT INTO statement and it’s pretty easy. In fact, for fun I wrote a procedure that does the work for you. Given the name of a view it can create or refresh a table:

/* This is a proof-of-concept and is written for illustration purposes, don't use this in production */
create procedure dbo.s_MaterializeView
  @viewName nvarchar(300),
  @yolo bit = 0 -- use @yolo = 1 to execute the SQL immediately
as
 
declare @persistedViewName nvarchar(300);
 
if not exists (select 1 from sys.views where object_id = object_id(@viewName))
  THROW 50000, N'That @viewName does not exist', 1;
 
select 
  @viewName = QUOTENAME(object_schema_name(object_id)) 
  + N'.'
  + QUOTENAME(object_name(object_id)),
  @persistedViewName = QUOTENAME(object_schema_name(object_id)) 
  + N'.'
  + QUOTENAME(N'persisted_' + object_name(object_id))
from sys.views
where object_id = object_id(@viewName);
 
set xact_abort on;
begin tran
  declare @sql nvarchar(2000);
  set @sql = N'
    IF OBJECT_ID(''' + @persistedViewName + N''') IS NOT NULL
      DROP TABLE ' + @persistedViewName + N';
 
    SELECT *
	INTO ' + @persistedViewName + N'
    FROM ' + @viewName + N';'
 
  if (@yolo = 1)
    exec sp_executesql @sql;  
  else 	
    print @sql;
commit

Which can be used to generate sql something like this:

    IF OBJECT_ID('[dbo].[persisted_vSomeView]') IS NOT NULL
      DROP TABLE [dbo].[persisted_vSomeView];
 
    SELECT *
	INTO [dbo].[persisted_vSomeView]
    FROM [dbo].[vSomeView];

Are Such Static Materialized Views Useful?

Yes:

  • They can be used to get around all the constraints placed on regular indexed views. And if you’ve ever implemented indexed views, you understand that that’s a lot of constraints. I think this benefit is what makes this whole blog post worth consideration.
  • Because it’s static, you can avoid all the potential performance pitfalls that accompany the maintenance of an indexed view (more on this next week).
  • Good or bad, the view doesn’t have to be created with SCHEMABINDING.
  • Indexing is strictly do-it-yourself. Chances are you want more than a single heap of data for your materialized view.

… and no:

  • Most obviously, the data is static, which is another way of saying stale. But notice how Microsoft promotes indexed views. They say that indexed views are best suited for improving OLAP, data mining and other warehousing workloads. Such workloads can typically tolerate staleness better than OLTP workloads. And so maybe materialized views are a feasible alternative to indexed views.
  • You have to manage when these views get refreshed. This means scheduling jobs to do extra maintenance work (yuck). For me that’s a really high cost but it’s less costly if I can incorporate it as part of an ETL process.
  • Using Enterprise Edition, SQL Server’s query optimizer can choose to expand indexed views or not. It can’t do that with these materialized views.

I didn’t write the procedure for any important reason, I just wrote it because it was fun. But I have used this materialized view technique in SQL Server at work and I’ve been quite successful with it. It’s not something that should be used often, but it’s always worth considering if you can understand the trade-offs.

November 20, 2014

Developers, Ready for a New SQL Server Version?

Filed under: Miscelleaneous SQL,Technical Articles — Michael J. Swart @ 2:04 pm

Is your application ready to have its SQL Server upgraded to the next version? It’s important to find out because if you’re ready (and you know it) you can proceed with confidence. But if you’re not ready, you can choose to adjust your application, postpone the upgrade or do both.

You Only Live Once

But there’s some good news, chances are really good that your application is in fact ready for the next version of SQL Server. Focusing on the database engine I want to explore ways to find out for sure.

First, Inspect Your App for Discontinued Features and Other Changes

Bookmark Microsoft’s docs on the subject: SQL Server Database Engine Backward Compatibility. This is your main resource for your upgrade “certification” effort. Microsoft describes Deprecated and Discontinued features and it also describes breaking and behavior changes. The list is well-written and well-organized. Microsoft has really done well with these.

Look through your own application’s code using a code search or other means and check to see whether your application is affected by these changes. Luckily these lists have always been manageable. Microsoft introduces a lot of features, but they don’t deprecate a lot. It’s in their best interest to maintain backwards compatibility.

Just like breakfast, it’s important not to skip this step, it’s the most important one. I’ve seen this inspection step uncover more potential upgrade issues than any of the following steps.

Test Your App on a Production-like Instance

Now it’s time to test your application with the new version. But that’s easier said than done. You’ll need to:

  • Get a database server (easy) whose specs are close to what’s running in prod (less easy).
  • Find a database with the same size data as prod (a restored backup of prod if you can swing it).
  • The ability to generate a workload with the same volume and variety as seen in prod (really tricky).
  • Now upgrade to the next version of SQL Server and test.

Even if you can’t perfectly reproduce production hardware, production data or production workloads, most people expect this kind of testing. For that reason alone, this step shouldn’t be skipped. It gives people confidence.

Use the Deprecated Features Performance Counter

SQL Server provides performance counters that track when it uses a deprecated feature. And because they’re SQL Server performance counters, you can inspect the values from a query window like this:

select * 
from sys.dm_os_performance_counters
where object_name like '%Deprecated Features%'
and cntr_value > 0

These counters get reset when SQL Server is restarted. So look at a production server that has been running for a while.

But this list can be tricky. Even against the cleanest, quietest database server, you’re never going to see an empty result set. There are always a number of “problems” that get reported. And it’s not easy to sort out which ones matter. The list indicates that someone used a deprecated feature of SQL Server and that someone could be:

  1. Your application
  2. SQL Server system processes
  3. Some admin or other user writing ad hoc SQL
  4. SQL Server Management Studio

Try to determine which problems come from your application and whether you want to do anything about it.

Upgrade Advisor

The SQL Server Upgrade Advisor is a tool that Microsoft provides to help people when upgrading to a new version. It’s in Microsoft’s interest to make the whole process as painless as possible and they do a pretty good job.

The Upgrade Advisor will also look at Analysis Services, Reporting Services and Integration Services, but you’ll be focusing on the Database Engine. Optionally you can provide a workload in the form of a system trace. It’s a nice tool, but it’s rare that it will catch anything in your app that you haven’t caught already with the previous steps.

Other Considerations

Undocumented Features
Are there any undocumented trace flags that you’re using? If you’re depending on behavior of such trace flags, now’s a time to take a fresh look at what you’re trying to accomplish. The behavior is, by definition, undocumented and can change any time.

Database Compatibility Levels Other Than Latest
I like to avoid any compatibility level that’s not the most current. Mostly this is a supportability thing.

As a vendor, I want to support the fewest number of versions possible. If I begin to entertain different compatibility levels, this multiplies the number of environments I have to support.

But maybe you’re not a vendor. If you have just one single database to support. If you choose to take small steps through compatibility levels and versions, you’re probably doubling the amount of testing required.

Known Issues
If you’re not going to the latest Service Pack and/or Cumulative Update, then you should be aware of the fixes you’re choosing not to take.
It’s actually very difficult to keep up to date on what known issues are out there and which ones are severe enough to worry about. For example, if I only use the SQL Server database engine, I’m not going to be impacted by a DAX performance bug in SSAS, but I do care an awful lot about Data Corruption Bugs.
I like to read what Aaron Bertrand has to say on SQLBlog.com or SQLPerformance.com. He’s a good resource when trying to keep up with known issues in SQL Server.

Unknown Issues
Otherwise known as Avoid-RTM. This advice is less relevant lately and I think it might be for a couple reasons. Just speculating here:

  • The cynical part of me says that Microsoft will just wait a little bit and call the CU that month SP1 just to appease the people who are waiting.
  • The optimist in me sees that Microsoft lately has a really nice cloud-first deployment model where Azure customers are kicking the tires behind the scenes so that when RTM comes around, parts of the database have seen a lot of production hours they wouldn’t have otherwise. In the past I have been burned by database engine bugs in 2005 and 2008. I haven’t been burned by 2008 R2 or 2012. It’s anecdotal, but it makes me hopeful.

Does Version Matter

Not really, I wrote this post with no particular version in mind.

But I do want to mention one particular point that has to do with SQL Server 2014. The new cardinality estimator really affects the behavior of the database engine. It affects plans chosen by the query optimizer and hence it affects the performance of applications that use it.
This is why I advise everyone to give the upgrade to 2014 a little more scrutiny than an upgrade to previous versions. Look for performance regressions. This is especially true if you or other developers are fond of writing queries with a lot of query hints. I understand that regressions are more common in queries that make heavy use of hints.

October 3, 2014

Watch Out for Misleading Behaviour From SQL Server

Takeaway: To get consistent behaviour from SQL Server, I share a set of statements I like to run when performing tuning experiments.

Inconsistent Behaviour From SQL Server?

I often have conversations where a colleague wants to understand why SQL Server performs faster in some cases and slower in other cases.

The conversation usually starts “Why does SQL Server perform faster when I…” (fill in the blank):

  1. … changed the join order of the query
  2. … added a transaction
  3. … updated statistics
  4. … added a comment
  5. … crossed my fingers
  6. … simply ran it again

What’s Going On?

It can actually seem like SQL Server performs differently based on its mood. Here are some reasons that can affect the duration of queries like the ones above

  • You changed something insignificant in the query. What you may be doing is comparing the performance of a cached plan with a newly compiled plan. Examples 1 – 4 might fall under this scenario. If that’s the case, then you took a broken thing and gave it a good thump. This percussive maintenance may be good for broken jukeboxes, but maybe not for SQL Server.
  • What about those last two? Say you hit F5 to execute a query in Management Studio, and wait a minute for your results. You immediately hit F5 again and watched the same query take fifteen seconds. Then I like to point out that maybe all that data is cached in memory.

In order to do tune queries effectively, we need consistent behaviour from SQL Server, if only to test theories and be able to rely on the results. SQL Server doesn’t seem to want to give us consistent behaviour…

So Is It Possible To Get Straight Answers?

Best line from all Star Wars

But maybe we can get straight answers from SQL Server. Here’s a test framework that I like to use before all experiments when I want consistent behaviour:

/* Only do this on dev sql servers! */
CHECKPOINT 
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
SET STATISTICS IO, TIME ON
-- Ctrl+M in Management Studio to include actual query plan

The first two statements are meant to clear SQL Server’s cache of data. Because of write ahead logging, SQL Server will write data changes to disk immediately, but may take its time writing data changes to disk. Executing CHECKPOINT makes SQL Server do that immediately. After the checkpoint there should be no dirty buffers. That’s why DBCC DROPCLEANBUFFERS will succeed in dropping all data from memory.

The DBCC FREEPROCCACHE command will remove all cached query plans.

These commands give SQL Server a fresh starting point. It makes it easier to compare behaviour of one query with the behaviour of another.

The SET STATISTICS IO, TIME ON and the Ctrl+M are there in order to retrieve better information about the performance of the query. Often CPU time, Logical IO, and the actual query plan are more useful when tuning queries than elapsed time.

September 18, 2014

SQL Server Ignores Trailing Spaces In Identifiers

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:27 am

Takeaway: According to SQL Server, an identifier with trailing spaces is considered equivalent to the same identifier with those spaces removed. That was unexpected to me because that’s not how other programming languages work. My investigation was interesting and I describe that here.

The First Symptom

Here’s the setting, I work with a tool developed internally that reads metadata from a database (table names, column names, column types and that sort of thing). Recently the tool told me that a table had an unexpected definition. In this case, a column name had an extra trailing space. I expected the column name "Id" (2 characters), but my tool was reporting an actual value of  "Id " (notice the blank at the end, 3 characters). That’s what started my investigation.

But that’s really weird. What would lead to a space accidentally getting tacked on to a column name? I couldn’t think of any reason. I also noticed a couple other things. Redgate SQL Compare was reporting no discrepancies and the database users weren’t complaining at all, they seemed just fine. A bug in the in-house tool seemed most likely. My hunch was that there was a problem with the way we collecting or storing these column names (how did a space sneak in there?).

Where Are Column Names Stored?

I wanted to look at the real name of the column – straight from the source – so I ran:

SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME LIKE 'Id%'

It told me that my tool wasn’t wrong. That the column was actually named "Id " with the space. So maybe Red Gate is getting its metadata from somewhere else? I know of a few places to get column information. Maybe Red Gate is getting it from one of those? Specifically I wanted to look closer at these views:

  • sys.columns
  • sys.syscolumns
  • INFORMATION_SCHEMA.COLUMNS

Because these objects are views, I used sp_helptext to learn that all the column names ultimately come from a system table called sys.syscolpars. But sys.syscolpars is a system table and you can’t look at its contents unless you connect to the database using the dedicated administrator connection. And that’s exactly what I did.

I learned that there is only one version of column names, only one place on disk that sql server persists the name of the column. It’s interesting because this implies that Red Gate’s SQL Compare trims trailing spaces from identifier names.

But Doesn’t SQL Server Care?

Well, there’s one way to check:

CREATE TABLE [MyTest] ( [id ] INT );
INSERT INTO [MyTest] VALUES (1);
 
SELECT [id ], [id] -- one column name with a space, one column name without
FROM [MyTest]; 
-- returns a dataset with column names as specified in the query.
go

Just like Red Gate’s SQL Compare, it seems like SQL Server doesn’t care about trailing spaces in identifiers either.

Google? Stackoverflow? Want to Weigh In?

A quick search led me to the extremely relevant Is SQL Server Naming trailing space insensitive?.

And that question has answers which link to the Books Online page documenting Delimited Identifiers. That page claims that “SQL Server stores the name without the trailing spaces.” Hmmm, they either mean in memory, or the page is inaccurate. I just looked at the system tables a moment ago and the trailing spaces are definitely retained.

The stackoverflow question also led me to a reported defect, the Connect item Trailing space in column names. This item was closed as “by design”. So this behavior is deliberate.

What do other SQL Vendors do?

I want to do experiments on SQL databases from other vendors but my computer doesn’t have a large number of virtual machines or playground environments. But do you know who does? SQL Fiddle
It’s very easy to use this site to see what different database vendors do. I just pick a vendor and I can try out any SQL I want. It took very little effort to be able to compile this table:

RDBMS CREATE TABLE... SELECT "id ", "id"...
MySQL Incorrect column name 'id '
Oracle Success "id": invalid identifier
PostgreSQL Success Column "id" does not exist
SQLite Success could not prepare statement (1 no such column: id)
SQL Server Success
ID ID
1 1

And What Does the ANSI standard say?

Look at the variety of behaviors from each vendor. I wonder what the “standard” implementation should be.

Mmmm... SQL Syntax rules.

I googled “ANSI SQL 92″ and found its wikipedia page and that led me to the SQL-92 Standard itself.

ANSI (paraphrased) says that

<delimited identifier> ::= <double quote><one or more characters><double quote>

And it also says explicitly that delimited identifiers can include spaces.

What About String Comparisons In General?

During my experiments on SQL Server I found myself executing this query:

SELECT *
FROM sys.columns
WHERE name = 'Id'

I was surprised to find out that my three-character "Id " column came back in the results. This means that SQL Server ignores trailing spaces for all string comparisons, not just for identifiers.

I changed my google search and looked for “sql server string comparison trailing space”. This is where I found another super-relevant document from Microsoft: INF: How SQL Server Compares Strings with Trailing Spaces.

Microsoft pointed to the ANSI standard again. I mean they explained exactly where to look, they pointed straight to (Section 8.2, , General rules #3) which is the section where ANSI explains how the comparison of two character strings is determined. The ANSI standard says that for string comparisons, the shorter string is effectively padded with trailing spaces so that comparisons can always performed on strings with an equal number of characters. Why? I don’t know.

And that’s where identifier comparisons come in. I found another part of the standard (Syntax rule #11) which tells me that Identifiers are equivalent if they compare as equivalent according to regular string comparison rules. So that’s the link between string comparisons and identifier comparisons.

Summary

There’s a number of things I learned about string comparisons. But does any of this matter? Hardly. No one deliberately chooses to name identifiers using trailing spaces. And I could have decided to sum this whole article up in a single tweet (see the title).

But did you figure out the head fake? This blog post is actually about investigation. The investigation is the interesting thing. This post describes the tools I like to use and how I use them to find things out for myself including:

  • Queries against SQL Server itself, the obvious authority on SQL Server behavior.
    • Made use of sp_helptext
    • Made use of the Dedicated Adminstrator Connection to look at system tables
  • Microsoft’s Books Online (used this twice!)
  • Microsoft Connect
  • Google
  • Stackoverflow
  • SQLFiddle
  • Wikipedia
  • the ANSI Standard

Maybe none of these resources are new or exciting. You’ve likely used many of these in the past. But that’s the point, you can find out about any topic in-depth by being a little curious and a little resourceful. I love to hear about investigation stories. Often how people find things can be at least as interesting as the actual lesson.

September 9, 2014

Take Care When Scripting Batches

Filed under: Miscelleaneous SQL,SQL Scripts,Technical Articles — Michael J. Swart @ 1:22 pm

Takeaway: When performing long-running modifications, I’m sure many of you enjoy using batches to increase concurrency. But I want to talk about a pitfall to be aware of. If you’re not careful, the method you use to implement batching can actually worsen concurrency.

... we don't need no stinkin' batches either

Why Use Batches?

Even without an explicit transaction, all SQL statements are atomic – changes are all or nothing. So when you have long-running modifications to make, locks on data can be held for the duration of your query and that can be too long. Especially if your changes are intended for live databases.

But you can make your modifications in several smaller chunks or batches. The hope is that each individual batch executes quickly and holds locks on resources for a short period of time.

But care is needed. I’m going to give an example to show what I mean. The example uses the FactOnlineSales table in the ContosoRetailDW database (available as a download here). The FactOnlineSales table has

  • one clustered index on OnlineSalesKey and no other indexes,
  • 12 million rows,
  • and 46 thousand database pages

Metrics to Use
In this example, I want to know how long each query takes because this should let me know roughly how long locks are held.
But instead of duration, I’m going to measure logical reads. It’s a little more consistent and in the examples below it’s nicely correlated with duration.

The Straight Query

Suppose we want to remove sales data from FactOnlineSales for the “Worcester Company” whose CustomerKey = 19036. That’s a simple delete statement:

DELETE FactOnlineSales WHERE CustomerKey = 19036;

This delete statement runs an unacceptably long time. It scans the clustered index and performs 46,650 logical reads and I’m worried about concurrency issues.

Naive Batching

So I try to delete 1,000 rows at a time. This implementation seems reasonable on the surface:

DECLARE	
	@RC INT = 1;
 
WHILE (@RC > 0)
BEGIN
 
  DELETE TOP (1000) FactOnlineSales
  WHERE CustomerKey = 19036;
 
  SET @RC = @@ROWCOUNT
 
END

Unfortunately, this method does poorly. It scans the clustered index in order to find 1,000 rows to delete. The first few batches complete quickly, but later batches gradually get slower as it takes longer and longer to scan the index to find rows to delete. By the time the script gets to the last batch, SQL Server has to delete rows near the very end of the clustered index and to find them, SQL Server has to scan the entire table.

In fact, this last batch performs 46,521 logical reads (just 100 fewer reads than the straight delete). And the entire script performed 1,486,285 logical reads in total. If concurrency is what I’m after, this script is actually worse than the simple DELETE statement.

Careful Batching

But I know something about the indexes on this table. I can make use of this knowledge by keeping track of my progress through the clustered index so that I can continue where I left off:

DECLARE
	@LargestKeyProcessed INT = -1,
	@NextBatchMax INT,
	@RC INT = 1;
 
WHILE (@RC > 0)
BEGIN
 
  SELECT TOP (1000) @NextBatchMax = OnlineSalesKey
  FROM FactOnlineSales
  WHERE OnlineSalesKey > @LargestKeyProcessed
    AND CustomerKey = 19036
  ORDER BY OnlineSalesKey ASC;
 
  DELETE FactOnlineSales
  WHERE CustomerKey = 19036
    AND OnlineSalesKey > @LargestKeyProcessed
    AND OnlineSalesKey <= @NextBatchMax;
 
  SET @RC = @@ROWCOUNT;
  SET @LargestKeyProcessed = @NextBatchMax;
 
END

The delete statements in this script performed 46,796 logical reads in total but no individual delete statement performed more than 6,363.

Graphically that looks like:

Logical Reads Per Delete Statement

Logical Reads Per Delete

The careful batching method runs in roughly the same time as the straight delete statement but ensures that locks are not held for long.
The naive batching method runs with an order or complexity (compared to the expected complexity of n) and can hold locks just as long as the straight delete statement.
This underlines the importance of testing for performance.

May 8, 2014

I’m Going To Help You Become A Better Writer

Filed under: Technical Articles — Michael J. Swart @ 11:06 am

I’m offering free copy editing for your technical articles.

Anyone can become a better writer through practice, but you can speed up that process with a mentor.

Grammar, the one thing that Obi Wan didn't learn from Yoda

Copy Editing?

That’s right. I want to help you with your writing. If you submit your writing to me, I’ll go through it and offer suggestions. I won’t just point out spelling and grammar mistakes. I’ll point out sentences that can be reworded or cut. And I’ll point out paragraphs that can be rearranged. Basically I’ll point out any improvement I see given your intended audience and your style. So for example, if you’re writing a rap song about SQL Server’s Cardinality Estimator, I may let some grammar mistakes (and your backbone) slide.

I don’t like to offer unsolicited advice, so consider this an invitation to solicit my advice. I welcome everyone’s writing, I likely have some advice to offer anyone brave enough to submit something for me to look at. (Unless your name is Kevin Kline. Kevin, you’re doing fine.)

How?

Easy! Fill out a form. Request Copy Editing

Why Are You Doing This Michael?

Sentences which are awkward hurts my eyes and is making me feeling some uncomfortableness and I am wanting to stop it any way.

Actually, I remember proofreading some articles for John Sansom’s DBA Jumpstart project. There was some wincing, but I found the work very rewarding. I’ve continued to do this for others in an ad hoc manner. I’ve been helping a user group friend with proofreading and I find it easy and fun.

I’ve gotten a lot of help from others in the past few years. This is me paying it forward using my talents.

Older Posts »

Powered by WordPress