Designing Indexed Views for OLTP Workloads

January 23, 2015

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:00 am

When I look at indexed views defined on OLTP databases, I’m encouraged when their join diagrams resemble snowflake schemas.

It must be nice in Vermont this time of the year.

When you create an indexed view, SQL Server will enforce a number of restrictions. These restrictions ensure that your views are deterministic and easy to maintain. The restrictions are more than a recommendation, SQL Server simply won’t let you create the index if your view doesn’t meet those criteria.

Indexed Views Can Sometimes Cause Poor Performance

I once thought that if I followed Microsoft’s prerequisites for indexed views, then the maintenance of those indexed views was guaranteed to always be safe. I thought the restrictions would guarantee performance comparable to the maintenance of a regular index. But I was wrong, sometimes it can be much worse. Let’s look an example I invented for this post. Check out the following UPDATE statement. SQL Server reports eighteen logical reads:

use AdventureWorks2012
 
SET STATISTICS IO ON
 
-- An update of the Product table (no indexed views defined)
UPDATE Production.Product
SET Color = 'Midnight' 
WHERE Color = 'Black';
 
-- Table 'Product'. Scan count 1, logical reads 18

But when I create this indexed view:

CREATE VIEW dbo.v_AggregateQuantityByColor WITH SCHEMABINDING
AS 
 
SELECT 
  p.Color,
  SUM(th.Quantity) AS [total quantity],
  COUNT_BIG(*) AS [transaction count]
FROM Production.TransactionHistory th
JOIN Production.Product p
  ON th.ProductID = p.ProductID
GROUP BY p.Color;
 
GO
 
CREATE UNIQUE CLUSTERED INDEX ix_v_AggregateQuantityByColor
  ON dbo.v_AggregateQuantityByColor (Color)
 
GO

Then the same UPDATE statement becomes significantly more expensive requiring over 1000 reads:

use AdventureWorks2012
 
SET STATISTICS IO ON
 
-- An update of the Product table (maintenance of an indexed view is required)
UPDATE Production.Product
SET Color = 'Midnight' 
WHERE Color = 'Black';
 
/*
Table 'v_AggregateQuantityByColor'. Scan count 1, logical reads 6, 
Table 'Workfile'.                   Scan count 0, logical reads 0, 
Table 'Worktable'.                  Scan count 2, logical reads 377, 
Table 'TransactionHistory'.         Scan count 1, logical reads 797, 
Table 'Product'.                    Scan count 1, logical reads 18, 
*/

You can see the extra work caused by the indexed view in the query plan:

click to embiggen

The maintenance cost for this UPDATE statement got significantly worse. If statements like this are executed frequently it could be disastrous. That’s one of the reasons that Microsoft promotes indexed views as ideal for read-heavy scenarios such as those seen in data warehousing.

But I think that indexed views still have a place in OLTP systems. It’s just that extra care must be taken so that no SQL statement causes indexed view maintenance to be significantly worse than the regular table index maintenance. I want to talk about some things I look for when I evaluate views meant for OLTP databases.

Look For A Join Diagram Like a Snowflake Schema

Look at your view’s select statement. Specifically focus on the tables in the FROM clause and draw a “join diagram” for yourself. I’ve got a shortcut for that work. I start by running a query like this:

SELECT * FROM dbo.v_AggregateQuantityByColor OPTION (EXPAND VIEWS);

This gets me the query plan for the statement. I open the query plan in SQL Sentry’s Plan Explorer which has a handy dandy Join Diagram tab.

Using your join diagram, ask yourself these questions:

Does the join diagram look like a snowflake schema with one “fact” table?
Do the joins correspond to defined indexed foreign keys?
Are the columns included in the “dimension” tables modified infrequently?

Whenever I’ve dealt with poor performance caused by indexed views, these views have always given a “no” to at least one of these questions.

Example

Let’s apply these questions to dbo.v_AggregateQuantityByColor from my example. Here’s the join diagram:

This diagram does in fact look like a snowflake schema with the TransactionHistory table acting as the fact table and the Product table acting as the dimension table. The one join follows an actual foreign key FK_TransactionHistory_Product_ProductID. And this foreign key is indexed (IX_TransactionHistory_ProductID).

Now lets answer the last question “Are the columns included in the dimension tables modified infrequently?”. In the context of this question, that’s the Color column in the Product table. Now it’s impossible to actually tell how frequently colors get updated in the Product table because this is a hypothetical example. But it’s unlikely that any OLTP workload would update product colors that often. So lets give the answer: “infrequently updated”.

So according to my criteria, this indexed view gets the green light. Even though it seems like it could be expensive to maintain, I don’t have any automatic objections with it because product colors are rarely updated.

FAQ

Q: Is it possible to ignore these rules and still create effective indexed views?
A: Yes!

Q: Is it possible to follow these rules and still create indexed views that cause performance problems?
A: Yes!

Q: If I follow these rules, can I skip any performance testing steps?
A: No!

Q: So why the heck am I reading this post?
A: Lots of reasons. At the very least, it is useful when you want to identify indexed views and testing scenarios that deserve extra scrutiny.

-- Comments (2)

2 Comments »

I’ve gone through this evaluation a number of times and the time I’ve spent has led to a better understanding of the real cost of indexed views. In fact, I sometimes like to extend this process. For example, I like to mark my join diagrams with one to many labels, 1..*. And I replace * with a number that represents the actual worst case. In the case of AdventureWorks’ Person.vStateProvinceCountryRegion indexed view, the label is 1..96. And 96 may be tolerable.

Comment by Michael J. Swart — January 23, 2015 @ 8:42 am
Yeah, good post. Also think about the perf benefits. You can in effect index on columns of multiple tables. That’s a very powerful tool.

Comment by tobi — January 24, 2015 @ 5:10 am

RSS feed for comments on this post. TrackBack URL

Michael J. Swart

January 23, 2015