I want to explain a trick I came up with that helps an OLTP database do part of the job that OLAP cubes do. (Isn’t that cute, the database engine thinks it’s a cube.*)
If for some reason you find yourself wanting to use an OLTP database (i.e. regular old database) for something that is better suited to an OLAP cube (like Analysis Services for example) then you’re in good company! It seems to be a very common challenge. Even Microsoft seems to recognize this by implementing T-SQL tricks that step over the line into OLAP territory. Grouping sets comes to mind; Also another sign is that CUBE is now a keyword.
Aggregating Data Along a Dimension Before it’s Needed.
One task that is deep in the heart of OLAP territory is reporting the aggregate of a measurement along a dimension. I want to explain how to do this efficiently inside an OLTP db.
In the following examples the OLTP database (and wannabe OLAP cube) will be the standard AdventureWorks database. I’ll be using the Sales.SalesOrderDetail table. The measurement here is the LineTotal field, and the dimension here is the Product.
So we can report on this kind of data easily by maintaining statistics using an indexed view. (BTW, this is a great use-case for indexed views).
Here’s the trick. If n is the measurement, then the only stats we need to store are the sums of n0, n1, n2, grouped by the dimension. In other words, we keep track of the count, the sum and the sum of squares.
CREATE VIEW Sales.v_SalesOrderDetailAggregates WITH SCHEMABINDING AS SELECT ProductID, SUM(LineTotal) AS Sum1, COUNT_BIG(*) AS Count1, SUM(LineTotal*LineTotal) AS Sum2 FROM Sales.SalesOrderDetail GROUP BY ProductID GO CREATE UNIQUE CLUSTERED INDEX IX_v_SalesOrderDetailAggregates ON Sales.v_SalesOrderDetailAggregates(ProductID) GO
Using the Indexed View
So that’s the clustered index I propose we create. As I’ll show later, this is enough. The work of aggregating the data is offloaded to the db. And the work is done once at creation and maintained automatically during any relevant transactions.
In the following code sample, I show how to use this view to report on many common stats (aggregates):
SET STATISTICS IO ON -- a report that aggregates LineTotal over Products SELECT ProductId, COUNT(LineTotal), SUM(LineTotal), AVG(LineTotal), STDEVP(LineTotal), STDEV(LineTotal), VARP(LineTotal), VAR(LineTotal) FROM Sales.SalesOrderDetail GROUP BY ProductID ORDER BY ProductID -- the same report (aggregate LineTotal over Products) -- that uses the indexed view. SELECT ProductID, [COUNT] = Count1, [SUM] = Sum1, [AVG] = Sum1 / Count1, [STDEVP] = 1.0/Count1 * SQRT(Count1*Sum2 - Sum1*Sum1), [STDEV] = 1.0/SQRT(Count1*(Count1-1)) * SQRT(Count1*Sum2 - Sum1*Sum1), [VARP] = (Sum2 - (Sum1*Sum1/Count1))/(Count1), [VAR] = (Sum2 - (Sum1*Sum1/Count1))/(Count1-1) FROM Sales.v_SalesOrderDetailAggregates WHERE Count1 > 1 ORDER BY ProductID /* results: (266 row(s) affected) Table 'SalesOrderDetail'. Scan count 1, logical reads 485, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. (266 row(s) affected) Table 'v_SalesOrderDetailAggregates'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. */
So why not expand the indexed view to include everything from the beginning? Well, there are a couple problems with that. First is that you’ll get an error message like:
Msg 10125, Level 16, State 1, Line 2 Cannot create index on view "AdventureWorks.Sales.v_SalesOrderDetailAggregates" because it uses aggregate "STDEV". Consider eliminating the aggregate, not indexing the view, or using alternate aggregates. For example, for AVG substitute SUM and COUNT_BIG, or for COUNT, substitute COUNT_BIG.
And so what I’ve shown here is that you can use a calculation involving COUNT_BIG, SUM(n) and SUM(n*n) to calculate common statistical aggregates like AVG, STDEV and VAR.
Besides, this strategy also takes the least space, and for a little extra work, you can aggregate on a grouping higher up in the dimension. That is, by using the entries in the index as subtotals in order to generate totals for larger groupings.
*– I’m probably missing an opportunity to include an awesome illustration for this post; something involving databases, cubes and square and round pegs.