Takeaway: When performing long-running modifications, I’m sure many of you enjoy using batches to increase concurrency. But I want to talk about a pitfall to be aware of. If you’re not careful, the method you use to implement batching can actually worsen concurrency.
Why Use Batches?
Even without an explicit transaction, all SQL statements are atomic – changes are all or nothing. So when you have long-running modifications to make, locks on data can be held for the duration of your query, and that can be too long – especially if your changes are intended for live databases.
But you can make your modifications in several smaller chunks or batches. The hope is that each individual batch executes quickly and holds locks on resources for a short period of time.
But care is needed. I’m going to give an example to show what I mean. The example uses the FactOnlineSales table in the ContosoRetailDW database (available as a download here). The FactOnlineSales table has:
- one clustered index on OnlineSalesKey and no other indexes,
- 12 million rows,
- and 46 thousand database pages.
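You can verify figures like these yourself. A sketch using SQL Server’s sp_spaceused procedure and catalog views (the exact row and page counts will vary with the version of the database you download):

```sql
-- Quick summary of rows and reserved space
EXEC sp_spaceused 'dbo.FactOnlineSales';

-- Per-index row and page counts
SELECT i.name AS IndexName,
       p.rows,
       au.used_pages
FROM sys.indexes AS i
JOIN sys.partitions AS p
    ON p.object_id = i.object_id
   AND p.index_id = i.index_id
JOIN sys.allocation_units AS au
    ON au.container_id = p.partition_id
WHERE i.object_id = OBJECT_ID('dbo.FactOnlineSales');
```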
Metrics to Use
In this example, I want to know how long each query takes because this should let me know roughly how long locks are held.
But instead of duration, I’m going to measure logical reads. It’s a little more consistent and in the examples below it’s nicely correlated with duration.
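One way to capture logical reads is the standard SET STATISTICS IO session setting; SQL Server then reports per-table read counts in the Messages tab:

```sql
SET STATISTICS IO ON;

DELETE FactOnlineSales WHERE CustomerKey = 19036;
-- Messages tab reports something like:
-- Table 'FactOnlineSales'. Scan count 1, logical reads 46650, ...

SET STATISTICS IO OFF;
```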
The Straight Query
Suppose we want to remove sales data from FactOnlineSales for the “Worcester Company”, whose CustomerKey = 19036. That’s a simple delete statement:
DELETE FactOnlineSales WHERE CustomerKey = 19036;
This delete statement runs for an unacceptably long time. It scans the clustered index, performs 46,650 logical reads, and I’m worried about the concurrency issues it causes.
So I try to delete 1,000 rows at a time. This implementation seems reasonable on the surface:
DECLARE @RC INT = 1;

WHILE (@RC > 0)
BEGIN

    DELETE TOP (1000) FactOnlineSales
    WHERE CustomerKey = 19036;

    SET @RC = @@ROWCOUNT;

END
Unfortunately, this method does poorly. It scans the clustered index in order to find 1,000 rows to delete. The first few batches complete quickly, but later batches gradually get slower as it takes longer and longer to scan the index to find rows to delete. By the time the script gets to the last batch, SQL Server has to delete rows near the very end of the clustered index and to find them, SQL Server has to scan the entire table.
In fact, this last batch performs 46,521 logical reads (just 100 fewer reads than the straight delete). And the entire script performed 1,486,285 logical reads in total. If concurrency is what I’m after, this script is actually worse than the simple DELETE statement.
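You can watch this degradation yourself by instrumenting the naive loop. A sketch that logs each batch’s duration to a table variable (the @BatchLog name and columns are mine, added for illustration):

```sql
DECLARE @RC INT = 1,
        @BatchNum INT = 0,
        @Start DATETIME2;
DECLARE @BatchLog TABLE (BatchNum INT, DurationMs INT);

WHILE (@RC > 0)
BEGIN
    SET @BatchNum += 1;
    SET @Start = SYSUTCDATETIME();

    DELETE TOP (1000) FactOnlineSales
    WHERE CustomerKey = 19036;

    SET @RC = @@ROWCOUNT;
    INSERT @BatchLog VALUES (@BatchNum, DATEDIFF(MILLISECOND, @Start, SYSUTCDATETIME()));
END

-- Later batches take noticeably longer than earlier ones
SELECT * FROM @BatchLog ORDER BY BatchNum;
```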
But I know something about the indexes on this table. I can make use of this knowledge by keeping track of my progress through the clustered index so that I can continue where I left off:
DECLARE @LargestKeyProcessed INT = -1,
        @NextBatchMax INT,
        @RC INT = 1;

WHILE (@RC > 0)
BEGIN

    SELECT TOP (1000) @NextBatchMax = OnlineSalesKey
    FROM FactOnlineSales
    WHERE OnlineSalesKey > @LargestKeyProcessed
      AND CustomerKey = 19036
    ORDER BY OnlineSalesKey ASC;

    DELETE FactOnlineSales
    WHERE CustomerKey = 19036
      AND OnlineSalesKey > @LargestKeyProcessed
      AND OnlineSalesKey <= @NextBatchMax;

    SET @RC = @@ROWCOUNT;
    SET @LargestKeyProcessed = @NextBatchMax;

END
The delete statements in this script performed 46,796 logical reads in total but no individual delete statement performed more than 6,363.
Graphically that looks like:
The careful batching method runs in roughly the same time as the straight delete statement but ensures that locks are not held for long.
The naive batching method runs with O(n²) complexity (compared to the expected O(n)) and can hold locks nearly as long as the straight delete statement.
This underlines the importance of testing for performance.