Michael J. Swart

September 9, 2014

Take Care When Scripting Batches

Filed under: Miscellaneous SQL,SQL Scripts,Technical Articles — Michael J. Swart @ 1:22 pm

Takeaway: When performing long-running modifications, I’m sure many of you enjoy using batches to increase concurrency. But I want to talk about a pitfall to be aware of. If you’re not careful, the method you use to implement batching can actually worsen concurrency.

... we don't need no stinkin' batches either

Why Use Batches?

Even without an explicit transaction, all SQL statements are atomic – changes are all or nothing. So when you have long-running modifications to make, locks on data can be held for the duration of your query and that can be too long. Especially if your changes are intended for live databases.

But you can make your modifications in several smaller chunks or batches. The hope is that each individual batch executes quickly and holds locks on resources for a short period of time.

But care is needed. I’m going to give an example to show what I mean. The example uses the FactOnlineSales table in the ContosoRetailDW database (available as a download here). The FactOnlineSales table has (the query after this list shows one way to confirm these numbers):

  • one clustered index on OnlineSalesKey and no other indexes,
  • 12 million rows,
  • and 46 thousand database pages
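
If you want to confirm those numbers on your own copy of the database, a query like this does the trick (this assumes the table sits in the default dbo schema; sp_spaceused gives a similar summary):

-- rows and pages for each index on the table
SELECT  i.name AS index_name,
        ps.row_count,
        ps.used_page_count
FROM    sys.dm_db_partition_stats AS ps
JOIN    sys.indexes AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id
WHERE   ps.object_id = OBJECT_ID(N'dbo.FactOnlineSales');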

Metrics to Use

In this example, I want to know how long each query takes because this should let me know roughly how long locks are held. But instead of duration, I’m going to measure logical reads. It’s a little more consistent and in the examples below it’s nicely correlated with duration.
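
An easy way to see logical reads (and duration) for a statement you run yourself is SET STATISTICS IO together with SET STATISTICS TIME:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
 
-- the statement you want to measure goes here; a simple count as a placeholder
SELECT COUNT(*) FROM FactOnlineSales WHERE CustomerKey = 19036;
 
-- logical reads (per table) and elapsed time show up on the Messages tab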

The Straight Query

Suppose we want to remove sales data from FactOnlineSales for the “Worcester Company” whose CustomerKey = 19036. That’s a simple delete statement:

DELETE FactOnlineSales WHERE CustomerKey = 19036;

This delete statement runs for an unacceptably long time. It scans the clustered index, performs 46,650 logical reads, and I’m worried about concurrency issues.
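
To see for yourself which locks a statement like this holds, one option (on a test copy of the database, not on anything live) is to run it inside an open transaction and look at the locks from a second session:

-- session 1: hold the locks by leaving the transaction open (test copy only!)
BEGIN TRANSACTION;
DELETE FactOnlineSales WHERE CustomerKey = 19036;
 
-- session 2: what locks is that delete holding right now?
SELECT  resource_type,
        request_mode,
        COUNT(*) AS lock_count
FROM    sys.dm_tran_locks
WHERE   resource_database_id = DB_ID(N'ContosoRetailDW')
GROUP BY resource_type, request_mode;
 
-- back in session 1: put everything back the way it was
ROLLBACK TRANSACTION;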

Naive Batching

So I try to delete 1,000 rows at a time. This implementation seems reasonable on the surface:

DECLARE	
	@RC INT = 1;
 
WHILE (@RC > 0)
BEGIN
 
  DELETE TOP (1000) FactOnlineSales
  WHERE CustomerKey = 19036;
 
  SET @RC = @@ROWCOUNT;
 
END

Unfortunately, this method does poorly. It scans the clustered index in order to find 1,000 rows to delete. The first few batches complete quickly, but later batches gradually get slower as it takes longer and longer to scan the index to find rows to delete. By the time the script gets to the last batch, SQL Server has to delete rows near the very end of the clustered index, and to find them it has to scan the entire table.

In fact, this last batch performs 46,521 logical reads (just 100 fewer reads than the straight delete). And the entire script performed 1,486,285 logical reads in total. If concurrency is what I’m after, this script is actually worse than the simple DELETE statement.
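
For anyone who wants to collect per-statement numbers like these themselves, one option is a server-side trace; another is an Extended Events session along these lines (the session name and file name here are just my own choices):

CREATE EVENT SESSION BatchedDeleteReads ON SERVER
ADD EVENT sqlserver.sql_statement_completed
(
    ACTION ( sqlserver.sql_text )
    WHERE ( sqlserver.database_name = N'ContosoRetailDW' )
)
ADD TARGET package0.event_file ( SET filename = N'BatchedDeleteReads' );
 
ALTER EVENT SESSION BatchedDeleteReads ON SERVER STATE = START;
-- run the batched delete script, then look at the logical_reads column
-- of each sql_statement_completed event in the captured .xel file
ALTER EVENT SESSION BatchedDeleteReads ON SERVER STATE = STOP;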

Careful Batching

But I know something about the indexes on this table. I can make use of this knowledge by keeping track of my progress through the clustered index so that I can continue where I left off:

DECLARE
	@LargestKeyProcessed INT = -1,
	@NextBatchMax INT,
	@RC INT = 1;
 
WHILE (@RC > 0)
BEGIN
 
  -- find the largest key in the next batch of (up to) 1000 matching rows
  SELECT TOP (1000) @NextBatchMax = OnlineSalesKey
  FROM FactOnlineSales
  WHERE OnlineSalesKey > @LargestKeyProcessed
    AND CustomerKey = 19036
  ORDER BY OnlineSalesKey ASC;
 
  -- delete only within that key range
  DELETE FactOnlineSales
  WHERE CustomerKey = 19036
    AND OnlineSalesKey > @LargestKeyProcessed
    AND OnlineSalesKey <= @NextBatchMax;
 
  SET @RC = @@ROWCOUNT;
 
  -- remember where this batch ended so the next one can pick up from there
  SET @LargestKeyProcessed = @NextBatchMax;
 
END

The delete statements in this script performed 46,796 logical reads in total but no individual delete statement performed more than 6,363.

Graphically that looks like:

[Chart: Logical Reads Per Delete Statement]

The careful batching method runs in roughly the same time as the straight delete statement but ensures that locks are not held for long.
The naive batching method runs with O(n²) complexity (compared to the expected O(n)) and can hold locks just as long as the straight delete statement.
This underlines the importance of testing for performance.

12 Comments »

  1. Ah yes, a lesson I’ve re-learned a few times. Usually it goes something like this: Start a batched operation, it’s running long so let it go over night (there’s lots of rows to go through after all), come back to work the next day to find it still running, realize I did something stupid, fix the batched query, completes in minutes.

    Comment by Aaron Cooper — September 9, 2014 @ 10:28 pm

  2. […] Take Care When Scripting Batches – Michael J. Swart (Blog|Twitter) […]

    Pingback by (SFTW) SQL Server Links 12/09/14 - John Sansom — September 12, 2014 @ 10:03 am

  3. Short and sweet. Thank you Michael.

    Comment by Srdjan — September 17, 2014 @ 2:20 pm

  4. Hi Michael,

    I really like this idea, and I’ve been applying it to a lot of my batched operations. Here’s a simplified version of an update I did. It ran through about 900k rows in 50 seconds. Let me know if you see a better way to do this.

    Thanks

    DECLARE @HighKey BIGINT = -1 ,
        @BatchMx BIGINT ,
        @Origin NVARCHAR(2000) = N'\\old\path\' ,
        @Destination NVARCHAR(2000) = N'\\new\path\';
     
     
    WHILE EXISTS ( SELECT TOP 1
                            *
                   FROM     [FILE] WITH ( NOLOCK )
                   WHERE    Location LIKE @Origin + N'%' )
        BEGIN
     
            SELECT TOP ( 10000 )
                    @BatchMx = FileID
            FROM    [FILE] WITH ( NOLOCK )
            WHERE   Location LIKE @Origin + N'%'
            ORDER BY FileID;
     
            UPDATE  F
            SET     F.Location = REPLACE(F.Location, @Origin, @Destination)
            FROM    [FILE] F
            WHERE   F.FileID > @HighKey
                    AND F.FileID <= @BatchMx
                    AND Location LIKE @Origin + N'%';
     
            SET @HighKey = @BatchMx;
     
        END

    Comment by Erik — November 18, 2014 @ 2:15 pm

  5. Hi Erik,
    Thanks for stopping by.
    In each loop, when you’re selecting the next @BatchMx value, you don’t restrict FileID to be larger than the previous @HighKey.
    This means you may find yourself seeing the same poor behavior I described with the Naive batching method. But it’s actually pretty easy to measure.

    Your locks will be reduced on each batch (because you’re limiting the number of rows that get updated) but I wonder about the performance of each batch:
    Run the script on a test machine and collect the stats (IO/CPU) of each statement (using a trace). Compare it to the stats you get for the corresponding straight query:

    UPDATE  F
    SET     F.Location = REPLACE(F.Location, @Origin, @Destination)
    FROM    [FILE] F
    WHERE   Location LIKE @Origin + N'%';

    Comment by Michael J. Swart — November 20, 2014 @ 9:22 am

  6. Hi Michael,

    Are you saying where I select top (N), I should be filtering on the @BatchMx variable?

    I’ve reviewed lots of information about this vs. other methods using SQL Sentry monitoring tools. The only bottlenecks I’ve noticed are when I don’t/can’t have indexes that support other WHERE conditions I may use. For instance, the last time I ran this I was filtering on a bit column that I wasn’t allowed to put an index on.

    I’ll try your suggestion out today and follow up.

    Thanks,
    Erik

    Comment by Erik — November 20, 2014 @ 9:48 am

  7. No, not on the @BatchMx variable. Where you select top (n), I would add this filter:

    AND FileID > @HighKey

    So presumably, SQL Server can search for keys to update where it left off.
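
    In other words, the batch-key select in your script becomes something like this (just your SELECT with that one extra filter):
 
    SELECT TOP ( 10000 )
            @BatchMx = FileID
    FROM    [FILE] WITH ( NOLOCK )
    WHERE   Location LIKE @Origin + N'%'
            AND FileID > @HighKey  -- continue from where the previous batch ended
    ORDER BY FileID;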

    I don’t know everything about your [FILE] table; for example, if there’s an index on Location, then the naive batching method will be very efficient.

    But the main authority here is SQL Server. Test it and see. If there are no bottlenecks with the method you’re using, then it’s good.

    (Also, if you don’t have an index on Location, the WHILE statement can become expensive too)

    Michael

    Comment by Michael J. Swart — November 20, 2014 @ 9:59 am

  8. Okay, that makes way more sense. I’ll give that a go.

    No index on Location. It’s NVARCHAR(2000) – a pretty long UNC file path – and part of a vendor product that I can’t really alter.

    Thanks,
    Erik

    Comment by Erik — November 20, 2014 @ 10:12 am

  9. Hi Michael,

    If we could add an index on CustomerKey, which one of the three approaches would have been the best?
    I guess the first one, but it may depend on the number of records to be deleted.
    Maybe if the number of records for CustomerKey = 19036 is large, we should take into consideration the log size required to run a single delete, and then run it batched instead. Would, in this case with the new index, the “naive batching” be good?

    I guess yes.

    Thanks

    Comment by Jaime — April 23, 2015 @ 2:36 am

  10. Hi Jaime,

    If we had added an index on CustomerKey, then that does change the story.
    If we add an index on CustomerKey, then that index creation has the largest impact on the log. But I’m not actually worried about that.
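
    For reference, the index we’re talking about is just a nonclustered index on CustomerKey, something like this (the name is only an example):
 
    CREATE NONCLUSTERED INDEX IX_FactOnlineSales_CustomerKey
        ON dbo.FactOnlineSales ( CustomerKey );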

    I added the index you suggested and using the first method (the straight query), I see that the number of logical reads has jumped to 410,000! It’s not immediately obvious why. It has to do with the fact that now we have two indexes to delete from. So instead of

    1. Scan the whole clustered index
    2. Delete matching rows as you find them

    we have

    1. Look up the customer rows using the new index and for each row:
    2. Look up the corresponding clustered index row and delete it
    3. Delete the row from the new index

    Those lookups make all the difference.

    As for the second method (naive batching with the new index): because of that index, it automatically performs as well as the third method (careful batching with the new index).

    Comment by Michael J. Swart — April 27, 2015 @ 10:07 am

  11. Hi Michael,

    Thanks for your answer!

    We were experiencing the problem with the lookups and the “surprising” increase of logical reads, and we had determined exactly the same reason.
    But it is somewhat disturbing that SQL Server performs worse with a non-clustered index on the field(s) you use to filter the records to be deleted.
    In this scenario, the lookup will always be needed, and the more rows there are to delete, the worse the performance will be. Am I right?
    Then SQL Server should determine that choosing the clustered index for the delete would be a better execution plan.

    Jaime

    Comment by Jaime — April 27, 2015 @ 10:27 am

  12. Hi Jaime,
    SELECT statements and other queries can always perform better when appropriate indexes are added. But we pay for this performance gain by being obliged to maintain the indexes. This means that DELETE, UPDATE and INSERT statements must perform an extra operation for every extra index that’s involved. And the coordination of these actions can get expensive as you’ve discovered.

    You asked another question about the performance of a delete operation and the execution plan that SQL Server chooses. I’m not quite sure I understood that. SQL Server usually chooses pretty well.

    Comment by Michael J. Swart — April 27, 2015 @ 11:07 am
