Michael J. Swart

October 5, 2023

Watch Out For This Use Case When Using Read Committed Snapshot Isolation

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 9:00 am

Takeaway: If you want to extract rows from a table periodically as part of an ETL operation and if you use Read Committed Snapshot Isolation (RCSI), be very careful or you may miss some rows.

David Rose thinks we were looking for "Mister Rose" not "missed rows".

Yesterday, Kendra Little talked a bit about Lost Updates under RCSI. It’s a minor issue that can pop up after turning on RCSI as the default behavior for the Read Committed isolation level. But she doesn’t want to dissuade you from considering the option, and I agree with that advice.

In fact, even though we turned RCSI on years ago, by a bizarre coincidence we only came across our first RCSI-related issue very recently. But it wasn’t update-related. Instead, it had to do with an ETL process. To explain it better, consider this demo:

Set up a database called TestRCSI

CREATE DATABASE TestRCSI;
ALTER DATABASE TestRCSI SET READ_COMMITTED_SNAPSHOT ON;

Set up a table called LOGS

use TestRCSI;
 
CREATE TABLE LOGS (
    LogId INT IDENTITY PRIMARY KEY,
    Value CHAR(100) NOT NULL DEFAULT '12345'
);
 
INSERT LOGS DEFAULT VALUES;
INSERT LOGS DEFAULT VALUES;
INSERT LOGS DEFAULT VALUES;

Create a procedure to extract new rows

We want to extract rows from a table whose LogId is greater than any LogId we’ve already seen. That can be done with this procedure:

CREATE PROCEDURE s_FetchLogs ( @AfterLogId INT ) 
AS
    SELECT LogId, Value
    FROM LOGS
    WHERE LogId > @AfterLogId;
GO

That seems straightforward. Now every time you perform that ETL operation, just remember the largest LogId from the results. That value can be used the next time you call the procedure. Such a value is called a “watermark”.
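
Here’s one way that bookkeeping might look. This is just a minimal sketch: the EtlWatermark table and the #extracted temp table are illustrative names, not part of the demo:

CREATE TABLE EtlWatermark ( LastLogId INT NOT NULL );
INSERT EtlWatermark VALUES (0);
GO
 
/* One ETL run: fetch everything after the saved watermark ... */
DECLARE @AfterLogId INT = ( SELECT LastLogId FROM EtlWatermark );
 
CREATE TABLE #extracted ( LogId INT NOT NULL, Value CHAR(100) NOT NULL );
INSERT #extracted ( LogId, Value )
EXEC s_FetchLogs @AfterLogId = @AfterLogId;
 
/* ... process #extracted, then remember the largest LogId for next time */
UPDATE EtlWatermark
SET LastLogId = ( SELECT MAX(LogId) FROM #extracted )
WHERE EXISTS ( SELECT 1 FROM #extracted );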

Multiple sessions doing INSERTs concurrently

Things can get a little tricky if we insert rows like this:
Session A:

    INSERT LOGS DEFAULT VALUES; /* e.g. LogId=4  */

Session B:

    BEGIN TRAN
    INSERT LOGS DEFAULT VALUES; /* e.g. LogId=5  */
    /* No commit or rollback, leave this transaction open */

Session A:

    INSERT LOGS DEFAULT VALUES; /* e.g. LogId=6  */
    EXEC s_FetchLogs @AfterLogId = 3;

You’ll see:

Results showing two rows with LogId=4 and LogId=6

And you may start to see what the issue is. Row 5 hasn’t been committed yet, and if you’re wondering whether it will get picked up the next time the ETL is run, the answer is no. The largest LogId in the previous results is 6, so the next call will look like this:

    EXEC s_FetchLogs @AfterLogId = 6;

It will leave the row with LogId = 5 behind entirely. This ETL process has missed a row.

What’s the deal?

It’s important to realize that there’s really no defect here. No isolation level guarantees this kind of “sequentiality” or “contiguousness” of inserted values, and neither does any of the letters in ACID. But it’s still behavior that we want to understand and do something about.

Transactions do not occur at a single point in time. They have beginnings and ends, and we can’t assume the duration of a transaction is zero. Single-statement transactions are no exception. The important point is that the time a row is created is not the same as the time it’s committed, and when rows are created by many sessions concurrently, the order in which they’re created is not necessarily the order in which they’re committed!

With any version of READ COMMITTED, rows created by other sessions only become visible after they’re committed, and if they’re not committed in order, they don’t become visible in order. This behavior is not particular to identity column values; it applies to any column whose values are expected to always increase, such as sequence-generated numbers, rowversion values, or datetime columns defaulted to the insert time.

So if:

  • columns like these are used as watermarks for an ETL strategy
  • and the table experiences concurrent inserts
  • and Read Committed Snapshot Isolation is enabled

then the process is vulnerable to this missed row issue.
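
For example, a table stamped with an insert-time default is just as vulnerable if that timestamp is used as the watermark. Here’s a hypothetical variation on the demo table (the names LOGS2 and CreatedAt are mine, just for illustration):

CREATE TABLE LOGS2 (
    LogId INT IDENTITY PRIMARY KEY,
    CreatedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    Value CHAR(100) NOT NULL DEFAULT '12345'
);
 
/* CreatedAt is assigned when the row is inserted, not when it's committed,
   so a long-running transaction can commit an "old" CreatedAt value after
   the ETL has already advanced its watermark past it */
DECLARE @AfterCreatedAt DATETIME2 = '2023-10-05';
SELECT LogId, Value
FROM LOGS2
WHERE CreatedAt > @AfterCreatedAt;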

This issue feels like some sort of Phantom Read problem, but it’s not quite that. Something different is going on in an interesting way. Rows are inserted into a table with column values that are expected to always increase, and that expectation is the interesting part. When transactions are committed “out of order”, those rows become visible out of order. The expectation is not met, and that’s the issue.

Solutions (pessimistic locking)

If you turn off RCSI and run the demo again, you’ll notice that s_FetchLogs in Session A is blocked until the transaction in Session B commits. When Session A is finally unblocked, we get the full results (including row 5) as expected:

Results of a query which contain three rows with LogIds 4, 5 and 6
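
For reference, switching RCSI off for the demo database looks like this. The WITH ROLLBACK IMMEDIATE option kicks out any open transactions so the setting can take effect right away:

ALTER DATABASE TestRCSI SET READ_COMMITTED_SNAPSHOT OFF WITH ROLLBACK IMMEDIATE;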

Here’s why this works: any newly created (but uncommitted) row already exists in the table, but the transaction that created it still holds an exclusive lock on it. Without RCSI, when another session tries to scan that part of the index, it has to wait to grab a shared lock on that row. Problem solved.
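
If you want to watch that wait happen, a third session can peek at the lock requests while Session A is blocked. This diagnostic query isn’t part of the original demo; it’s just one way to see the shared lock waiting behind the exclusive one:

SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID('TestRCSI')
  AND request_status = 'WAIT';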

But turning off RCSI is overkill. We can be a little more careful. Instead of turning RCSI off altogether, use the locking flavor of READ COMMITTED in just the one procedure with a table hint, like this:

CREATE OR ALTER PROCEDURE s_FetchLogs ( @AfterLogId INT ) 
AS
    SELECT LogId, Value
    FROM LOGS WITH(READCOMMITTEDLOCK)
    WHERE LogId > @AfterLogId;
GO

In the exact same way, this procedure will wait to see whether any uncommitted rows it encounters will be rolled back or committed. No more missing rows for your ETL process!
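
To verify, repeat the Session A call from the demo while Session B’s transaction is still open. It now blocks instead of skipping row 5:

    /* Session A: blocks until Session B commits or rolls back,
       then returns rows 4 and 6, plus row 5 if Session B committed */
    EXEC s_FetchLogs @AfterLogId = 3;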

5 Comments

  1. […] Michael J. Swart shares a warning: […]

    Pingback by RCSI and ID-Driven ETL – Curated SQL — October 6, 2023 @ 8:05 am

  2. I’d be interested in knowing your thoughts on using Change Tracking to manage this. AFAIK CT changes are guaranteed to be sequential…

    Comment by Toby — October 9, 2023 @ 1:50 am

  3. There is another solution to this problem: using Rowversion with the function MIN_ACTIVE_ROWVERSION(). Each table you have in your ETL would need a rowversion column. Then, each query within the ETL would need to filter the rowversion column being less than the value returned by the MIN_ACTIVE_ROWVERSION() function. See https://learn.microsoft.com/en-us/sql/t-sql/functions/min-active-rowversion-transact-sql?view=sql-server-ver16.

    Comment by Robert — October 10, 2023 @ 9:24 am

  4. @Toby, Yes, Change tracking makes use of a SYS_CHANGE_VERSION in the tracking table which can be relied upon to be sequential.

    @Robert, that’s very good! I wonder why I’ve never heard of that. I’m ditching @@DBTS today in favor of MIN_ACTIVE_ROWVERSION()!

    Comment by Michael J. Swart — October 10, 2023 @ 1:04 pm

  5. […] wanted to talk about a nice article from Michael J. Swart about Read Committed Snapshot Isolation and one risk you can run into if you design ETL processes […]

    Pingback by Shop Talk: 2023-10-23 – Shop Talk with TriPASS — October 26, 2023 @ 9:31 am
