Takeaway: If you want to extract rows from a table periodically as part of an ETL operation and if you use Read Committed Snapshot Isolation (RCSI), be very careful or you may miss some rows.
Yesterday, Kendra Little talked a bit about Lost Updates under RCSI. It’s a minor issue that can pop up after turning on RCSI as the default behavior for the Read Committed isolation level. But she doesn’t want to dissuade you from considering the option and I agree with that advice.
In fact, even though we turned RCSI on years ago, by a bizarre coincidence we only came across our first RCSI-related issue very recently. But it wasn’t update-related. Instead, it had to do with an ETL process. To explain, consider this demo:
Set up a database called TestRCSI
CREATE DATABASE TestRCSI;
ALTER DATABASE TestRCSI SET READ_COMMITTED_SNAPSHOT ON;
Set up a table called LOGS
USE TestRCSI;

CREATE TABLE LOGS (
    LogId INT IDENTITY PRIMARY KEY,
    Value CHAR(100) NOT NULL DEFAULT '12345'
);

INSERT LOGS DEFAULT VALUES;
INSERT LOGS DEFAULT VALUES;
INSERT LOGS DEFAULT VALUES;
Create a procedure to extract new rows
We want to extract rows from a table whose LogId is greater than any LogId we’ve already seen. That can be done with this procedure:
CREATE PROCEDURE s_FetchLogs ( @AfterLogId INT )
AS
SELECT LogId, Value
FROM LOGS
WHERE LogId > @AfterLogId;
GO
That seems straightforward. Now every time you perform that ETL operation, just remember the largest LogId from the results. That value can be used the next time you call the procedure. Such a value is called a “watermark”.
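Here’s a minimal sketch of that bookkeeping, assuming a hypothetical ETL_WATERMARK table owned by the ETL job and a temp table to hold the extracted rows (both names are just for illustration):

-- Hypothetical one-row table that remembers the last LogId we extracted
CREATE TABLE ETL_WATERMARK ( LastLogId INT NOT NULL );
INSERT ETL_WATERMARK ( LastLogId ) VALUES ( 0 );
GO

-- One ETL cycle
DECLARE @AfterLogId INT;
SELECT @AfterLogId = LastLogId FROM ETL_WATERMARK;

-- Capture the extracted rows so we can process them and find the new watermark
CREATE TABLE #Extracted ( LogId INT, Value CHAR(100) );
INSERT #Extracted ( LogId, Value )
EXEC s_FetchLogs @AfterLogId = @AfterLogId;

-- Remember the largest LogId we saw; the next cycle starts after it
UPDATE ETL_WATERMARK
SET LastLogId = ( SELECT MAX(LogId) FROM #Extracted )
WHERE EXISTS ( SELECT 1 FROM #Extracted );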
Multiple sessions doing INSERTs concurrently
Things can get a little tricky if we insert rows like this:
Session A:
INSERT LOGS DEFAULT VALUES; /* e.g. LogId=4 */
Session B:
BEGIN TRAN
INSERT LOGS DEFAULT VALUES; /* e.g. LogId=5 */
/* No commit or rollback, leave this transaction open */
Session A:
INSERT LOGS DEFAULT VALUES; /* e.g. LogId=6 */
EXEC s_FetchLogs @AfterLogId = 3;
You’ll see two rows in the results: LogId 4 and LogId 6.
And you may start to see what the issue is. Row 5 hasn’t been committed yet, and if you’re wondering whether it will get picked up the next time the ETL runs, the answer is no. The max LogId in the previous results is 6, so the next call will look like this:
EXEC s_FetchLogs @AfterLogId = 6;
It will leave the row with LogId = 5 behind entirely. This ETL process has missed a row.
What’s the deal?
It’s important to realize that there’s no defect here. No isolation level guarantees the “sequentiality” or “contiguousness” of inserted sequences like this, and neither does any of the letters in ACID. But it’s still behavior that we want to understand and do something about.
Transactions do not occur at a single point in time. They have beginnings and ends, and we can’t assume a transaction’s duration is zero; single-statement transactions are no exception. The important point is that the time a row is created is not the same as the time it’s committed. And when several sessions create rows concurrently, the order in which the rows are created is not necessarily the order in which they’re committed!
With any version of READ COMMITTED, rows created by other sessions only become visible after they’re committed, and if rows are not committed sequentially, they don’t become visible sequentially. This behavior is not particular to identity column values; it also applies to:
- Sequence values
- rowversion values
- System-maintained LastModified columns
- Really any kind of DateTime column with “Now” as the default
So if:
- columns like these are used as watermarks for an ETL strategy
- and the table experiences concurrent inserts
- and Read Committed Snapshot Isolation is enabled
then the process is vulnerable to this missed row issue.
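For example, here’s a sketch of the same hazard using a “Now”-default datetime column as the watermark; the CreatedAt column and the s_FetchLogsByTime procedure are hypothetical names, but the failure mode is identical:

-- Hypothetical variant: a "Now" default as the watermark column
ALTER TABLE LOGS ADD CreatedAt DATETIME2 NOT NULL
    CONSTRAINT DF_LOGS_CreatedAt DEFAULT SYSUTCDATETIME();
GO

CREATE OR ALTER PROCEDURE s_FetchLogsByTime ( @AfterCreatedAt DATETIME2 )
AS
SELECT LogId, Value, CreatedAt
FROM LOGS
WHERE CreatedAt > @AfterCreatedAt;
GO

-- Same race as before: a row created inside a long-running transaction gets an
-- earlier CreatedAt than rows committed afterwards by other sessions. Under RCSI,
-- a fetch that runs in that gap advances the watermark past the uncommitted
-- CreatedAt value, and that row is never extracted.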
This issue feels like some sort of Phantom Read problem, but it’s not exactly that. Something more interesting is going on. Rows are inserted into the table in a way that leads us to expect the watermark column’s values to only ever increase. That expectation is the interesting part: when transactions commit “out of order”, their rows become visible out of order, the expectation is broken, and rows get missed.
Solutions (pessimistic locking)
If you turn off RCSI and run the demo over again, you’ll notice that running s_FetchLogs in Session A is blocked until the transaction in Session B is committed. When Session A is finally unblocked, it returns the full results (including row 5) as expected.
Here’s why this works. Any newly created (but uncommitted) row still exists in the table, but the transaction that created it holds an exclusive lock on it. Without RCSI, when another session scans that part of the index, it has to wait for a shared lock on that row, so it waits until the inserting transaction commits or rolls back. Problem solved.
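If you want to try that, switching the demo database back to plain locking Read Committed is a single statement (it needs exclusive access to the database, which is what the WITH ROLLBACK IMMEDIATE clause forces):

-- Revert the demo database to locking READ COMMITTED
ALTER DATABASE TestRCSI SET READ_COMMITTED_SNAPSHOT OFF WITH ROLLBACK IMMEDIATE;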
But turning off RCSI is overkill. We can be a little more careful and use locking Read Committed for just this one procedure, like this:
CREATE OR ALTER PROCEDURE s_FetchLogs ( @AfterLogId INT )
AS
SELECT LogId, Value
FROM LOGS WITH (READCOMMITTEDLOCK)
WHERE LogId > @AfterLogId;
GO
In exactly the same way, this procedure will wait to find out whether any uncommitted rows it encounters get committed or rolled back. No more missing rows for your ETL process!
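If you rerun the concurrent-insert demo against this version of the procedure, Session A’s call blocks just as it did with RCSI turned off:

-- Session A, while Session B still holds its transaction open on LogId = 5
EXEC s_FetchLogs @AfterLogId = 3;
-- Blocks until Session B commits or rolls back, then returns rows 4, 5 and 6
-- (or just 4 and 6 after a rollback). Either way, no row is silently skipped.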