Michael J. Swart

October 5, 2023

Watch Out For This Use Case When Using Read Committed Snapshot Isolation

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 9:00 am

Takeaway: If you want to extract rows from a table periodically as part of an ETL operation and if you use Read Committed Snapshot Isolation (RCSI), be very careful or you may miss some rows.

David Rose thinks we were looking for "Mister Rose" not "missed rows".

Yesterday, Kendra Little talked a bit about Lost Updates under RCSI. It’s a minor issue that can pop up after turning on RCSI as the default behavior for the Read Committed isolation level. But she doesn’t want to dissuade you from considering the option and I agree with that advice.

In fact, even though we turned RCSI on years ago, by a bizarre coincidence, we only came across our first RCSI-related issue very recently. But it wasn’t update related. Instead, it had to do with an ETL process. To explain it better, consider this demo:

Set up a database called TestRCSI

CREATE DATABASE TestRCSI;
ALTER DATABASE TestRCSI SET READ_COMMITTED_SNAPSHOT ON;

Set up a table called LOGS

use TestRCSI;
 
CREATE TABLE LOGS (
    LogId INT IDENTITY PRIMARY KEY,
    Value CHAR(100) NOT NULL DEFAULT '12345'
);
 
INSERT LOGS DEFAULT VALUES;
INSERT LOGS DEFAULT VALUES;
INSERT LOGS DEFAULT VALUES;

Create a procedure to extract new rows

We want to extract rows from a table whose LogId is greater than any LogId we’ve already seen. That can be done with this procedure:

CREATE PROCEDURE s_FetchLogs ( @AfterLogId INT ) 
AS
    SELECT LogId, Value
    FROM LOGS
    WHERE LogId > @AfterLogId;
GO

That seems straightforward. Now every time you perform that ETL operation, just remember the largest LogId from the results. That value can be used the next time you call the procedure. Such a value is called a “watermark”.
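
For example, a caller might track the watermark with a pattern like this (a minimal sketch; the staging table and variable names are just illustrative):

DECLARE @Watermark INT = 3; -- the largest LogId seen so far, persisted between runs
DECLARE @Extracted TABLE (LogId INT PRIMARY KEY, Value CHAR(100) NOT NULL);
 
INSERT @Extracted (LogId, Value)
EXEC s_FetchLogs @AfterLogId = @Watermark;
 
-- remember the largest LogId for the next run
SELECT @Watermark = ISNULL(MAX(LogId), @Watermark) FROM @Extracted;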

Multiple sessions doing INSERTs concurrently

Things can get a little tricky if we insert rows like this:
Session A:

    INSERT LOGS DEFAULT VALUES; /* e.g. LogId=4  */

Session B:

    BEGIN TRAN
    INSERT LOGS DEFAULT VALUES; /* e.g. LogId=5  */
    /* No commit or rollback, leave this transaction open */

Session A:

    INSERT LOGS DEFAULT VALUES; /* e.g. LogId=6  */
    EXEC s_FetchLogs @AfterLogId = 3;

You’ll see:

Results showing two rows with LogId=4 and LogId=6

And you may start to see what the issue is. Row 5 hasn’t been committed yet, and if you’re wondering whether it will get picked up the next time the ETL is run, the answer is no. The largest LogId in the previous results is 6, so the next call will look like this:

    EXEC s_FetchLogs @AfterLogId = 6;

It will leave the row with LogId = 5 behind entirely. This ETL process has missed a row.

What’s the deal?

It’s important to realize that there’s no defect here. No isolation level guarantees “sequentiality” or “contiguousness” of inserted sequences, and neither do any of the letters in ACID. But it’s still behavior that we want to understand and do something about.

Transactions do not occur at a single point in time. They have beginnings and ends, and we can’t assume the duration of a transaction is zero. Single-statement transactions are no exception. The important point is that the time a row is created is not the time it’s committed. And when several rows are created by many sessions concurrently, the order in which rows are created is not necessarily the order in which they’re committed!

With any version of READ COMMITTED, the rows created by other sessions only become visible after they’re committed, and if the rows are not committed sequentially, they don’t become visible sequentially. This behavior is not particular to identity column values; it applies to any column whose values are expected to only ever increase, such as values drawn from a SEQUENCE object or datetime columns populated with the current time.

So if:

  • columns like these are used as watermarks for an ETL strategy
  • and the table experiences concurrent inserts
  • and Read Committed Snapshot Isolation is enabled

then the process is vulnerable to this missed row issue.

This issue feels like some sort of Phantom Read problem, but it’s not quite that. Something more interesting is going on. Rows are inserted into a table such that column values are expected to always increase, and that expectation is the interesting part. When transactions are committed “out of order”, those rows become visible out of order. The expectation is not met, and that’s the issue.

Solutions (pessimistic locking)

If you turn off RCSI and run the demo over again, you’ll notice that running s_FetchLogs in Session A will be blocked until the transaction in Session B is committed. When Session A is finally unblocked, we get the full results (including row 5) as expected:

Results of a query containing three rows with LogIds 4, 5 and 6

Here’s why this works. Any newly created (but uncommitted) row will exist in the table, but the transaction that created it still holds an exclusive lock on it. Without RCSI, if another session tries to scan that part of the index, it will wait to acquire a shared lock on that row. Problem solved.

But turning off RCSI is overkill. We can be a little more careful. Instead of leaving RCSI off altogether, turn it off just for the one procedure like this:

CREATE OR ALTER PROCEDURE s_FetchLogs ( @AfterLogId INT ) 
AS
    SELECT LogId, Value
    FROM LOGS WITH(READCOMMITTEDLOCK)
    WHERE LogId > @AfterLogId;
GO

In the exact same way, this procedure will wait to see whether any uncommitted rows it encounters will be rolled back or committed. No more missing rows for your ETL process!

August 16, 2023

Deploying Resource Governor Using Online Scripts

Filed under: Miscellaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:07 pm

When I deploy database changes, I like my scripts to be quick, non-blocking, rerunnable and resumable. I’ve discovered that:

  • Turning on Resource Governor is quick and online
  • Turning off Resource Governor is quick and online
  • Cleaning or removing configuration is easy
  • Modifying configuration may take some care

Turning on Resource Governor

Just like sp_configure, Resource Governor is configured in two steps: first specify the configuration you want, then run ALTER RESOURCE GOVERNOR RECONFIGURE.
But unlike sp_configure, which has a “config_value” column and a “run_value” column, there’s no single view that makes it easy to determine which values are configured and which values are in use. It turns out that the catalog views hold the configured values and the dynamic management views hold the values currently in use:

Catalog Views (configuration)

  • sys.resource_governor_configuration
  • sys.resource_governor_external_resource_pools
  • sys.resource_governor_resource_pools
  • sys.resource_governor_workload_groups

Dynamic Management Views (running values and stats)

  • sys.dm_resource_governor_configuration
  • sys.dm_resource_governor_external_resource_pools
  • sys.dm_resource_governor_resource_pools
  • sys.dm_resource_governor_workload_groups

When a reconfigure is pending, these views can contain different information and getting them straight is the key to writing rerunnable deployment scripts.
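
For example, here’s one way to see the configured classifier beside the running one (a quick sketch; both views return a single row):

SELECT
	OBJECT_NAME(c.classifier_function_id) AS configured_classifier,
	OBJECT_NAME(d.classifier_function_id) AS running_classifier,
	c.is_enabled,
	d.is_reconfiguration_pending
FROM sys.resource_governor_configuration c /* config */
CROSS JOIN sys.dm_resource_governor_configuration d; /* running */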

Turning on Resource Governor (Example)

Despite Erik Darling’s warning, say you want to restrict SSMS users to MAXDOP 1:

Plot a Course

use master;
 
IF NOT EXISTS (
	SELECT *
	FROM sys.resource_governor_resource_pools
	WHERE name = 'SSMSPool'
)
BEGIN
	CREATE RESOURCE POOL SSMSPool;
END
 
IF NOT EXISTS (
	SELECT *
	FROM sys.resource_governor_workload_groups
	WHERE name = 'SSMSGroup'
)
BEGIN
	CREATE WORKLOAD GROUP SSMSGroup 
	WITH (MAX_DOP = 1)
	USING SSMSPool;
END
 
IF ( OBJECT_ID('dbo.resource_governor_classifier') IS NULL )
BEGIN
	DECLARE @SQL NVARCHAR(1000) = N'
CREATE FUNCTION dbo.resource_governor_classifier() 
	RETURNS sysname 
	WITH SCHEMABINDING
AS
BEGIN
 
	RETURN 
		CASE APP_NAME()
			WHEN ''Microsoft SQL Server Management Studio - Query'' THEN ''SSMSGroup''
			ELSE ''default''
		END;
END';
	exec sp_executesql @SQL;
END;
 
IF NOT EXISTS (
	SELECT *
	FROM sys.resource_governor_configuration /* config */
	WHERE classifier_function_id = OBJECT_ID('dbo.resource_governor_classifier') )
   AND OBJECT_ID('dbo.resource_governor_classifier') IS NOT NULL
BEGIN
	ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.resource_governor_classifier); 
END

And when you’re ready, RECONFIGURE:

Make it so

IF EXISTS (
	SELECT *
	FROM sys.dm_resource_governor_configuration
	WHERE is_reconfiguration_pending = 1
) OR EXISTS (
	SELECT *
	FROM sys.resource_governor_configuration
	WHERE is_enabled = 0
)
BEGIN
	ALTER RESOURCE GOVERNOR RECONFIGURE;
END
GO

Turning off Resource Governor

Pretty straightforward: the emergency stop button looks like this:

ALTER RESOURCE GOVERNOR DISABLE;

If you ever find yourself in big trouble (because you messed up the classifier function for example), use the Dedicated Admin Connection (DAC) to disable Resource Governor. The DAC uses the internal workload group regardless of how Resource Governor is configured.

After you’ve disabled Resource Governor, you may notice that the resource pools and workload groups are still sitting there. The configuration hasn’t been cleaned up or anything.

Cleaning Up

Cleaning up isn’t too bad: deal with the classifier function first, then drop the groups and pools:

ALTER RESOURCE GOVERNOR DISABLE
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = NULL); 
DROP FUNCTION IF EXISTS dbo.resource_governor_classifier;
 
IF EXISTS (
	SELECT *
	FROM sys.resource_governor_workload_groups
	WHERE name = 'SSMSGroup'
)
BEGIN
	DROP WORKLOAD GROUP SSMSGroup;
END
 
IF EXISTS (
	SELECT *
	FROM sys.resource_governor_resource_pools
	WHERE name = 'SSMSPool'
)
BEGIN
	DROP RESOURCE POOL SSMSPool;
END

You’ll be left in a state where is_reconfiguration_pending = 1 but since Resource Governor is disabled, it doesn’t really matter.

Modifying Resource Governor configuration

This is kind of a tricky thing and everyone’s situation is different. My advice would be to follow this kind of strategy:

  • Determine if the configuration is correct, if not:
    • Turn off Resource Governor
    • Clean up
    • Configure correctly (plot a course)
    • Turn on (make it so)

Somewhere along the way, if you delete a workload group that some session is still using, then ALTER RESOURCE GOVERNOR RECONFIGURE may give this error message:

Msg 10904, Level 16, State 2, Line 105
Resource governor configuration failed. There are active sessions in workload groups being dropped or moved to different resource pools.
Disconnect all active sessions in the affected workload groups and try again.

You have to wait for those sessions to end (or kill them) before trying again. But which sessions? These ones:

SELECT 
	dwg.name [current work group], 
	dwg.pool_id [current resource pool], 
	wg.name [configured work group], 
	wg.pool_id [configured resource pool],
	s.*
FROM 
	sys.dm_exec_sessions s
INNER JOIN 
	sys.dm_resource_governor_workload_groups dwg /* existing groups */
	ON dwg.group_id = s.group_id
LEFT JOIN 
	sys.resource_governor_workload_groups wg /* configured groups */
	ON wg.group_id = s.group_id
WHERE 
	isnull(wg.pool_id, -1) <> dwg.pool_id
ORDER BY 
	s.session_id;

If you find your own session in that list, reconnect.
Once that list is empty, feel free to try again.

October 12, 2022

You Can Specify Two Indexes In Table Hint?

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 12:00 pm

Yes, it turns out that you can specify two indexes in a table hint:

SELECT Id, Reputation
FROM dbo.Users WITH (INDEX (IX_Reputation, PK_Users_Id))
WHERE Reputation > 1000

And SQL Server obeys. It uses both indexes even though the nonclustered index IX_Reputation is covering:
Query plan showing both indexes in use

But Why?

I think this is a solution looking for a problem.

Resolving Deadlocks?
My team wondered if this could be used to help with a concurrency problem. We recently considered using it to resolve a particular deadlock, but we had little success.

It’s useful to think of SQL Server as taking locks on index rows rather than table rows. So our idea was that taking key locks on multiple indexes might help control the order in which locks are taken. But after some effort, it didn’t help us avoid the deadlocks. I’ve had better luck with the simpler sp_getapplock.
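
For what it’s worth, the sp_getapplock approach has this general shape (a sketch; the resource name is arbitrary):

BEGIN TRAN;
 
EXEC sp_getapplock
	@Resource = 'MyCriticalSection',
	@LockMode = 'Exclusive',
	@LockOwner = 'Transaction',
	@LockTimeout = 5000; -- milliseconds; a negative return value means failure
 
/* ... do the work that used to deadlock ... */
 
COMMIT; -- the lock is released when the transaction ends

Sessions that serialize on the same resource name acquire their locks in a single predictable order, which is often enough to sidestep a deadlock.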

Forcing Index Intersection?
Brent Ozar wrote about index intersection a while ago. Index intersection is a rare thing to find in a query plan. Brent can “count on one hand the number of times [he’s] seen this in the wild”.

In theory, I could force index intersection (despite the filter values):

SELECT Id
FROM dbo.Users WITH (INDEX (IX_UpVotes, IX_Reputation))
WHERE Reputation > 500000
AND UpVotes > 500000

But I wouldn’t. SQL Server choosing index intersection is already so rare. And so I think the need to force that behavior will be even rarer. This is not a tool I would use for tuning queries. I’d leave this technique alone.

Have You Used More Than One Index Hint?

I’d love to hear about whether specifying more than one index in a table hint has ever helped solve a real world problem. Let me know in the comments.

October 6, 2022

The Tyranny Of Cumulative Costs (Save and Forget Build Up)

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 12:00 pm

50:50 Triangle

Using the right triangle above, draw a vertical line separating the area of the triangle into two parts with the same area.
The triangle on the left is 70.7% of the width of the original triangle. That’s because the area of the left triangle scales with the square of its width, and √0.5 ≈ 70.7%.

Cumulative Storage Costs

Think of this another way. The triangle above is a graph of the amount of data you have over time. If you pay for storage as an operational expense, such as when you’re renting storage in the cloud (as opposed to purchasing physical drives), then the total cost of storage is the area of the graph. The monthly bills are ever-increasing, so half of the total cost of storage will be for the most recent 29% of the time.

Put yet another way: if you had been creating a large file in the cloud every day since March 2014, then the amount you paid to the cloud provider before the pandemic started is the same as the amount you paid after the pandemic started (as of August 2022).

How Sustainable is This?

If the amount of data generated a day isn’t that much, or the storage you’re using is cheap enough then it really doesn’t matter too much. As an example, AWS’s cheapest storage, S3 Glacier Deep Archive, works out to about $0.001 a month per GB.

But if you’re using Amazon’s Elastic Block Storage, the kind of storage needed for running your own SQL Servers in the cloud, the cost can be closer to $0.08 a month per GB.

The scale on the triangle graph above really matters.

Strategies

This stresses the need for a data life-cycle policy: an exit story for large volumes of data. Try to implement Time-To-Live (TTL) or clean-up mechanisms right from the beginning of even the smallest project. Here’s one quick, easy example from a project I wrote that collects wait stats. The clean-up is a single line.
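
The idea is a scheduled delete of anything older than a retention period; something like this sketch (the table name and retention period are illustrative):

DELETE dbo.WaitStats
WHERE CollectionDate < DATEADD(DAY, -30, SYSUTCDATETIME());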

Look at how Netflix approaches this issue. I like how they put it: “Data storage has a lot of usage and cost momentum (i.e. save-and-forget build-up).”

Netflix stresses the importance of “cost visibility” and they use that to offer focused recommendations for cleaning up unused data. I recommend reading that whole article. It’s fascinating.

It’s important to implement such policies before that triangle graph gets too large.

September 28, 2022

When are Non-Updating Updates Treated Like Regular Updates?

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 12:00 pm

Takeaway: I look at different features to see whether non-updates are treated the same as other updates. Most of the time they are.

According to Microsoft’s documentation, an UPDATE statement “changes existing data in a table or view”. But what if the values don’t actually change? What if affected rows are “updated” with the original values? Some call these updates non-updating. This leads to a philosophical question: “If an UPDATE statement doesn’t change any column to a different value, has the row been updated?”

I answer yes to that question. I consider all affected rows as “updated” regardless of whether the values are different. I think of the UPDATE statement as more of an OVERWRITE statement. I also think of “affected rows” rather than “changed rows”. In most cases SQL Server thinks along the same lines.

I list some features and areas of SQL Server and whether non-updating updates are treated the same or differently than other updates:

The Performance of Non-Updates: Non-Updates are treated differently than other Updates

In 2010, Paul White wrote The Impact of Non-Updating Updates where he points out optimizations Microsoft has made to avoid unnecessary logging when performing some non-updating updates. It’s a rare case where SQL Server actually does pay attention to whether values are not changing to avoid doing unnecessary work.

In the years since, I’ve noticed that this optimization hasn’t changed much except that Microsoft has extended these performance improvements to cases where RCSI or SI is enabled.

Regardless of this performance optimization, it’s still wise to limit affected rows as much as possible. In other words, I still prefer

UPDATE FactOnlineSales 
SET DiscountAmount = NULL
WHERE CustomerKey = 19036
AND DiscountAmount IS NOT NULL;

over this logically equivalent version:

UPDATE FactOnlineSales 
SET DiscountAmount = NULL
WHERE CustomerKey = 19036;

The presence of triggers and cascading foreign keys requires extra care though, as we’ll see.

Triggers: Non-Updates are treated the same as Updates

Speaking of triggers, remember that inside a trigger, non-updating rows are treated exactly the same as any other affected row. Keep in mind that (a short demo follows this list):

  • Triggers are always invoked, even when there are zero rows affected or even when the table is empty.
  • For UPDATE statements, the UPDATE() function only cares about whether a column appeared in the SET clause. It can be useful for short-circuit logic.
  • The virtual tables inserted and deleted are filled with all affected rows (not just changed rows).
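
Here’s a quick demo of those last two points (a sketch using a hypothetical table and trigger):

CREATE TABLE dbo.TriggerTest (Id INT PRIMARY KEY, Value VARCHAR(10) NOT NULL);
GO
CREATE TRIGGER dbo.TriggerTest_Update ON dbo.TriggerTest AFTER UPDATE
AS
BEGIN
	-- UPDATE() is true because Value appears in the SET clause,
	-- even though no value actually changes
	IF UPDATE(Value)
		SELECT COUNT(*) AS AffectedRows FROM inserted;
END
GO
INSERT dbo.TriggerTest (Id, Value) VALUES (1, 'x');
UPDATE dbo.TriggerTest SET Value = Value; -- non-updating, but AffectedRows is 1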

ON UPDATE CASCADE: Non-Updates are treated the same as Updates

When foreign keys have ON UPDATE CASCADE set, Microsoft says “corresponding rows are updated in the referencing table when that row is updated in the parent table”.

Non-updating updates are no exception. To demonstrate, I create an untrusted foreign key and perform a non-updating update. It’s not a “no-op”; the constraint is checked as expected.

CREATE TABLE dbo.TestReferenced (
	Id INT PRIMARY KEY
);
 
INSERT dbo.TestReferenced (Id) VALUES (1), (2), (3), (4);
 
 
CREATE TABLE dbo.TestReferrer (
	Id INT NOT NULL
);
 
INSERT dbo.TestReferrer (Id) VALUES (2), (4), (6), (8);
 
ALTER TABLE dbo.TestReferrer 
WITH NOCHECK ADD FOREIGN KEY (Id) 
REFERENCES dbo.TestReferenced(Id)
ON UPDATE CASCADE;
 
-- trouble with this non-updating update:
UPDATE dbo.TestReferrer
SET Id = Id
WHERE Id = 8;
-- The UPDATE statement conflicted with the FOREIGN KEY constraint ...

@@ROWCOUNT: Non-Updates are treated the same as Updates

SELECT @@ROWCOUNT returns the number of affected rows in the previous statement, not the number of changed rows.
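
A tiny demo, reusing the hypothetical dbo.TriggerTest table from the trigger demo above:

UPDATE dbo.TriggerTest SET Value = Value; -- non-updating
SELECT @@ROWCOUNT; -- returns 1, not 0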

Temporal Tables: Non-Updates are treated the same as Updates

Non-updating updates still generate new rows in the history table. This can lead to puzzling results if you’re not prepared for them. For example, I can make some changes and query the history like this:

-- assumes MyTest is a system-versioned temporal table with a Value column
INSERT MyTest(Value) VALUES ('Mike');
UPDATE MyTest SET Value = 'Michael';
UPDATE MyTest SET Value = 'Michael';

When I take the union of rows in the table and the history table, I might see this output:
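
That union query might look like this sketch (assuming the history table was named dbo.MyTestHistory and the period columns are SysStartTime and SysEndTime):

SELECT 'current' AS src, Value, SysStartTime, SysEndTime FROM dbo.MyTest
UNION ALL
SELECT 'history' AS src, Value, SysStartTime, SysEndTime FROM dbo.MyTestHistory
ORDER BY SysStartTime;

It would show one ‘Mike’ row and two ‘Michael’ rows, each with its own time range.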

It reminds me of when my GPS says something like “In two miles, continue straight on Highway 81.” The value didn’t change, but there are still two distinct ranges.

Change Tracking: Non-Updates are treated the same as Updates

Change tracking could be called “Overwrite Tracking” because all non-updating updates are tracked:

ALTER DATABASE CURRENT
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
GO
 
CREATE TABLE dbo.Test (id INT PRIMARY KEY);
INSERT dbo.Test (id) VALUES (1), (2), (3);
 
ALTER TABLE dbo.Test ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);
 
-- This statement produces 0 rows:
SELECT t.id, c.*
FROM CHANGETABLE (CHANGES dbo.Test, 0) AS c
JOIN dbo.Test AS t ON t.id = c.id;
 
-- "update"
UPDATE dbo.Test SET id = id;
 
-- This statement produces 3 rows:
SELECT t.id, c.*
FROM CHANGETABLE (CHANGES dbo.Test, 0) AS c
JOIN dbo.Test AS t ON t.id = c.id;

Change Data Capture (CDC): Non-Updates are treated differently than other Updates

Here’s a rare exception where a SQL Server feature is named properly: CDC does indeed capture data changes only when data is changing.
Paul White provided a handy setup for testing this kind of thing. I reran his tests with CDC turned on and found that:

  • When CDC is enabled, an update statement is always logged and the data buffers are always marked dirty.
  • But non-updating updates almost never show up as captured data changes, not even when the update was on a column in the clustering key.
  • I was able to generate some CDC changes for non-updates by updating the whole table with an idempotent expression (e.g. SET some_column = some_column * 1)
    CREATE TABLE dbo.SomeTable
    (
        some_column integer NOT NULL,
        some_data integer NOT NULL,
    	index ix_sometable unique clustered (some_column)
    );
     
    UPDATE dbo.SomeTable SET some_column = some_column*1;

If you’re using this feature, this kind of stuff is important to understand! If you’re using CDC for DIY replication (God help you), then maybe the missing non-updates are acceptable. But if you’re looking for a kind of audit, or a way to analyze user-interactions with the database, then CDC doesn’t give the whole picture and is not the tool for you.

September 14, 2022

The Effect of a Slow Registry on SQL Server

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 12:00 pm

I want to describe some symptoms that SQL Server may display when its Windows Registry is non-responsive or slow. From the symptoms alone, it’s hard to know that a slow registry is the cause, so if a web search brought you here, hopefully this helps.

How does SQL Server use the Windows registry?

First, it’s useful to know a bit about how SQL Server uses the registry. We can watch registry activity using Process Monitor (procmon). On a fairly quiet local machine, I see these SQL Server processes “querying” registry keys:

  • There is some background process reading Query Store settings (every minute).
    HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQLServer\QueryStoreSettings
  • There is also some background process writing uptime info (every minute).
    HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQLServer\uptime_pid
  • When a login is requested from a new connection, SQL Server will check to see if R Services are installed (aka Advanced Analytics).
    SQL Server checks SERVERPROPERTY('IsAdvancedAnalyticsInstalled') to see if it has to care about logins associated with something called implied authentication. This happens on every login, which will be important later.
    HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL15.MSSQLSERVER\Setup\AdvancedAnalytics
  • If I use a function like HASHBYTES, SQL Server looks up some cryptography settings. These settings get queried only on the first call to HASHBYTES in each session.
    e.g. HKLM\SOFTWARE\Microsoft\Cryptography\Defaults\Provider\Microsoft Enhanced RSA and AES Cryptographic Provider

That’s not an exhaustive list; there are many other ways SQL Server uses the Windows Registry. For example:

  • Many SQL Agent settings are stored there and are read regularly
  • Calls to xp_regread come from using wizards in SQL Server Management Studio.
  • SERVERPROPERTY(N'MachineName') gets its info from HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\Hostname
  • And many others.

What happens when the Windows registry is slow?

SQL Server’s use of the registry can be fairly quiet – even on a busy server – so you may not see any symptoms at all. But if the calls to the registry are slow in responding, here is what you might see:

  • New logins will ask whether Advanced Analytics Extensions is installed, leading to a non-yielding scheduler and a memory dump. With some effort, you might find a stack trace like the one in the appendix below.
  • Any other kind of memory dump caused by non-yielding schedulers in which the saved stack trace ends with ntdll!NtOpenKeyEx. The AdvancedAnalytics check is just one example, but it’s the most common because it’s executed first on each login.
  • Queries calling HASHBYTES (or other cryptography functions) will be suspended and wait with PREEMPTIVE_OS_CRYPTACQUIRECONTEXT. I mostly see this when the login checks are skipped, i.e. when an open connection from a connection pool is used (see the query after this list).
  • Another symptom is Availability Group failovers (allegedly). It’s harder (for me) to do AG failover post mortems and tie them definitively to slow Windows registries.
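
If you suspect the cryptography symptom, a check along these lines can surface the waiting requests (a sketch):

SELECT r.session_id, r.wait_type, r.wait_time AS wait_time_ms, s.program_name
FROM sys.dm_exec_requests r
JOIN sys.dm_exec_sessions s ON s.session_id = r.session_id
WHERE r.wait_type = N'PREEMPTIVE_OS_CRYPTACQUIRECONTEXT';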

Why might the registry be slow?

I’m not sure. Perhaps it’s associated with some registry cleanup process. It may have something to do with an IO spike on the C: drive.

We rebuilt a virtual machine image from scratch which seems to avoid the problem. I’m keeping my fingers crossed.

I’d love to hear if you’ve come across anything similar.

Appendix: Sample call stack for non-yielding scheduler

00 ntdll!NtOpenKeyEx
01 KERNELBASE!LocalBaseRegOpenKey
02 KERNELBASE!RegOpenKeyExInternalW
03 KERNELBASE!RegOpenKeyExW
04 sqlmin!IniRegOpenKeyExW
05 sqlmin!GetServerProperty
06 sqlmin!IsAdvancedAnalyticsInstalled
07 sqllang!IsExtensibilityFeatureEnabled
08 sqllang!ImpliedAuthenticationManager::IsImpliedAuthenticationEnabled
09 sqllang!FindLogin
0a sqllang!login
0b sqllang!process_login_finish
0c sqllang!process_login
0d sqllang!process_commands_internal
0e sqllang!process_messages
0f sqldk!SOS_Task::Param::Execute
10 sqldk!SOS_Scheduler::RunTask
11 sqldk!SOS_Scheduler::ProcessTasks
12 sqldk!SchedulerManager::WorkerEntryPoint
13 sqldk!SystemThreadDispatcher::ProcessWorker
14 sqldk!SchedulerManager::ThreadEntryPoint
15 kernel32!BaseThreadInitThunk
16 ntdll!RtlUserThreadStart

September 7, 2022

This Function Generates UNPIVOT Syntax

Filed under: Miscellaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

Just like PIVOT syntax, UNPIVOT syntax is hard to remember.
When I can, I prefer to pivot and unpivot in the application, but here’s a function I use sometimes when I don’t want to scroll horizontally in SSMS.

CREATE OR ALTER FUNCTION dbo.GenerateUnpivotSql (@Sql NVARCHAR(MAX))
  RETURNS NVARCHAR(MAX) AS
BEGIN 
RETURN '
WITH Q AS 
(
  SELECT TOP (1) ' + 
  (
    SELECT 
      STRING_AGG(
        CAST(
          'CAST(' + QUOTENAME(NAME) + ' AS sql_variant) AS ' + QUOTENAME(NAME) 
          AS NVARCHAR(MAX)
        ), ',
    '
      )
    FROM sys.dm_exec_describe_first_result_set(@sql, DEFAULT, DEFAULT)
  ) + '
  FROM ( 
    ' + @sql + '
  ) AS O 
)
SELECT U.FieldName, U.FieldValue
FROM Q
UNPIVOT (FieldValue FOR FieldName IN (' +
  (
    SELECT STRING_AGG( CAST( QUOTENAME(name) AS NVARCHAR(MAX) ), ',
  ' ) 
  FROM sys.dm_exec_describe_first_result_set(@sql, DEFAULT, DEFAULT)
  ) + '
  )) AS U';
END
GO

And you might use it like this:

declare @sql nvarchar(max) ='SELECT * FROM sys.databases WHERE database_id = 2';
declare @newsql nvarchar(max) = dbo.GenerateUnpivotSql (@sql);
exec sp_executesql @sql;
exec sp_executesql @newsql;

to get results like this: the single wide row from sys.databases unpivoted into one (FieldName, FieldValue) pair per column.

Uses

I find this function useful whenever I want to take a quick look at one row without all that horizontal scrolling. Like when looking at sys.dm_exec_query_stats and other wide dmvs. This function is minimally tested, so caveat emptor.

October 1, 2021

A System-Maintained LastModifiedDate Column

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 11:13 am

I like rowversion columns. They’re system-maintained and they provide a unique, deterministic way to compare or order changes in a database. But they’re not timestamps (despite the alias).

I also like datetime2 columns named LastModifiedDate. They indicate the date and time a row was last modified. But I have to maintain that column myself: either remember to update it on every update, or use something like a trigger with the associated overhead.

But maybe there’s another option.

GENERATED ALWAYS AS ROW START
What if I use the columns that were meant to be used for temporal tables, but leave SYSTEM_VERSIONING off?

CREATE TABLE dbo.Test
(
	Id INT IDENTITY NOT NULL,
	Value VARCHAR(100) NOT NULL,
	LastModifiedDate DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
	SysEndTime DATETIME2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL,	
	PERIOD FOR SYSTEM_TIME (LastModifiedDate, SysEndTime),
	CONSTRAINT PK_Test PRIMARY KEY CLUSTERED (Id)
)
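
A quick check that the column really is maintained for us (no trigger involved):

INSERT dbo.Test (Value) VALUES ('hello');
UPDATE dbo.Test SET Value = 'world' WHERE Id = 1;
SELECT Id, Value, LastModifiedDate FROM dbo.Test;
-- LastModifiedDate reflects the start time of the updating transaction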

It’s system maintained, it’s an actual datetime and I don’t have to worry about the overhead of triggers.
But it’s not a perfect solution:

  • SysEndTime is required but it’s an unused column here. It’s needed to define the period for SYSTEM_TIME and it’s always going to be DATETIME2’s maximum value. That’s why I made that column “hidden”. It’s got an overhead of 8 bytes per row.
  • The value for LastModifiedDate will be the starting time of the transaction that last modified the row. That might lead to confusing behaviors illustrated by this example. Say that:
    • Transaction A starts
    • Transaction B starts
    • Transaction B modifies a row
    • Transaction A modifies the same row

    After all that, the last modified date will indicate the time that transaction A starts. In fact, if I try these shenanigans with system versioning turned on, I get this error message when transaction A tries to modify the same row:

      Msg 13535, Level 16, State 0, Line 16
      Data modification failed on system-versioned table ‘MyDb.dbo.Test’ because transaction time was earlier than period start time for affected records.

Everything is tradeoffs. If you can live with the drawbacks of the system-generated last modified date column, then it might be an option worth considering.

August 9, 2021

Find Procedures That Use SELECT *

Filed under: Miscellaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

I have trouble with procedures that use SELECT *. They are often not “Blue-Green safe“. In other words, if a procedure has a query that uses SELECT *, then the underlying tables can’t change without causing some tricky deployment issues. (The same is not true for ad hoc queries from the application.)

I also have a lot of procedures to look at (about 5000) and I’d like to find the procedures that use SELECT *.
I want to maybe ignore SELECT * when selecting from a subquery with a well-defined column list.
I also want to maybe include related queries like OUTPUT inserted.*.

The Plan

  1. So I’m going to make a schema-only copy of the database to work with.
  2. I’m going to add a new dummy-column to every single table.
  3. I’m going to use sys.dm_exec_describe_first_result_set_for_object to look for any of the new columns I created.

Any of my new columns that show up were selected with SELECT *.

The Script

use master;
DROP DATABASE IF EXISTS search_for_select_star;
DBCC CLONEDATABASE (the_name_of_the_database_you_want_to_analyze, search_for_select_star);
ALTER DATABASE search_for_select_star SET READ_WRITE;
GO
 
use search_for_select_star;
 
DECLARE @SQL NVARCHAR(MAX);
SELECT 
	@SQL = STRING_AGG(
		CAST(
			'ALTER TABLE ' + 
			QUOTENAME(OBJECT_SCHEMA_NAME(object_id)) + 
			'.' + 
			QUOTENAME(OBJECT_NAME(object_id)) + 
			' ADD NewDummyColumn BIT NULL' AS NVARCHAR(MAX)),
		N';')
FROM 
	sys.tables;
 
exec sp_executesql @SQL;
 
SELECT 
	SCHEMA_NAME(p.schema_id) + '.' + p.name AS procedure_name, 
	r.column_ordinal,
	r.name
FROM 
	sys.procedures p
CROSS APPLY 
	sys.dm_exec_describe_first_result_set_for_object(p.object_id, NULL) r
WHERE 
	r.name = 'NewDummyColumn'
ORDER BY 
	p.schema_id, p.name;
 
use master;
DROP DATABASE IF EXISTS search_for_select_star;

Update

Tom from StraightforwardSQL pointed out a nifty feature that Microsoft has already implemented: sys.dm_sql_referenced_entities.

You can use it like this:

select distinct SCHEMA_NAME(p.schema_id) + '.' + p.name AS procedure_name
from sys.procedures p
cross apply sys.dm_sql_referenced_entities(
	object_schema_name(object_id) + '.' + object_name(object_id), default) re
where re.is_select_all = 1

Comparing the two, I noticed that my query – the one that uses dm_exec_describe_first_result_set_for_object – has some drawbacks. Maybe the SELECT * isn’t actually included in the first result set, but in some subsequent result set. Or maybe the result set couldn’t be described for one of various reasons.

On the other hand, I noticed that dm_sql_referenced_entities has a couple of drawbacks itself. It doesn’t seem to capture statements that use `OUTPUT INSERTED.*`, for example.

In practice though, I found the query that Tom suggested works a bit better. In the product I work most closely with, dm_sql_referenced_entities only missed 3 procedures that dm_exec_describe_first_result_set_for_object caught. But dm_exec_describe_first_result_set_for_object missed 49 procedures that dm_sql_referenced_entities caught!

August 4, 2021

What To Avoid If You Want To Use MERGE

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 1:06 pm

Aaron Bertrand has a post called Use Caution with SQL Server’s MERGE Statement. It’s a pretty thorough compilation of all the problems and defects associated with the MERGE statement that folks have reported in the past. But it’s been a few years since that post and in the spirit of giving Microsoft the benefit of the doubt, I revisited each of the issues Aaron brought up.

Some of the items can be dismissed based on circumstances. I noticed that:

  • Some of the issues are fixed in recent versions (2016+).
  • Some of the issues that have been marked as “won’t fix” have been fixed anyway (the repro script associated with the issue no longer fails).
  • Some of the items are complaints about confusing documentation.
  • Some are complaints about issues that are not limited to the MERGE statement (e.g. concurrency and constraint checks).

So what about the rest? In what circumstances might I decide to use a MERGE statement? What do I still need to avoid? In 2019 and later, if I’m using MERGE, I want to avoid:

It’s a shorter list than Aaron’s but there’s another gotcha. Just as some of the items get addressed over time, new issues continue to pop up. For example, temporal tables are a relatively new feature that weren’t a thing when Aaron first posted his list. And so I also want to avoid:

If MERGE has trouble with both old and new features, then it becomes clear that MERGE is a very complicated beast to implement. It’s not an isolated feature, and it multiplies the number of defects that are possible.

Severity of Issues

There’s a large number of issues with a large variety of severity. Some of the issues are minor annoyances or easily avoidable. Some of them are serious performance issues that are harder to deal with. But a few of the issues can be worse than that! If I ask SQL Server to UPDATE something, and SQL Server responds with (1 row affected), then it better have affected that row! If it didn’t, then that’s a whole new level of severity.

That leads me to what I think is the worst, unfixed bug. To be safe from it, avoid MERGEs with:

The defect is called Merge statement Delete does not update indexed view in all cases. It’s still active as of SQL Server 2019 CU9 and it leaves your indexed view inconsistent with the underlying tables. For my case, I can’t predict what indexed views would be created in the future, so I would shorten my advice to say avoid MERGEs with:

Conclusion

When I first started writing this post, I thought the gist was going to be “MERGE isn’t so bad if you just avoid these things”. But it still is bad. Aaron’s list is a little out of date but his advice is not. If Aaron updated that post today, the list of defects and issues with MERGE would be different and just as concerning to me as the day he wrote it.

So, to simplify my advice to you, avoid:

  • MERGE
