Michael J. Swart

March 26, 2018

T-SQL Options for Comparing “Distinctness”

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:40 am

I had the privilege of listening to Itzik Ben Gan talk about “distinctness” in a talk he gave at PASS Summit. Distinctness is a relationship or comparison between two variables, just like equals (=). But unlike equality, distinctness treats NULLs in a more intuitive way (NULL is not distinct from NULL).

There’s often confusion because equality in SQL is not like equality in mathematics. In particular equality in SQL doesn’t follow the reflexive property (∀A, A=A).

Clear right?

I explore different syntax options to test whether values are distinct or not. Each method has its pros and cons.

The Setup

Consider this table.

CREATE TABLE Tasks
(
    TaskId INT NOT NULL IDENTITY CONSTRAINT PK_TASKS PRIMARY KEY,
    AssignedTeamId INT NULL,
    AssignedSubTeamId INT NULL,
    TaskDetails NVARCHAR(2000) NOT NULL,
    INDEX IX_TASKS_ASSIGNEDTEAMID (AssignedTeamId)
);

When a task is unassigned, the AssignedTeamId and AssignedSubTeamId columns can both be null.
Our goal will be to select an arbitrary TaskId given parameters @TeamId, @SubTeamId. And when parameters @TeamId and @SubTeamId are both null, I want to return an unassigned task.

The Equality Join (doesn’t compare distinctness)

I just want to post this here as an example that doesn’t work.

DECLARE @TeamId bigint = NULL,
    @SubTeamId bigint = NULL;
 
-- this will never return any rows when the parameters are null:
SELECT TOP 1 TaskId
  FROM Tasks
 WHERE AssignedTeamId = @TeamId
   AND AssignedSubTeamId = @SubTeamId

PROS: The syntax looks nice and clean.
CONS: It doesn’t work for nulls.

The Expanded WHERE Clause

Well, let’s just write it all out then.

DECLARE @TeamId bigint = NULL,
    @SubTeamId bigint = NULL;
 
SELECT TOP 1 TaskId
  FROM Tasks
 WHERE ( (AssignedTeamId IS NULL AND @TeamId IS NULL) OR AssignedTeamId = @TeamId )
   AND ( (AssignedSubTeamId IS NULL AND @SubTeamId IS NULL) OR AssignedSubTeamId = @SubTeamId )

There’s no way that syntax is sarg-able. But it turns out that it is. SQL Server works hard and says “I see what you’re trying to do there, I’ll help you out”.
PROS: It works and it’s sarg-able.
CONS: That syntax is sooo awkward.

Using INTERSECT Syntax

This is a tip that I got straight from Itzik Ben Gan who says he got the idea from Paul White. The idea is that INTERSECT doesn’t use the idea of equality, but of distinctness for it’s comparison. We can use this to create slightly nicer syntax.

DECLARE @TeamId bigint = NULL,
    @SubTeamId bigint = NULL;
 
SELECT TOP 1 TaskId
FROM tasks
WHERE EXISTS (
    SELECT assignedTeamId, assignedSubTeamId
    INTERSECT
    SELECT @TeamId, @SubTeamId
)

The syntax is slightly less awkward, and it’s sarg-able. Or should be… But there’s a problem with this query (see if you can find it before reading further). Compare the two query plans. First the expanded where clause:

The Expanded where clause produces an efficient seek.


Here’s what the query with the INTERSECT syntax produces:

The INTERSECT syntax produces an inefficient scan

The secret to this mystery lies in that filter operator. There’s an implicit conversion there from int to bigint and that can cause a scan of the entire index. With the expanded syntax, SQL Server can handle the conversion gracefully. With the INTERSECT syntax it cannot. This was a really hard-earned lesson for us this week.

Change the parameters @TeamId and @SubTeamId to INT to match and the query becomes sarg-able again.

PROS: More elegant syntax and sarg-able (when you’re careful)
CONS: This syntax causes performance issues with mismatched types. Take extra-special care to make sure types match up.

IS NOT DISTINCT FROM

Check it:

DECLARE @TeamId bigint = NULL,
    @SubTeamId bigint = NULL;
 
SELECT TOP 1 TaskId
FROM tasks
WHERE assignedTeamId IS NOT DISTINCT FROM @TeamId
  AND assignedSubTeamId IS NOT DISTINCT FROM @SubTeamId

Talk about elegant! That’s what we wanted from the beginning. It’s part of ANSI’s SQL 1999 standard. Paul White tells us it’s implemented internally as part of the query processor, but it’s not part of T-SQL! There’s a connect item for it… err. Or whatever they’re calling it these days. Go read all the comments and then give it a vote. There are lots of examples of problems that this feature would solve.

PROS: Super-elegant!
CONS: Invalid syntax (vote to have it included).

1 Comment »

  1. […] Read More (Community […]

    Pingback by T-SQL Options for Comparing 'Distinctness' - SSWUG.ORG — April 16, 2018 @ 3:00 am

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress