Do You Have Check Constraints On Your Start and End Dates?

November 21, 2016

Do You Have Check Constraints On Your Start and End Dates?

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:13 am

Adam Machanic tweeted this advice last week:
https://twitter.com/AdamMachanic/status/799365663781519360

Are you missing any of these check constraints? Run this query to check.
This query looks for any columns in the same table that begin with “Start” and “End”. It then looks for check constraints that reference both these columns. If it doesn’t find them, it suggests a check constraint.

WITH StartColumnNames AS
(
    SELECT object_id,
           column_id,
           name AS column_name
      FROM sys.columns
     WHERE name like 'start%'
),
EndColumnNames AS
(
    SELECT object_id,
           column_id,
           name AS column_name
      FROM sys.columns
     WHERE name like 'end%'
)
SELECT t.object_id,            
       OBJECT_SCHEMA_NAME(t.object_id) AS [schema_name], 
       t.[name] AS table_name,
       s.column_name AS start_column,
       e.column_name AS end_column,
       N'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(t.object_id)) + N'.' + QUOTENAME(t.name) + 
           N' ADD CONSTRAINT ' + 
           QUOTENAME(N'CK_' + t.name + N'_' + s.column_name + N'_' + e.column_name) + 
           N' CHECK (' + QUOTENAME(s.column_name) + N' <= ' + QUOTENAME(e.column_name) + N');' as check_suggestion
  FROM StartColumnNames s
  JOIN EndColumnNames e
       ON s.object_id = e.object_id
       AND s.column_id <> e.column_id
       AND REPLACE(s.column_name, 'start', 'end') = e.column_name        
  JOIN sys.tables t
       ON t.object_id = s.object_id
 WHERE NOT EXISTS 
       (
           SELECT *
             FROM sys.check_constraints c
             JOIN sys.sql_expression_dependencies start_dependency
                  ON start_dependency.referencing_id = c.object_id
                  AND start_dependency.referenced_id = t.object_id
                  AND start_dependency.referenced_minor_id = s.column_id
             JOIN sys.sql_expression_dependencies end_dependency
                  ON end_dependency.referencing_id = c.object_id
                  AND end_dependency.referenced_id = t.object_id
                  AND end_dependency.referenced_minor_id = e.column_id
            WHERE c.parent_object_id = t.object_id
       )

Caveats

Don’t blindly run scripts that you got from some random guy’s blog. Even if that someone is me. That’s terribly irresponsible.

But this query may be useful if you do want to look for a very specific, simple kind of constraint that may match your business specs. These constraints are just suggestions and may not match your business rules. For example, when I run this query on Adventureworks, I get one “missing” check constraint for HumanResources.Shift(StartTime, EndTime) and when I look at the contents of the Shift table, I get this data:

Notice that I can’t create a constraint on this table because of the night shift. The constraint doesn’t make sense here.

Creating constraints on existing tables may take time if the table is huge. Locks may be held on that table for an uncomfortably long time.

Of course if your table has data that would violate the constraint, you can’t create it. But now you have to make some other choices. You can correct or delete the offending data or you can add the constraint with NOCHECK.

-- Comments (8)

8 Comments »

I very strongly agree with this post, and how I have emerged this for years. But instead of putting time as a separate column, you really ought to use a timestamp column (datetime2(0) in T-SQL dialect. This handles the per the cyclic problem of going over midnight, and gives you correct temporal math.

I am also a big fan of “CHECK (i > 0)” and “CHECK (i >= 0)” to prevent negative numbers. Maybe we should have had a different data type, but it is too late for SQL now.

Comment by Joe Celko — November 28, 2016 @ 10:57 am
Thanks Joe,

I’ll echo what Adam wrote on twitter. He almost never sees such constraints and almost as often sees bad data there.

Comment by Michael J. Swart — November 28, 2016 @ 11:16 am
I am a little confused about the purpose. Is the issue strictly data quality or is there also a performance aspect similar to declaring foreign key relationships?

Comment by Ray Herring — December 1, 2016 @ 1:27 pm
While data quality would be sufficient in and of itself, yes, there are performance improvements. Basically a good optimizer in any SQL engine will grab the check () constraints and factor them into the execution plan. I strongly recommend that when you get the time to do some preventive maintenance on your schemas, you go table by table and column by column adding constraints which look obvious and even stupid to humans. Hence my example of (i > 0), and things like check (zip_code LIKE ‘[0-9][0-9][0-9][0-9][0-9]’) for your schema. Eventually he would have to put them in the front end if you do not do it here and have no way of knowing that it will always be done correctly in the front end because you cannot possibly check all of the future programs that have a been written yet. Remember in SQL optimizer can handle a few hundred predicates and you cannot 🙂

Comment by Joe Celko — December 1, 2016 @ 5:01 pm
Well put Joe, thanks.

Comment by Michael J Swart — December 1, 2016 @ 5:59 pm
Thanks Joe.

Comment by Ray — December 1, 2016 @ 6:09 pm

Just a quick demo showing how SQL Server is able to use a check constraint:

use tempdb
go
 
create table #test
(
	notNegative int not null,
	check (notNegative >= 0)
)
 
insert #Test
select message_id
from sys.messages;
 
set statistics io on
-- CTRL+M in SSMS to see query plan
select * 
from #Test
where notNegative < 0;

In this case, SQL Server performs no reads and returns an empty result set.
But you’ll notice that I’m querying data that can’t exists. Queries that can be optimized away like that are rare.

When I wrote this article, I was mainly concerned with data quality.
These constraints can often uncover application bugs early. I value that a lot more than the performance boost (if any) that they give to SQL Server queries.

Comment by Michael J Swart — December 5, 2016 @ 2:29 pm

Thank you again Michael. My confusion originally came because the context of Adam’s Tweet was about data quality but another blog noted it’s performance impacts and I just couldn’t connect the two… thinking it was deeper than that somehow. This helps!

Comment by Ryan Booz — December 5, 2016 @ 3:01 pm

RSS feed for comments on this post. TrackBack URL

Michael J. Swart

November 21, 2016