Michael J. Swart

July 9, 2018

The Bare Minimum You Need To Know To Work With Git

Filed under: Technical Articles — Michael J. Swart @ 9:00 am

I don’t like using git for source control. It’s the worst source control system (except for all the others). My biggest beef is that many of the commands are unintuitive.

Look how tricky some of these common use cases can be: Top Voted Stackoverflow Questions tagged Git. The top 3 questions have over ten thousand votes! This shows a mismatch between how people want to use git and how git is designed to be used.

I want to show the set of commands that I use most. These commands cover 95% of my use of git.

Initial Setup

One-time tasks include downloading git and signing up for github or bitbucket. My team uses github, but yours might use gitlab, bitbucket or something else.

Here’s my typical workflow. Say I want to work on some files in a project on a remote server:

Clone a Repository

My first step is to find the repository for the project. Assuming I’m not starting a project from scratch, I find and copy the location of the repository from a site like github or bitbucket. So the clone command looks like this:

git clone https://github.com/SomeProject/SomeRepo.git

This downloads all the files so I have my own copy to work with.

Create a Branch

Next I create a branch. Branches are “alternate timelines” for a repository. The real timeline or branch is called master. One branch can be checked out at a time, so after I create a branch, I check out that branch. In the diagram, I’ve indicated the checkout branch in bold. I like to immediately push that branch back to the remote server. I can always refer to the remote server as “origin”. All this is done with these commands:

git branch myBranch
git checkout myBranch 
git push -u origin myBranch

Change Stuff

Now it’s time to make changes. This has nothing to do with git but it’s part of my workflow. In my example here I’m adding a file B.txt.

Stage Changes

These changes aren’t part of the branch yet though! If I want them to be part of the branch, I have to commit my changes. That’s done in two parts. The first part is to specify the changes I want to commit. That’s called staging and it’s done with git add. I almost always want to commit everything, so the command becomes:

git add *

Commit Changes

The second part is to actually commit the files to the branch with a commit message:

git commit -m "my commit message"

Push Changes

I’m happy with the changes I made to my branch so I want to share them with the rest of the world starting with the remote server.

git push origin myBranch

Create a Pull Request and Merge to master

In fact I’m so happy with these changes, I want to include them in master, the real timeline. But not so fast! This is where collaboration and teamwork become important. I create a pull request and then if I get the approval of my teammates, I can merge.

It sounds like a chore, but luckily I don’t have to memorize any git commands for this step because of sites like github or bitbucket. They have a really nice web site and UI where teams can discuss changes before approving them. Once the changes are approved and merged, master now has the changes.

Once it’s merged, just to complete the circle, I can check out master and pull the results of the merge back to my own computer:

git checkout master
git pull

Other Use Cases

Where Am I?
To find out where I am in my workflow, I like to use:

git status

This one command can tell me what branch I’m on, whether there are changes that can be pushed or pulled, what files have changed and what changes are staged.

Merge Conflicts
With small frequent changes, merge conflicts become rare. But they still happen. Merge conflicts are a pain in the neck and to this day I usually end up googling “resolving git merge conflicts”.

Can’t this Be Easier?

There are so many programs and utilities available whose only purpose is to make this stuff easier. But they don’t. They make some steps easy, and some steps impossible. Whenever I really screw things up, I delete everything and start from scratch at the cloning step. I find I have to do that more often when I use a tool that was supposed to make my life easier.

One Exception
The only exception to this rule is Visual Studio Code. It’s a real treat to use. I love it.

Maybe you like the command line. Maybe you have a favorite “git-helper” application. No matter how you use git, in every case, you still have to understand the workflow you’re using and that’s what I’ve tried to describe here.

Where To Next

If you want to really get good at this stuff, I recently learned of a great online resource (thanks Cressa!) at https://learngitbranching.js.org/. It’s a great interactive site that teaches more about branching. You will very quickly learn more than the bare minimum required. I recommend it.

June 15, 2018

ORDER BY newid() is an Unbiased Way To Randomize

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 9:47 am

Mike Bostock is a data-visualization specialist. And it really shows in his blog. Every article is really well designed (which makes sense… many of the articles are about design).

One of his articles, Visualizing Algorithms has some thoughts on shuffling at https://bost.ocks.org/mike/algorithms/#shuffling.

He says that sorting using a random comparator is a rotten way to shuffle things. Not only is it inefficient, but the resulting shuffle is really really biased. He goes on to visualize that bias (again, I really encourage you to go see his stuff).

Ordering by random reminded me of the common technique in SQL Server of ORDER BY newid(). So I wondered whether an obvious bias was present there. So I shuffled 100 items thousands of times and recreated the visualization of bias in a heat map (just like Mike did).

Here is the heatmap. If you can, try to identify any patterns.

Order By NewID Bias


    columns are the position before the shuffle,
    rows are the position after the shuffle,
    green is a positive bias and
    red is a negative bias.

I don’t think there is any bias here. The problem that introduces bias in Mike Bostock’s example is that the “random comparator” he defined does not obey transitivity. In his words: “A comparator must obey transitivity: if a > b and b > c, then a > c.”
But in SQL Server, because each row is assigned a newid(), ORDER BY newid() doesn’t have that flaw and so it doesn’t have that bias.
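Here’s a rough sketch of that experiment outside SQL Server, in Python. It is not the script I used for the heat map. It assumes that drawing one random key per item and sorting on it is a fair stand-in for ORDER BY newid(), and the names position_counts and max_relative_bias are made up for this illustration.

```python
import random

def position_counts(n_items, n_trials, seed=0):
    """Shuffle range(n_items) n_trials times by sorting on one random key per item.

    counts[after][before] is how often the item that started at position
    `before` ended up at position `after` (rows and columns as in the heat map).
    """
    rng = random.Random(seed)
    counts = [[0] * n_items for _ in range(n_items)]
    for _ in range(n_trials):
        # One key per item, drawn once: the assumed analogue of newid() per row.
        keyed = [(rng.random(), before) for before in range(n_items)]
        keyed.sort()
        for after, (_, before) in enumerate(keyed):
            counts[after][before] += 1
    return counts

def max_relative_bias(counts, n_trials):
    """Largest deviation of any cell from the uniform expectation, as a fraction."""
    expected = n_trials / len(counts)
    return max(abs(c - expected) / expected for row in counts for c in row)

counts = position_counts(n_items=10, n_trials=50_000)
print(f"largest relative deviation: {max_relative_bias(counts, 50_000):.3f}")
```

Because every item gets its key exactly once, the sort always sees a consistent ordering, so the deviations should only be the sampling noise you would expect from a finite number of trials.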

But Be Careful

Although the method is unbiased, ORDER BY newid() is still inefficient. It uses a sort which is an inefficient way of shuffling. There are alternative shuffle algorithms that are more efficient.
ORDER BY newid() is good for quick and dirty purposes. But if you value performance, shuffle in the app.
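As one example of shuffling in the app, here’s a minimal sketch of the Fisher-Yates shuffle in Python. The function name and the sample rows are hypothetical; in real Python code the standard library’s random.shuffle already does this job.

```python
import random

def fisher_yates_shuffle(items, rng=random):
    """Return a shuffled copy of items in O(n) time, without sorting."""
    result = list(items)
    # Walk from the last position down, swapping each element with a
    # randomly chosen element at or before it.
    for i in range(len(result) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        result[i], result[j] = result[j], result[i]
    return result

rows = ["row-%d" % i for i in range(1, 6)]
shuffled = fisher_yates_shuffle(rows, random.Random(42))
print(shuffled)
```

Each element is touched once, so the work grows linearly with the number of rows instead of the n log n a sort needs.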

April 6, 2018

Are There Any System Generated Constraint Names Lurking In Your Database?

Names for constraints are optional meaning that if you don’t provide a name when it’s created or cannot afford one, one will be appointed to you by the system.
These system provided names are messy things and I don’t think I have to discourage you from using them. Kenneth Fisher has already done that in Constraint names, Say NO to the default.

But how do you know whether you have any?

Here’s How You Check

SELECT SCHEMA_NAME(schema_id) AS [schema name],
       OBJECT_NAME(object_id) AS [system generated object name],
       OBJECT_NAME(parent_object_id) AS [parent object name],
       type_desc AS [object type]
  FROM sys.objects
 WHERE OBJECT_NAME(object_id) LIKE
         type + '\_\_' + LEFT(OBJECT_NAME(parent_object_id),8) + '\_\_%' ESCAPE '\'
    OR OBJECT_NAME(object_id) LIKE
          REPLACE(sys.fn_varbintohexstr(CAST(object_id AS VARBINARY(MAX))), '0x', '%\_\_') ESCAPE '\'

This will find all your messy system-named constraints.
For example, a table defined like this:

create table MY_TABLE
(
  id INT NOT NULL   -- column type assumed
  CHECK (id >= 0)
)

Will give results like this:

Happy hunting.

Update: April 9, 2018
We can get this info from the system views a little easier as Rob Volk pointed out. I’ve also included the parent object’s type.

SELECT OBJECT_NAME(constid) AS [system generated constraint name],
       (select type_desc from sys.objects where object_id = constid) as [constraint type],
       OBJECT_NAME(id) AS [parent object name],
       (select type_desc from sys.objects where object_id = id) as [parent object type]
  FROM sys.sysconstraints
 WHERE status & 0x20000 > 0
   AND OBJECT_NAME(id) NOT IN (N'__RefactorLog', N'sysdiagrams')
 ORDER BY [parent object type], [parent object name], [system generated constraint name];

March 26, 2018

T-SQL Options for Comparing “Distinctness”

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:40 am

I had the privilege of listening to Itzik Ben Gan talk about “distinctness” in a talk he gave at PASS Summit. Distinctness is a relationship or comparison between two variables, just like equals (=). But unlike equality, distinctness treats NULLs in a more intuitive way (NULL is not distinct from NULL).

There’s often confusion because equality in SQL is not like equality in mathematics. In particular equality in SQL doesn’t follow the reflexive property (∀A, A=A).

Clear right?
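To make the difference concrete, here’s a small Python sketch that emulates the two comparisons, using None to play the role of NULL. The function names are made up for this illustration, and Python’s own == does not behave like SQL’s = (None == None is True in Python), which is why the emulation is spelled out.

```python
def sql_equals(a, b):
    """Emulate SQL's `a = b`: the result is unknown (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b

def is_not_distinct_from(a, b):
    """Emulate distinctness: two NULLs match, and a NULL never matches a value."""
    if a is None or b is None:
        return a is None and b is None
    return a == b

for a, b in [(1, 1), (1, 2), (1, None), (None, None)]:
    print(a, b, sql_equals(a, b), is_not_distinct_from(a, b))
```

The equality version never returns True when a NULL is involved, so a WHERE clause built on it filters those rows out. The distinctness version always answers True or False.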

I explore different syntax options to test whether values are distinct or not. Each method has its pros and cons.

The Setup

Consider this table.

CREATE TABLE Tasks
(
    TaskId INT NOT NULL PRIMARY KEY, -- key column definition assumed
    AssignedTeamId INT NULL,
    AssignedSubTeamId INT NULL,
    TaskDetails NVARCHAR(2000) NOT NULL
);

When a task is unassigned, the AssignedTeamId and AssignedSubTeamId columns can both be null.
Our goal will be to select an arbitrary TaskId given parameters @TeamId, @SubTeamId. And when parameters @TeamId and @SubTeamId are both null, I want to return an unassigned task.

The Equality Join (doesn’t compare distinctness)

I just want to post this here as an example that doesn’t work.

DECLARE @TeamId bigint = NULL,
    @SubTeamId bigint = NULL;
-- this will never return any rows when the parameters are null:
SELECT TOP (1) TaskId
  FROM Tasks
 WHERE AssignedTeamId = @TeamId
   AND AssignedSubTeamId = @SubTeamId;

PROS: The syntax looks nice and clean.
CONS: It doesn’t work for nulls.

The Expanded WHERE Clause

Well, let’s just write it all out then.

DECLARE @TeamId bigint = NULL,
    @SubTeamId bigint = NULL;
SELECT TOP (1) TaskId
  FROM Tasks
 WHERE ( (AssignedTeamId IS NULL AND @TeamId IS NULL) OR AssignedTeamId = @TeamId )
   AND ( (AssignedSubTeamId IS NULL AND @SubTeamId IS NULL) OR AssignedSubTeamId = @SubTeamId );

There’s no way that syntax is sarg-able. But it turns out that it is. SQL Server works hard and says “I see what you’re trying to do there, I’ll help you out”.
PROS: It works and it’s sarg-able.
CONS: That syntax is sooo awkward.
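If you want to try the two WHERE clauses without a SQL Server instance handy, here’s a sketch using Python’s built-in sqlite3 module. SQLite is only a stand-in here: its NULL comparison rules match for this purpose, but it says nothing about SQL Server’s query plans or sarg-ability. The table contents and the task_ids helper are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Tasks (
        TaskId INTEGER PRIMARY KEY,
        AssignedTeamId INTEGER NULL,
        AssignedSubTeamId INTEGER NULL,
        TaskDetails TEXT NOT NULL
    )""")
conn.executemany(
    "INSERT INTO Tasks VALUES (?, ?, ?, ?)",
    [(1, 10, 100, "assigned"), (2, None, None, "unassigned")],
)

# Plain equality: comparing anything to NULL is unknown, so no row qualifies.
EQUALITY = """
    SELECT TaskId FROM Tasks
     WHERE AssignedTeamId = :team AND AssignedSubTeamId = :sub"""

# Expanded form: NULL on both sides is handled explicitly.
EXPANDED = """
    SELECT TaskId FROM Tasks
     WHERE ((AssignedTeamId IS NULL AND :team IS NULL) OR AssignedTeamId = :team)
       AND ((AssignedSubTeamId IS NULL AND :sub IS NULL) OR AssignedSubTeamId = :sub)"""

def task_ids(sql, team, sub):
    """Run one of the queries above and return the matching TaskIds."""
    return [row[0] for row in conn.execute(sql, {"team": team, "sub": sub})]

print("equality, NULL params:", task_ids(EQUALITY, None, None))
print("expanded, NULL params:", task_ids(EXPANDED, None, None))
print("expanded, 10/100:     ", task_ids(EXPANDED, 10, 100))
```

With NULL parameters the equality query finds nothing, while the expanded query returns the unassigned task.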

Using INTERSECT Syntax

This is a tip that I got straight from Itzik Ben Gan who says he got the idea from Paul White. The idea is that INTERSECT doesn’t use the idea of equality, but of distinctness for its comparison. We can use this to create slightly nicer syntax.

DECLARE @TeamId bigint = NULL,
    @SubTeamId bigint = NULL;
SELECT TOP (1) TaskId
FROM tasks
WHERE EXISTS (
    SELECT assignedTeamId, assignedSubTeamId
    INTERSECT
    SELECT @TeamId, @SubTeamId
);

The syntax is slightly less awkward, and it’s sarg-able. Or should be… But there’s a problem with this query (see if you can find it before reading further). Compare the two query plans. First the expanded where clause:

The Expanded where clause produces an efficient seek.

Here’s what the query with the INTERSECT syntax produces:

The INTERSECT syntax produces an inefficient scan

The secret to this mystery lies in that filter operator. There’s an implicit conversion there from int to bigint and that can cause a scan of the entire index. With the expanded syntax, SQL Server can handle the conversion gracefully. With the INTERSECT syntax it cannot. This was a really hard-earned lesson for us this week.

Change the parameters @TeamId and @SubTeamId to INT to match and the query becomes sarg-able again.

PROS: More elegant syntax and sarg-able (when you’re careful)
CONS: This syntax causes performance issues with mismatched types. Take extra-special care to make sure types match up.


Check it:

DECLARE @TeamId bigint = NULL,
    @SubTeamId bigint = NULL;
SELECT TOP (1) TaskId
FROM tasks
WHERE assignedTeamId IS NOT DISTINCT FROM @TeamId
  AND assignedSubTeamId IS NOT DISTINCT FROM @SubTeamId

Talk about elegant! That’s what we wanted from the beginning. It’s part of ANSI’s SQL 1999 standard. Paul White tells us it’s implemented internally as part of the query processor, but at the time of writing it’s not part of T-SQL! There’s a connect item for it… err. Or whatever they’re calling it these days. Go read all the comments and then give it a vote. There are lots of examples of problems that this feature would solve.

PROS: Super-elegant!
CONS: Invalid syntax (vote to have it included).

January 17, 2018

SHA1 Collisions in SQL Server

Takeaway: It’s been frowned on for a while, but SHA1 is definitely broken for security purposes.

In October of 2010, Michael Coles created a contest on his blog called “Find a Hash Collision, Win $100“. The contest was part of a discussion at the time about whether the SHA1 hash was useful for detecting changes. For what it’s worth, I still think SHA1 is valuable as a consistency check if not for security.

At the time no SHA1 hash collisions were known, but in 2017, the news broke that some researchers finally generated a collision. So I looked up the research paper and downloaded the files. I used OPENROWSET to get the binary strings and I created my entry for Michael Coles’ contest:

--  Begin script
DECLARE @A varbinary(8000),
      @B varbinary(8000),
      @hA binary(20),
      @hB binary(20);
-- Replace the ? below with binary strings
SELECT @A = 0x255044462D312E330A25E2E3CFD30A0A0A312030206F626A0A3C3C2F57696474682032203020522F4865696768742033203020522F547970652034203020522F537562747970652035203020522F46696C7465722036203020522F436F6C6F7253706163652037203020522F4C656E6774682038203020522F42697473506572436F6D706F6E656E7420383E3E0A73747265616D0AFFD8FFFE00245348412D3120697320646561642121212121852FEC092339759C39B1A1C63C4C97E1FFFE017F46DC93A6B67E013B029AAA1DB2560B45CA67D688C7F84B8C4C791FE02B3DF614F86DB1690901C56B45C1530AFEDFB76038E972722FE7AD728F0E4904E046C230570FE9D41398ABE12EF5BC942BE33542A4802D98B5D70F2A332EC37FAC3514E74DDC0F2CC1A874CD0C78305A21566461309789606BD0BF3F98CDA8044629A10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000FFFE00FE0000000000000000FFE000104A46494600010101004800480000FFFE00134372656174656420776974682047494D50FFDB00430001010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101FFDB00430101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101FFC20011080008000803011100021101031101FFC40014000100000000000000000000000000000008FFC40014010100000000000000000000000000000009FFFE0006FFFE002FFFDA000C03010002100310000001539DC51FFFC4001510010100000000000000000000000000001626FFFE0006FFFE0033FFDA0008010100010502A953FFC4001F1100000309000000000000000000000000141517001316274563658695FFFE0006FFFE0041FFDA0008010301013F019A8AA56D533BB238739E612166B90E605BFFC4001E11000004070000000000000000000000000014151713274563658594FFFE0006FFFE0040FFDA
0008010201013F01984E1555C155B66CDC3E04A21A444C40FFC4001E10000101090000000000000000000000001413001215164462648594FFFE0006FFFE0033FFDA0008010100063F02AD9A4DB175DCE6086D743B05BFFFC40014100100000000000000000000000000000000FFFE0006FFFE0012FFDA0008010100013F216001FFFE0006FFFE002BFFDA000C030100020003000000101FFFC40014110100000000000000000000000000000000FFFE0006FFFE0028FFDA0008010301013F106980FFC40014110100000000000000000000000000000000FFFE0006FFFE0028FFDA0008010201013F106BC7FFC40014100100000000000000000000000000000000FFFE0006FFFE0014FFDA0008010100013F10153FFFD9414E4745FFE000104A46494600010101004800480000FFFE00134372656174656420776974682047494D50FFDB00430001010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101FFDB00430101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101FFC20011080008000803011100021101031101FFC40014000100000000000000000000000000000009FFC4001501010100000000000000000000000000000607FFDA000C03010002100310000001524A5FFF00FFC40014100100000000000000000000000000000000FFDA00080101000105027FFFC40018110002030000000000000000000000000000F037A7B7FFDA0008010301013F01DFDBFD9FCFFFC40018110002030000000000000000000000000000F036A6B6FFDA0008010201013F01CA3546287FFFC40018100002030000000000000000000000000000F03767A7FFDA0008010100063F02A99C99898FFFC40014100100000000000000000000000000000000FFDA0008010100013F217FFFDA000C030100020003000000101FFFC40014110100000000000000000000000000000000FFDA0008010301013F100FFFC40014110100000000000000000000000000000000FFDA0008010201013F100FFFC40014100100000000000000000000000000000000FFDA0008010100013F100FFFD90A656E6473747265616D0A656E646F626A0A0A322030206F626A0A380A656E646F626A0A0A332030206F626A0A380A656E646F626A0A0A342030206F626A0A2F584F626A6563740A656E646F626A0A0A352030206F626A0A2F496D6167650A656E646F626A0A0A362030206F626A0A2F4443544465636F64650A656E646F626A0A0A372030206F626A0A2F4465766963655247420A656E
646F626A0A0A382030206F626A0A313639330A656E646F626A0A0A392030206F626A0A3C3C0A20202F54797065202F436174616C6F670A20202F5061676573203130203020520A3E3E0A656E646F626A0A0A0A31302030206F626A0A3C3C0A20202F54797065202F50616765730A20202F436F756E7420310A20202F4B696473205B3131203020525D0A3E3E0A656E646F626A0A0A31312030206F626A0A3C3C0A20202F54797065202F506167650A20202F506172656E74203130203020520A20202F4D65646961426F78205B302030203820385D0A20202F43726F70426F78205B302030203820385D0A20202F436F6E74656E7473203132203020520A20202F5265736F75726365730A20203C3C0A202020202F584F626A656374203C3C2F496D302031203020523E3E0A20203E3E0A3E3E0A656E646F626A0A0A31322030206F626A0A3C3C2F4C656E6774682033303E3E0A73747265616D0A710A2020382030203020382030203020636D0A20202F496D3020446F0A510A656E6473747265616D0A656E646F626A0A0A0A0A787265660A30203133200A303030303030303030302036353533352066200A30303030303030303137203030303030206E200A30303030303031383631203030303030206E200A30303030303031383739203030303030206E200A30303030303031383937203030303030206E200A30303030303031393232203030303030206E200A30303030303031393435203030303030206E200A30303030303031393732203030303030206E200A30303030303031393939203030303030206E200A30303030303032303230203030303030206E200A30303030303032303736203030303030206E200A30303030303032313432203030303030206E200A30303030303032333039203030303030206E200A0A747261696C6572203C3C202F526F6F74203920302052202F53697A652031333E3E0A0A7374617274787265660A323339310A2525454F460A,
       @B = 0x255044462D312E330A25E2E3CFD30A0A0A312030206F626A0A3C3C2F57696474682032203020522F4865696768742033203020522F547970652034203020522F537562747970652035203020522F46696C7465722036203020522F436F6C6F7253706163652037203020522F4C656E6774682038203020522F42697473506572436F6D706F6E656E7420383E3E0A73747265616D0AFFD8FFFE00245348412D3120697320646561642121212121852FEC092339759C39B1A1C63C4C97E1FFFE017346DC9166B67E118F029AB621B2560FF9CA67CCA8C7F85BA84C79030C2B3DE218F86DB3A90901D5DF45C14F26FEDFB3DC38E96AC22FE7BD728F0E45BCE046D23C570FEB141398BB552EF5A0A82BE331FEA48037B8B5D71F0E332EDF93AC3500EB4DDC0DECC1A864790C782C76215660DD309791D06BD0AF3F98CDA4BC4629B10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000FFFE00FE0000000000000000FFE000104A46494600010101004800480000FFFE00134372656174656420776974682047494D50FFDB00430001010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101FFDB00430101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101FFC20011080008000803011100021101031101FFC40014000100000000000000000000000000000008FFC40014010100000000000000000000000000000009FFFE0006FFFE002FFFDA000C03010002100310000001539DC51FFFC4001510010100000000000000000000000000001626FFFE0006FFFE0033FFDA0008010100010502A953FFC4001F1100000309000000000000000000000000141517001316274563658695FFFE0006FFFE0041FFDA0008010301013F019A8AA56D533BB238739E612166B90E605BFFC4001E11000004070000000000000000000000000014151713274563658594FFFE0006FFFE0040FFDA
0008010201013F01984E1555C155B66CDC3E04A21A444C40FFC4001E10000101090000000000000000000000001413001215164462648594FFFE0006FFFE0033FFDA0008010100063F02AD9A4DB175DCE6086D743B05BFFFC40014100100000000000000000000000000000000FFFE0006FFFE0012FFDA0008010100013F216001FFFE0006FFFE002BFFDA000C030100020003000000101FFFC40014110100000000000000000000000000000000FFFE0006FFFE0028FFDA0008010301013F106980FFC40014110100000000000000000000000000000000FFFE0006FFFE0028FFDA0008010201013F106BC7FFC40014100100000000000000000000000000000000FFFE0006FFFE0014FFDA0008010100013F10153FFFD9414E4745FFE000104A46494600010101004800480000FFFE00134372656174656420776974682047494D50FFDB00430001010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101FFDB00430101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101FFC20011080008000803011100021101031101FFC40014000100000000000000000000000000000009FFC4001501010100000000000000000000000000000607FFDA000C03010002100310000001524A5FFF00FFC40014100100000000000000000000000000000000FFDA00080101000105027FFFC40018110002030000000000000000000000000000F037A7B7FFDA0008010301013F01DFDBFD9FCFFFC40018110002030000000000000000000000000000F036A6B6FFDA0008010201013F01CA3546287FFFC40018100002030000000000000000000000000000F03767A7FFDA0008010100063F02A99C99898FFFC40014100100000000000000000000000000000000FFDA0008010100013F217FFFDA000C030100020003000000101FFFC40014110100000000000000000000000000000000FFDA0008010301013F100FFFC40014110100000000000000000000000000000000FFDA0008010201013F100FFFC40014100100000000000000000000000000000000FFDA0008010100013F100FFFD90A656E6473747265616D0A656E646F626A0A0A322030206F626A0A380A656E646F626A0A0A332030206F626A0A380A656E646F626A0A0A342030206F626A0A2F584F626A6563740A656E646F626A0A0A352030206F626A0A2F496D6167650A656E646F626A0A0A362030206F626A0A2F4443544465636F64650A656E646F626A0A0A372030206F626A0A2F4465766963655247420A656E
646F626A0A0A382030206F626A0A313639330A656E646F626A0A0A392030206F626A0A3C3C0A20202F54797065202F436174616C6F670A20202F5061676573203130203020520A3E3E0A656E646F626A0A0A0A31302030206F626A0A3C3C0A20202F54797065202F50616765730A20202F436F756E7420310A20202F4B696473205B3131203020525D0A3E3E0A656E646F626A0A0A31312030206F626A0A3C3C0A20202F54797065202F506167650A20202F506172656E74203130203020520A20202F4D65646961426F78205B302030203820385D0A20202F43726F70426F78205B302030203820385D0A20202F436F6E74656E7473203132203020520A20202F5265736F75726365730A20203C3C0A202020202F584F626A656374203C3C2F496D302031203020523E3E0A20203E3E0A3E3E0A656E646F626A0A0A31322030206F626A0A3C3C2F4C656E6774682033303E3E0A73747265616D0A710A2020382030203020382030203020636D0A20202F496D3020446F0A510A656E6473747265616D0A656E646F626A0A0A0A0A787265660A30203133200A303030303030303030302036353533352066200A30303030303030303137203030303030206E200A30303030303031383631203030303030206E200A30303030303031383739203030303030206E200A30303030303031383937203030303030206E200A30303030303031393232203030303030206E200A30303030303031393435203030303030206E200A30303030303031393732203030303030206E200A30303030303031393939203030303030206E200A30303030303032303230203030303030206E200A30303030303032303736203030303030206E200A30303030303032313432203030303030206E200A30303030303032333039203030303030206E200A0A747261696C6572203C3C202F526F6F74203920302052202F53697A652031333E3E0A0A7374617274787265660A323339310A2525454F460A;
SELECT @hA = HASHBYTES('SHA1', @A),
      @hB = HASHBYTES('SHA1', @B);
SELECT CASE WHEN @A = @B
                  THEN '@A Equals @B'
                  ELSE '@A Is Not Equal To @B'
                  END AS AB_Equal,
            CASE WHEN @hA = @hB
                  THEN '@hA Equals @hB'
                  ELSE '@hA Is Not Equal To @hB'
                  END AS Hash_Equal;
-- End script

This gives me the output that wins the contest:

Unfortunately upon closer inspection, I see that the rules of the contest say that entries must be received prior to midnight U.S. Eastern Standard Time on October 31, 2010.

Rats, 7 years too late!
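The check that script performs is easy to repeat outside SQL Server. Here’s a minimal sketch in Python using the standard hashlib module. The compare function is a made-up helper, and the sample inputs below are ordinary strings, not the colliding files from the research paper.

```python
import hashlib

def compare(a: bytes, b: bytes, algorithm: str = "sha1"):
    """Return (bytes_equal, hash_equal) for two byte strings."""
    hash_a = hashlib.new(algorithm, a).digest()
    hash_b = hashlib.new(algorithm, b).digest()
    return a == b, hash_a == hash_b

# Ordinary inputs: different bytes give different hashes.
print(compare(b"hello", b"world"))
# Identical inputs: same bytes, same hash.
print(compare(b"hello", b"hello"))
```

A collision is the remaining combination: bytes that are not equal but whose hashes are. Passing a different algorithm name such as "sha256" lets you run the same comparison with another hash.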

January 15, 2018

100 Percent Online Deployments: Stage and Switch

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:42 pm
100 Percent Online Deployments
How to deploy schema changes without scheduled downtime

In the first draft of this series, this post didn’t exist. I wanted to show a really simple example of a column switch and include it in the Blue-Green (Details) post. I planned for something simple. But I ran into some hiccups that I thought were pretty instructive, so I turned it into the post you see here.

The Plan

For this demo, I wanted to use the WideWorldImporters database. In table Warehouse.ColdRoomTemperatures I wanted to change the column

ColdRoomSensorNumber INT NOT NULL,

to

ColdRoomSensorLabel NVARCHAR(100) NOT NULL,

because maybe we want to track sensors via some serial number or other code.

The Blue-Green plan would be simple:

The Trouble

But nothing is ever easy. Even SQL Server Data Tools (SSDT) gives up when I ask it to do this change with this error dialog:

Never Easy

There are two things going on here (and one hidden thing):

  1. The first two messages point out that a procedure is referencing the column ColdRoomSensorNumber with schemabinding. The reason it’s using schemabinding is because it’s a natively compiled stored procedure. And that tells me that the table Warehouse.ColdRoomTemperatures is an In-Memory table. That’s not all. I noticed another wrinkle. The procedure takes a table-valued parameter whose table type contains a column called ColdRoomSensorNumber. We’re going to have to replace that too. Ugh. Part of me wanted to look for another example.
  2. The last message tells me that the table is a system versioned table. So there’s a corresponding archive table where history is maintained. That has to be dealt with too. Luckily Microsoft has a great article on Changing the Schema of a System-Versioned Temporal Table.
  3. One last thing to worry about is an index on ColdRoomSensorNumber. That should be replaced with an index on ColdRoomSensorLabel. SSDT didn’t warn me about that because apparently, it can deal with that pretty nicely.

So now my plan becomes:

Blue The original schema

Aqua After the pre-migration scripts are run

An extra step is required here to update the new column and keep the new and old columns in sync.

Green After the switch, we clean up the old objects and our schema change is finished:

Without further ado, here are the scripts:

Pre-Migration (Add Green Objects)

In the following scripts, I’ve omitted the IF EXISTS checks for clarity.

-- Add the four green objects
ALTER TABLE Warehouse.ColdRoomTemperatures
ADD ColdRoomSensorLabel NVARCHAR(100) NOT NULL
    CONSTRAINT DF_Warehouse_ColdRoomTemperatures_ColdRoomSensorLabel DEFAULT '';
GO
ALTER TABLE Warehouse.ColdRoomTemperatures
ADD INDEX IX_Warehouse_ColdRoomTemperatures_ColdRoomSensorLabel (ColdRoomSensorLabel);
GO
CREATE TYPE Website.SensorDataList_v2 AS TABLE(
    SensorDataListID int IDENTITY(1,1) NOT NULL,
    ColdRoomSensorLabel NVARCHAR(100) NULL,
    RecordedWhen datetime2(7) NULL,
    Temperature decimal(18, 2) NULL,
    PRIMARY KEY NONCLUSTERED (SensorDataListID)
)
WITH (MEMORY_OPTIMIZED = ON);
GO
CREATE PROCEDURE Website.RecordColdRoomTemperatures_v2
    @SensorReadings Website.SensorDataList_v2 READONLY
AS
    --straight-forward definition left as exercise for reader

Pre-Migration (Populate and Keep in Sync)

Normally, I would use triggers to keep the new and old column values in sync like this, but you can’t do that with In-Memory tables. So I altered the procedure Website.RecordColdRoomTemperatures to achieve something similar. The only alteration I made is to set the ColdRoomSensorLabel value in the INSERT statement:

ALTER PROCEDURE Website.RecordColdRoomTemperatures
@SensorReadings Website.SensorDataList READONLY
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
(
    TRANSACTION ISOLATION LEVEL = SNAPSHOT,
    LANGUAGE = N'English'
)
    BEGIN TRY
        DECLARE @NumberOfReadings int = (SELECT MAX(SensorDataListID) FROM @SensorReadings);
        DECLARE @Counter int = (SELECT MIN(SensorDataListID) FROM @SensorReadings);
        DECLARE @ColdRoomSensorNumber int;
        DECLARE @RecordedWhen datetime2(7);
        DECLARE @Temperature decimal(18,2);
        -- note that we cannot use a merge here because multiple readings might exist for each sensor
        WHILE @Counter <= @NumberOfReadings
        BEGIN
            SELECT @ColdRoomSensorNumber = ColdRoomSensorNumber,
                   @RecordedWhen = RecordedWhen,
                   @Temperature = Temperature
            FROM @SensorReadings
            WHERE SensorDataListID = @Counter;
            UPDATE Warehouse.ColdRoomTemperatures
                SET RecordedWhen = @RecordedWhen,
                    Temperature = @Temperature
            WHERE ColdRoomSensorNumber = @ColdRoomSensorNumber;
            IF @@ROWCOUNT = 0
            BEGIN
                INSERT Warehouse.ColdRoomTemperatures
                    (ColdRoomSensorNumber, ColdRoomSensorLabel, RecordedWhen, Temperature)
                VALUES (@ColdRoomSensorNumber,
                        'HQ-' + CAST(@ColdRoomSensorNumber AS NVARCHAR(50)),
                        @RecordedWhen,
                        @Temperature);
            END;
            SET @Counter += 1;
        END;
    END TRY
    BEGIN CATCH
        THROW 51000, N'Unable to apply the sensor data', 2;
        RETURN 1;
    END CATCH;
END;

That keeps the values in sync for new rows. But now it’s time to update the values for existing rows. In my example, I imagine that the initial labels for the sensors are “HQ-1”, “HQ-2”, etc…

UPDATE Warehouse.ColdRoomTemperatures
SET ColdRoomSensorLabel = 'HQ-' + CAST(ColdRoomSensorNumber as nvarchar(50));

Eagle-eyed readers will notice that I haven’t dealt with the history table here. If the history table is large, use batching to update it. Or better yet, turn off system versioning and then turn it back on immediately using a new/empty history table (if feasible).
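Here’s a minimal sketch of what batching looks like, written in Python against an in-memory SQLite table so it can be run anywhere. The table name ColdRoomTemperatures_History, the batch size and the backfill_in_batches helper are all assumptions for the illustration, and the SQL is SQLite’s dialect, not T-SQL. It also assumes the history table is writable, which for a system-versioned table means versioning has been turned off first.

```python
import sqlite3

def backfill_in_batches(conn, batch_size=1000):
    """Fill ColdRoomSensorLabel a slice at a time, committing after each batch."""
    total = 0
    while True:
        cur = conn.execute(
            """
            UPDATE ColdRoomTemperatures_History
               SET ColdRoomSensorLabel = 'HQ-' || ColdRoomSensorNumber
             WHERE rowid IN (
                   SELECT rowid FROM ColdRoomTemperatures_History
                    WHERE ColdRoomSensorLabel = ''
                    LIMIT ?)
            """,
            (batch_size,),
        )
        conn.commit()  # short transactions: locks are released between batches
        if cur.rowcount == 0:
            return total
        total += cur.rowcount

# Hypothetical stand-in for the history table, with the label still empty.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ColdRoomTemperatures_History (
        ColdRoomSensorNumber INTEGER NOT NULL,
        ColdRoomSensorLabel TEXT NOT NULL DEFAULT ''
    )""")
conn.executemany(
    "INSERT INTO ColdRoomTemperatures_History (ColdRoomSensorNumber) VALUES (?)",
    [(n % 4 + 1,) for n in range(2500)],
)
conn.commit()

updated = backfill_in_batches(conn, batch_size=1000)
print("rows updated:", updated)
```

The point of the loop is that no single statement touches the whole table, so each batch holds its locks only briefly and other activity gets a turn in between.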


After a successful switch, the green application is only calling Website.RecordColdRoomTemperatures_v2. It’s time now to clean up. Again, remember that order matters.

DROP PROCEDURE Website.RecordColdRoomTemperatures;
DROP TYPE Website.SensorDataList;
ALTER TABLE Warehouse.ColdRoomTemperatures
DROP INDEX IX_Warehouse_ColdRoomTemperatures_ColdRoomSensorNumber;
ALTER TABLE Warehouse.ColdRoomTemperatures
DROP COLUMN ColdRoomSensorNumber;

January 12, 2018

100 Percent Online Deployments: Keep Changes OLTP-Friendly

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 9:00 am

Using the Blue-Green deployment method, database changes are decoupled from applications changes. That leaves us with one last challenge to tackle. The schema changes have to be performed while the application is online. It’s true that you can’t always write an online script for every kind of schema change you want.

I got the moves like Jagger

The challenge of writing online schema changes is essentially a concurrency problem and the guiding principle I follow is: Do whatever you need to do, but avoid excessive blocking.

Locks Are Hot Potatoes

You can’t hold them for long. This applies to schema changes too. Logically if you don’t hold a lock long, you can’t block activity. One exception might be the SCH-M lock which can participate in blocking chains:

SCH-M locks

There are two main kinds of SQL queries. SELECT/INSERT/UPDATE/DELETE statements are examples of Data Manipulation Language (DML). CREATE/ALTER/DROP statements are examples of Data Definition Language (DDL).

With schema changes – DDL – we have the added complexity of the SCH-M lock. It’s a kind of lock you don’t see with DML statements. DML statements take and hold schema stability locks (SCH-S) on the tables they need. This can cause interesting blocking chains between the two types: a running query holds SCH-S, the schema change waits behind it for SCH-M, and new queries can’t start until the schema change succeeds.
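The chain can be reproduced with three sessions. This is only a sketch: dbo.MYTABLE and the column names are placeholders, each part is meant to run in its own connection, and the SET LOCK_TIMEOUT guard is an addition for illustration rather than something this post prescribes.

```sql
-- Session 1: a long-running query takes a SCH-S lock on the table
-- and keeps it until the statement finishes.
SELECT COUNT(*)
FROM dbo.MYTABLE a
CROSS JOIN dbo.MYTABLE b;   -- deliberately slow

-- Session 2: the schema change needs SCH-M, so it waits behind session 1.
-- LOCK_TIMEOUT (in milliseconds) makes it give up with error 1222
-- instead of waiting indefinitely.
SET LOCK_TIMEOUT 5000;
ALTER TABLE dbo.MYTABLE
ADD SomeNewColumn INT NULL;

-- Session 3: a short query that would normally return instantly.
-- Its SCH-S request queues behind session 2's pending SCH-M request.
SELECT TOP (1) *
FROM dbo.MYTABLE;
```

With the timeout in place, a schema change that cannot get its lock quickly fails and can be retried later, instead of becoming the head of a long blocking chain.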

Some suggestions:

  • Don’t rebuild indexes while changing schema
  • Rely on the OLTP workload which has many short queries. In an OLTP workload, the lead blocker shouldn’t be a lead blocker for long. Contrast that with an OLAP workload with long-running and overlapping queries. OLAP workloads can’t tolerate changing tables without delays or interruptions.
  • When using Enterprise Edition, use ONLINE=ON for indexes. It takes and holds a SCH-M lock only briefly.

Changes to Big Tables

Scripts that change schema are one-time scripts. If the size of the table is less than 50,000 rows, I write a simple script and then move on.

If the table is larger, look for metadata-only changes, which don’t need to touch every row.
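As a hedged illustration, here are a few changes that are generally metadata-only in SQL Server. The table and column names are placeholders, and it’s worth timing each statement against a restored copy of a large table before trusting it:

```sql
-- Adding a nullable column with no default only touches metadata.
ALTER TABLE dbo.MYTABLE
ADD Notes NVARCHAR(200) NULL;

-- Widening a variable-length column (here from VARCHAR(50) to VARCHAR(100))
-- without changing its nullability.
ALTER TABLE dbo.MYTABLE
ALTER COLUMN ShortDescription VARCHAR(100) NULL;

-- Dropping a column; the space is reclaimed later, for example by a rebuild.
ALTER TABLE dbo.MYTABLE
DROP COLUMN ObsoleteColumn;
```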

If a table change is not a metadata-only change, then it’s a size-of-data change. Then it’s time to get creative. Look for my other post in this series for an example of batching and an example of a column switcheroo.

Pragmatism Example

If you think “good enough” is neither, you may want to skip this section. There are some schema changes that are still very difficult or impossible to write online. With some creativity, we’ve always been able to mitigate these issues with shortcuts and I want to give an example which I think is pretty illustrative.

A colleague asked for a rowversion column on a humongous table. We avoided that requirement by creating a datetime column called LastModifiedDate instead. Since 2012, new columns with constant default values are online. So we added the column with a constant default, and then changed the default value to something more dynamic:

alter table dbo.MYTABLE
add LastModifiedDate DATETIME NOT NULL 
    CONSTRAINT DF_TABLE_LastModifiedDate DEFAULT '20000101'
alter table dbo.MYTABLE
drop CONSTRAINT DF_TABLE_LastModifiedDate;
alter table dbo.MYTABLE
add CONSTRAINT DF_TABLE_LastModifiedDate 
      DEFAULT GETUTCDATE() for LastModifiedDate;

It’s a cool situation because it seems like the column has two defaults: one constant default for rows with missing values, and another definition to be used for new rows:

select pc.default_value, d.definition as [default definition]
from sys.system_internals_partitions p
join sys.system_internals_partition_columns pc 
	on p.partition_id = pc.partition_id
join sys.default_constraints d
	on d.parent_object_id = p.object_id
	and d.parent_column_id = pc.partition_column_id
where p.object_id = object_id('MYTABLE')
and pc.partition_column_id = 2
/* Gives 
default_value  default definition
-------------  ------------------
2000-01-01     (getutcdate())
*/

So be creative and pragmatic. Successful 100% online schema changes involve creativity and close collaboration between everyone involved.

January 10, 2018

100 Percent Online Deployments: Blue Green Details

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 9:54 am
100 Percent Online Deployments
How to deploy schema changes without scheduled downtime

So now for the nitty gritty. In my last post, Blue-Green Deployment, I talked about replacing old blue things with new green things as an alternative to altering things. But Blue-Green doesn’t work with databases, so I introduced the Blue-Aqua-Green method. This helps keep databases and other services online 24/7.

The Aqua Database

What does the Aqua database look like? It’s a smaller version of Blue-Green, but only for those database objects that are being modified. Borrowing some icons from Management Studio’s Object Explorer, here’s what one Blue-Aqua-Green migration might look like:

Start with a database in the original blue state:

After the pre-migration scripts, the database is in the aqua state: the new green objects have been created and are ready for traffic from the green application servers. Any type of database object can use the Blue-Green method, even objects as granular as indexes or columns.

Finally when the load has switched over to the green servers and they’re nice and stable, run the post-migration steps to get to the green state.

Blue-Green for Database Objects

How is the Blue-Green method applied to each kind of database object? With care. Each kind of object has its own subtle differences.

Procedures are very easy to Blue-Green. Brand new procedures are added during the pre-migration phase. Obsolete procedures are dropped during the post-migration phase.

If the procedure is changing but is logically the same, then it can be altered during the pre-migration phase. This is common when the only change to a procedure is a performance improvement.

But sometimes the procedure changes in other ways: for instance, a parameter is added or dropped, or the resultset changes. Then use the Blue-Green method to replace it. During the pre-migration phase, create a new version of the procedure. It must be named differently and the green version of the application has to be updated to call the new procedure. The original blue version of the procedure is deleted during the post-migration phase. It’s not always elegant calling a procedure something like s_USERS_Create_v2 but it works.
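A minimal sketch of that replacement might look like the following. Only the name s_USERS_Create_v2 comes from the text; the dbo.USERS table, its columns and the parameters are hypothetical:

```sql
-- Pre-migration: create the green version alongside the blue one.
-- (CREATE PROCEDURE must be the first statement in its batch.)
CREATE PROCEDURE dbo.s_USERS_Create_v2
    @UserName NVARCHAR(100),
    @Email    NVARCHAR(320)      -- the new parameter
AS
BEGIN
    SET NOCOUNT ON;
    INSERT dbo.USERS (UserName, Email)
    VALUES (@UserName, @Email);
END;
GO

-- Post-migration: once nothing calls the blue version, drop it.
DROP PROCEDURE dbo.s_USERS_Create;
GO
```

During the aqua phase both procedures exist, so blue servers keep calling s_USERS_Create while green servers call s_USERS_Create_v2.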

Views are treated the same as procedures with the exception of indexed views.
The SCHEMABINDING option that indexed views require is a real thorn in the side of Blue-Green and online migrations in general. If you’re going to use indexed views, remember that you can’t change the underlying tables as easily.

Creating an index on a view is difficult because (ONLINE=ON) can’t be used. If you want to get fancy go look at How to Create Indexed Views Online.

The creation of other indexes is nice and easy if you have Enterprise Edition because you can use the (ONLINE=ON) option. But if you’re on Standard Edition, you’re a bit stuck. In SQL Server 2016 SP1, Microsoft brought a whole bunch of Enterprise features into Standard, but ONLINE index builds didn’t make the cut.

If necessary, the Blue-Green process works for indexes that need to be altered too. The blue index and the green index will exist at the same time during the aqua phase, but that’s usually acceptable.
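Here is a sketch of replacing an index that way, assuming Enterprise Edition. The index names, the table and the CreatedDate column are placeholders:

```sql
-- Pre-migration: build the green index next to the blue one.
CREATE NONCLUSTERED INDEX IX_SOME_TABLE_UserId_v2
ON dbo.SOME_TABLE (UserId)
INCLUDE (CreatedDate)
WITH (ONLINE = ON);   -- Enterprise Edition only

-- Post-migration: drop the blue index.
DROP INDEX IX_SOME_TABLE_UserId ON dbo.SOME_TABLE;
```

The cost is extra space and write overhead while both indexes exist.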

Creating constraints like CHECKS and FOREIGN KEYS can be tricky because they require size-of-data scans. This can block activity for the duration of the scan.

My preferred approach is to use the WITH NOCHECK syntax. The constraint is created and enabled, but existing data is not looked at. The constraint will be enforced for any future rows that get updated or inserted.

That seems kind of weird at first. The constraint is marked internally as not trusted. For peace of mind, you could always run a query on the existing data.
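A sketch of the WITH NOCHECK approach, using a hypothetical Quantity column and constraint name:

```sql
-- Pre-migration: add the constraint without scanning existing rows.
ALTER TABLE dbo.SOME_TABLE WITH NOCHECK
ADD CONSTRAINT CK_SOME_TABLE_Quantity CHECK (Quantity >= 0);

-- The constraint is enabled but flagged as not trusted.
SELECT name, is_disabled, is_not_trusted
FROM sys.check_constraints
WHERE parent_object_id = OBJECT_ID('dbo.SOME_TABLE');

-- Peace of mind: look for existing rows that would violate it.
SELECT COUNT(*) AS ViolatingRows
FROM dbo.SOME_TABLE
WHERE Quantity < 0;
```

The last query is itself a size-of-data scan, so on a large table it is better run in batches or at a quiet time.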

The creation of tables doesn’t present any problems. It’s done in the pre-migration phase. Dropping tables is done in the post-migration phase.

What about altering tables? Does the Blue-Green method work? Replacing a table while online is hard because it involves co-ordinating changes during the aqua phase. One technique is to create a temporary table, populate it, keep it in sync and cut over to it during the switch. It sounds difficult. It requires time, space, triggers and an eye for detail. Some years ago, I implemented this strategy on a really complicated table and blogged about it if you want to see what that looks like.

If this seems daunting, take heart. A lot of this work can be avoided by going more granular: When possible, Blue-Green columns instead.

New columns are created during the pre-migration phase. If the table is large, then the new columns should be nullable or have a default value. Old columns are removed during the post-migration phase.
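As a sketch with made-up column names, the two halves might look like this. It assumes the blue application names its columns explicitly (no SELECT * and no INSERT without a column list), so an extra column doesn’t disturb it:

```sql
-- Pre-migration: a nullable column can be ignored by the blue servers.
ALTER TABLE dbo.SOME_TABLE
ADD DisplayName NVARCHAR(100) NULL;

-- Post-migration: remove the column that only the blue servers needed.
ALTER TABLE dbo.SOME_TABLE
DROP COLUMN LegacyName;
```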

Altering columns is sometimes easy. Most of the time altering columns is quick, such as when it only involves a metadata change.

But sometimes it’s not easy. When altering columns on a large table, it may be necessary to use the Blue-Green technique to replace a column. Then you have to use triggers and co-ordinate the changes with the application, but the process is much easier than doing it for a whole table. Test well and make sure each step is “OLTP-Friendly”. I will give an example of the Blue-Green method for a tricky column in the post “Stage and Switch”.

Computed persisted columns can be challenging. When creating persisted computed columns on large tables, they can lock the table for too long. Sometimes indexed views fill the same need.

Technically, data changes are not schema changes, but migration scripts often require data changes too, so it’s important to keep those online as well. See my next post “Keep Changes OLTP-Friendly”.


Easy things should be easy and hard things should be possible, and this applies to writing migration scripts. Steve Jones asked me on twitter about “some more complex idempotent code”. He would like to see an example of a migration script that is re-runnable when making schema changes. I have the benefit of some helper migration procedures we wrote at work. So a migration script that I write might look something like this:

declare @Columns migration.indexColumnSet;
INSERT @Columns (column_name, is_descending_key, is_included_column)
VALUES ('UserId', 0, 0)
exec migration.s_INDEX_AlterOrCreateNonClustered_Online
	@ObjectName = 'SOME_TABLE',
	@IndexName = 'IX_SOME_TABLE_UserId',
	@IndexColumns = @Columns;

We’ve got these helper scripts for most standard changes. Unfortunately, I can’t share the definition of s_INDEX_AlterOrCreateNonClustered_Online because it’s not open source. But if you know of any products or open source scripts that do the same job, let me know. I’d be happy to link to them here.

Where To Next?

So that’s Blue-Green, or more accurately Blue-Aqua-Green. Decoupling database changes from application changes allows instant cut-overs. In the next post Keep Changes OLTP-Friendly I talk about what migration scripts are safe to run concurrently with busy OLTP traffic.

January 8, 2018

100 Percent Online Deployments: Blue-Green Deployment

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:06 pm
100 Percent Online Deployments
How to deploy schema changes without scheduled downtime

The Blue-Green technique is a really effective way to update services without requiring downtime. One of the earliest references I could find for the Blue-Green method is in a book called Continuous Delivery by Humble and Farley. Martin Fowler also gives a good overview of it at BlueGreenDeployment. Here’s a diagram of the typical Blue-Green method (adapted from Martin Fowler).

When using the Blue-Green method, basically nothing gets changed. Instead everything gets replaced. We start by setting up a new environment – the green environment – and then cut over to it when we’re ready. Once we cut over to the new environment successfully, we’re free to remove the original blue environment. The technique is all about replacing components rather than altering components.


Before I talk about the database (databases), notice a couple things. We need a router: Load balancers are used to distribute requests but can also be used to route requests. This enables the quick cut-over. The web servers or application servers have to be stateless as well.

What About The Database Switch?

The two databases in the diagram really threw me for a loop the first time I saw this. This whole thing only works if you can replace the database on a whim. I don’t know about you, but this simply doesn’t work for us. The Continuous Delivery book suggests putting the database into read-only mode. Implementing a temporary read-only mode for applications is difficult and rare (I’ve only ever heard of Stackoverflow doing something like this successfully).

But we don’t do that. We want our application to be 100% online for reads and writes. We’ve modified the Blue-Green method to work for us. Here’s how we change things:

Modified Blue-Green: The Aqua Database

Leave the database where it is and decouple the database changes from the application changes. Make database changes ahead of time such that the db can serve blue or green servers. We call this forward-compatible database version “aqua”.

The changes that are applied ahead of time are “pre-migration” scripts. The changes we apply afterwards are “post-migration” scripts. More on those later. So now our modified Blue-Green migration looks like this:

Start with the original unchanged state of a system:

Add some new green servers:

Apply the pre-migration scripts to the database. The database is now in an “aqua” state:

Do the switch!

Apply the post-migration scripts to the database. The database is now in the new “green” state:

Then remove the unused blue servers when you’re ready:

We stopped short of replacing the entire database. That “aqua” state for the database is the Blue-Green technique applied to individual database objects. In my next post, I go into a lot more detail about this aqua state with examples of what these kinds of changes look like.

Ease in!

It takes a long time to move to a Blue-Green process. It took us a few years. But it’s possible to chase some short-term intermediate goals which pay off early:

Start with the goal of minimizing downtime. For example, create a pre-migration folder. This folder contains migration scripts that can be run online before the maintenance window. The purpose is to reduce the amount of offline time. New objects like views or tables can be created early, new indexes too.

Process changes are often disruptive and the move to Blue-Green is no different. It’s good then to change the process in smaller steps (each step with its own benefits).

After adding the pre-migration folder, continue adding folders. Each new folder involves a corresponding change in process. So over time, the folder structure evolves:

  • The original process has all changes made during an offline maintenance window. Make sure those change scripts are checked into source control and put them in a folder called offline: (offline)
  • Then add a pre-migration folder as described above: (pre, offline)
  • Next add a post-migration folder which can also be run while online: (pre, offline, post)
  • Drop the offline step to be fully online: (pre, post)


Automated deployments allow for more frequent deployments. Automated tools and scripts are great at taking on the burden of menial work, but they’re not too good at thinking on their feet when it comes to troubleshooting unexpected problems. That’s where safety comes in. By safety, I just mean that as many risks are mitigated as possible. For example:

Re-runnable Scripts
If things go wrong, it should be easy to get back on track. This is less of an issue if each migration script is re-runnable. By re-runnable, I just mean that the migration script can run twice without error. Get comfortable with system tables and begin using IF EXISTS everywhere:

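-- not re-runnable (a second run fails because the index already exists):
CREATE INDEX IX_RUN_MANY ON dbo.RUN_MANY (RunManyId); -- column name is a placeholder

The re-runnable version:

-- re-runnable:
IF NOT EXISTS (SELECT *
                 FROM sys.indexes 
                WHERE name = 'IX_RUN_MANY' 
                  AND OBJECT_NAME(object_id) = 'RUN_MANY' 
                  AND OBJECT_SCHEMA_NAME(object_id) = 'dbo')
    CREATE INDEX IX_RUN_MANY ON dbo.RUN_MANY (RunManyId); -- column name is a placeholder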

Avoid Schema Drift
Avoid errors caused by schema drift by asserting the schema before a deployment. Unexpected schema definitions lead to one of the largest classes of migration script errors. These are the errors that surprise us, like “What do you mean there’s a foreign key pointing to the table I want to drop? That didn’t happen in staging!”

Schema drift is real and almost inevitable if you don’t look for it. Tools like SQL Compare are built to help you keep an eye on what you’ve got versus what’s expected. I’m sure there are other tools that do the same job. SQL Compare is just a tool I’ve used and like.
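Even without a comparison tool, a migration script can assert the one assumption it depends on. Here is a sketch for the foreign key surprise mentioned above. The table name, error number and message are placeholders:

```sql
-- Assert that nothing references the table we are about to drop.
IF EXISTS (SELECT *
             FROM sys.foreign_keys
            WHERE referenced_object_id = OBJECT_ID('dbo.SOME_TABLE'))
BEGIN
    THROW 50001, N'Unexpected foreign key references dbo.SOME_TABLE; stopping.', 1;
END;

-- Only reached when the assertion passes, because THROW ends the batch.
DROP TABLE dbo.SOME_TABLE;
```

A check like this turns a confusing mid-deployment failure into an early, descriptive one.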

Schema Timing
When scripts are meant to be run online, duration becomes a huge factor so it needs to be measured.

When a large number of people contribute migration scripts, it’s important to keep an eye on the duration of those scripts. We’ve set up a nightly restore and migration of a sample database to measure the duration of those scripts. If any script takes a long time and deserves extra scrutiny, then it’s better to find out early.

Measuring the duration of these migration scripts helps us determine whether they are “OLTP-Friendly” which I elaborate on in Keep Changes OLTP Friendly.

Tackle Tedious Tasks with Automation

That’s a lot of extra steps and it sounds like a lot of extra work. It certainly is and the key here is automation. Remember that laziness is one of the three great virtues of a programmer. It’s the “quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs…”. That idea is still true today.

Coming Next: Blue-Green Deployment (Details).

December 20, 2017

When Measuring Timespans, try DATEADD instead of DATEDIFF

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 11:25 am

Recently I tackled an issue where a DateTime field was getting updated too often. The query looked something like this:

UPDATE dbo.SomeTable -- table name inferred from the key column
SET MyDateTime = GETUTCDATE()
WHERE SomeTableId = @SomeTableId;

So I decided to give up accuracy for concurrency. Specifically, I decided to only update MyDateTime if the existing value was more than a second ago.

First Attempt: Use DATEDIFF With SECOND

My first attempt looked like this:

WHERE SomeTableId = @SomeTableId

But I came across some problems. I assumed that the DATEDIFF function I wrote worked this way: Subtract the two dates to get a timespan value and then return the number of seconds (rounded somehow) in that timespan.

But that’s not how it works. The docs for DATEDIFF say:

“Returns the count (signed integer) of the specified datepart boundaries crossed between the specified startdate and enddate.”

There’s no rounding involved. It just counts the ticks on the clock that are heard during a given timespan.

Check out this timeline. It shows three timespans and the DATEDIFF values that get reported:
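The boundary-counting behavior is easy to reproduce with a few hand-picked values (the variable names and timestamps here are just for illustration):

```sql
DECLARE @justBefore DATETIME2(3) = '2017-12-20 10:00:00.999';
DECLARE @justAfter  DATETIME2(3) = '2017-12-20 10:00:01.000';
DECLARE @startOfSec DATETIME2(3) = '2017-12-20 10:00:00.000';

-- One millisecond apart, but one second boundary is crossed.
SELECT DATEDIFF(SECOND, @justBefore, @justAfter) AS OneMillisecond;   -- 1

-- 999 milliseconds apart, but no second boundary is crossed.
SELECT DATEDIFF(SECOND, @startOfSec, @justBefore) AS AlmostOneSecond; -- 0
```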

But that’s not the behavior I want.


Using milliseconds gets a little more accurate:

WHERE SomeTableId = @SomeTableId

And it would be good for what I need except that DATEDIFF using MILLISECOND returns a 32-bit integer, so it overflows for any timespan longer than about 24.8 days. For example,

SELECT DATEDIFF (millisecond, '2017-11-01', '2017-12-01')

gives this error:

Msg 535, Level 16, State 0, Line 1
The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart.

SQL Server 2016 introduced DATEDIFF_BIG to get around this specific problem. But I’m not there yet.
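For those who are on SQL Server 2016 or later, the same call with DATEDIFF_BIG returns a bigint and does not overflow:

```sql
-- Requires SQL Server 2016 or later; returns BIGINT instead of INT.
SELECT DATEDIFF_BIG(MILLISECOND, '2017-11-01', '2017-12-01') AS Milliseconds;
-- 2592000000 (30 days x 86,400,000 ms per day)
```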


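I eventually realized that I don’t actually need to measure a timespan. I really just need to answer the question “Does a particular DateTime occur before one second ago?” And I can do that with DATEADD:

UPDATE dbo.SomeTable -- table name inferred from the key column
SET MyDateTime = GETUTCDATE()
WHERE SomeTableId = @SomeTableId
AND MyDateTime < DATEADD(SECOND, -1, GETUTCDATE());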

Update: Adam Machanic points out another benefit to this syntax. The predicate AND MyDateTime < DATEADD(SECOND, -1, GETUTCDATE()) is SARGable (unlike the DATEDIFF examples). Even though there might not be a supporting index, or SQL Server might not choose to use such an index in this specific case, I prefer this syntax even more.

So How About You?

Do you use DATEDIFF at all? Why? I'd like to hear about what you use it for. Especially if you rely on the datepart boundary crossing behavior.
