Michael J. Swart

April 17, 2013

The Sch-M lock is Evil

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

A necessary evil, but still evil. Why? Because it it won’t share with Sch-S and Sch-S is taken and held for absolutely everything (Yes, even your NOLOCK queries). And that can lead to some interesting concurrency problems. Let me explain.

Sch-M

Sch-M is an abbreviation for Schema Modification. It refers to a kind of lock that is taken on a table/index or other object whenever you want to modify that object. SQL Server allows only one Sch-M lock on an object at a time. So if you want to modify a table, your process waits to takes a Sch-M lock on that table. Once that lock is granted, the modification is completed and then the lock is released.

Sch-S

Sch-S is an abbreviation for Schema Stability. It is a kind of lock that is taken on an object when a process doesn’t want that object to change its definition. It makes sense. If I’m reading a set of rows through a table, I don’t want a column to disappear on me half way through. SQL Server allows many Sch-S locks on a table.

Sch-S vs. Sch-M

But Sch-S locks are incompatible with Sch-M locks. This means that when you want to modify a table, you’re not granted a Sch-M lock immediately. You have to wait for everyone using that table to finish using it. You’re essentially put on hold until the existing queries complete and their existing Sch-S locks are released. This also means that while you’re waiting, every query who wants to begin using that table is waiting in line behind you too. Basically “Everybody outta the pool while the lifeguards change shifts.” But that’s usually acceptable right? Database schema modifications are a big enough change to require a maintenance window.

Index Rebuilds Are Table Modifications

It’s true, if you have the luxury of maintenance windows for your DB changes, you’ll be alright. But you also have to consider your database maintenance plans (automated or otherwise). Those plans can launch index rebuilds while the database is online. And all index rebuilds also count as table modifications and take Sch-M locks. An index rebuild has syntax like this:

ALTER INDEX [PK_MyTable] ON [MyTable] REBUILD WITH (ONLINE=ON)

Hopefully you’ve remembered that ONLINE=ON part. When you use that part, the index is rebuilt in the background and at the end of that processing time, a Sch-M lock is taken and released very quickly.

But maybe you’re not so lucky. Maybe you’re not running 2012 yet and have an index that includes blobs. Or maybe you’re running on Standard Edition. In those cases you won’t be able to use the ONLINE=ON feature. In that case, the Sch-M lock is taken by the rebuild process and it’s held the entire time that index is rebuilt. During the rebuild, that index is now truly offline. No access for you.

You Can Get Stuck

Just like I did. A while ago, I was asked to help with this exact situation. An index rebuild had been running for hours it was offline and the Sch-M lock that was held was preventing anybody from using or even looking at that table. I was stuck between a rock and a hard place. I had to choose between letting the index rebuild complete (which could take hours) or cancelling the job (whose rollback could take hours). There was nothing I could do to avoid additional hours of downtime. There was another bit of irony in my situation. We didn’t care about the existing data in that table. A truncate table or drop/recreate table would have suited us just fine.

… Like Really Stuck

It occurred to me to try something out. What if I created an identical empty table with a different name. We didn’t need any data in the locked table. So using a new table could work. And because the locked table is only accessed by stored procedures, I can modify those procedures to use the new table instead of the locked one.

Nope!

For some reason, the ALTER PROCEDURE requires a Sch-S lock on the old table, the table it no longer refers to. The sprocs can’t operate or be modified without a Sch-S lock on that locked table. This sketch illustrates my situation.

Make sure your index plans never attempt to rebuild big indexes offline.

Extras

Some relevant DBAReactions gifs:

April 3, 2013

Altering Text Columns: Only a Metadata Change?

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:00 am

Say you want to change the type of a text column using the ALTER TABLE … ALTER COLUMN syntax. It is valuable to know how much work SQL Server will have to do to fulfill your request. When your tables are large, it can mean the difference between a maintenance window that lasts five minutes, or one that lasts five hours or more.

I give a list of exactly when you’ll feel that pain and when you won’t.

A joke about char(max)

(BTW, CHAR(MAX) columns are impossible, you’ll get the gag if you figure out why)

When is the Whole Table Processed?

Here are conditions which require processing the entire table:

  • switching from unicode to non-unicode or vice versa.
  • changing a column from nullable to not nullable.
  • going from fixed length field to variable length field or vice versa.
  • decreasing the maximum length of a field.
  • increasing the maximum length of a fixed length field.
  • converting from limited length columns to unlimited or vice versa. (e.g. varchar(10) to varchar(max)).
  • collation modifications on non-unicode columns that change character set or code page. (See Collation Hell Part 3 by Dan Guzman)

On large tables, any of the above alterations will be a heavy hitter and will take time and fill transaction log (except that a shrink of fixed length fields seems to only require a scan).

What Changes are Metadata Only Changes?

That’s a lot of conditions! What’s alterations are left?

Not much:

  • Increasing the maximum length of a variable length column.
  • Changing the type from text to varchar(max).
  • Changing the type from ntext to nvarchar(max).
  • Any of the above while making a non-nullable field nullable.
  • Any of the above with a change in collation (with some big caveats, see Collation Hell Part 3 by Dan Guzman).

These changes are metadata only changes which means SQL Server doesn’t have to touch any of the actual data. So the size of the table will not impact the time it takes SQL Server to process the ALTER TABLE command. But see some notes about concurrency below.

Some Notes

Some notes about the above:

  • I ignored and make no claims about migrations where text or ntext is the target column type because the exceptions are strange and that scenario seems to fall under “Why would you want to do that?”
  • The above applies to only versions I’ve tested. Specifically 2008, and 2012.
  • The metadata-only changes I described above is not entirely on-line. There are still concurrency concerns to watch out for. These ALTER statements still request Schema modification (Sch-M) locks on the table, and once granted, only hold them briefly. But if you try to alter a column on a live environment and some long running query blocks your ALTER TABLE statement, then other queries that need access to the table will be blocked as well. 
  • Terms I used
    • fixed length: char(x), nchar(x)
    • variable length: varchar(x), nvarchar(x)
    • unlimited length: varchar(max), nvarchar(max), text, ntext
    • unicode: nchar(x), nvarchar(x), nvarchar(max), ntext
    • non-unicode: char(x), varchar(x), varchar(max), text

March 19, 2013

Checking Out Amazon Redshift

In order to refresh my memory about what I learned in University, I’ve been watching a course that UC Berkeley made available on Youtube. It’s been a good course so far. Recently I got to the topic of logical fallacies. And so I’m reminded that the following is not a valid argument: Jimmy advises X; Jimmy profits if you follow advice X; Therefore, Not X. It’s not valid in the academic sense. In the practical sense, I have to remember to not distrust all commercials and marketing videos.

But it’s hard, especially when Googling for “Big Data” or “Analytics”, I find it very difficult to find anything to read that’s both unbiased and useful. Maybe it’s because I’m still skeptical about any solution that is promoted by the people who stand to profit from following their advice (I’m trying not to discount their advice, I just squint at it a little).

So when Amazon came out with a publicly available Beta for their new Redshift Datawarehouse service (accompanied by a slick marketing video), I decided to kick the tires. Here’s some notes I made:

Amazon’s Redshift Commercial

Their commercial is here. And I have to say it reminded me of an infomercial. You know the kind where they try to sell fancy mop gadgets by showing how ridiculously clumsy people can be with regular mops. The Amazon commercial is light on technical details but I don’t think it was meant for an audience like me and you. I made a conscious effort not to hold that against them.

Warehouse in the cloud

Having a warehouse in the cloud makes a lot of sense in some ways. Pay-as-you-go pricing is what you want when you prefer operational costs over capital costs. Many businesses who don’t have I.T. as a core competency will find this a pretty attractive alternative to running servers or an appliance themselves. But it can get expensive quick, Amazon advertises less than $1000 / Terabyte / year. But with upfront costs for reserved pricing and a two terabyte minimum. The smallest rate you can get away with is $3000 per year for three years. In practice it will likely be significantly more. I can’t see anyone choosing Redshift without doing due diligence on the pricing, but it’s probably not going to be dirt cheap.

Connecting to Redshift
Star Trek's Jean Luc Picard says "Very well Data, open a channel"
Connections are made to Redshift only through ODBC (or JDBC) using Postgres drivers. Some challenges:

  • Picking my way through 32 bit vs 64 bit was tricky. Getting and using the right driver took some care.
  • Uploading through this connection is slow. So don’t try it this way. I mean it’s possible, but it’s simply not recommended. I learned this lesson not by reading through Amazon docs, but by attempting to run an SSIS job to funnel data into an ODBC destination. It was actually working, but it was impractical because it was so slow.

Creating the warehouse schema wasn’t too hard: I scripted a subset of tables from an existing warehouse. I stripped indexes, constraints and identities. There were a couple syntax differences (int becomes integer, bit becomes boolean, and GO becomes semicolon) but it went pretty smooth. For Redshift, in lieu of indexes, you choose columns on each table for the distribution key and the sort key. Underneath the covers, Redshift uses ParAccel and so if you’re familiar with that, you’ve got a great head start. Otherwise, Amazon’s Developer Guide is a good start. I’d like to hear more from others who know more about modeling for Redshift; it seems like a regular star schema will work well here.

Loading data is best through S3

I created a SSIS package that takes data from my relational warehouse. It takes that data and dumps it into delimited files (Hat tip to @peschkaj for advice on delimiter values). Then I gzipped the whole thing. I loaded those gzipped files into Amazon S3 and loaded data into Redshift using those files. Initially, I was afraid of Amazon S3 (What? I have to sign up for and learn about another service?) but working with Amazon S3 was dead simple.

Consuming data

I connected a Tableau client to Redshift using an ODBC connection. This Tableau discussion tells me that’s the current way to connect Tableau and Redshift. There are quite a few SQL limitations imposed by the ODBC protocol. So the Tableau experience was not too smooth. Tableau has a new Redshift connector coming out in a matter of weeks which should clear these limitations.
The point is that Amazon claims Redshift integrates seamlessly with a list of reporting apps (including Tableau). I believe it really will by the time Redshift is released, it’s just right now there’s a pretty big seam.

Next?

I’m going to get around to evaluating Google Big Query and of course Microsoft’s HDInsight. You might notice that lately, I find myself in a phase of learning (fun!) and so this post and the next few will be a description of my exploration.

February 28, 2013

Follow up on Ad Hoc TVP contention

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

Last month I posted PAGELATCH Contention on 2:1:103. It described a troubleshooting experience I had. I was troubleshooting a performance problem that was felt when applications issued very frequent ad hoc queries that used Table Valued Parameters (TVPs).

I’ve recently learned some extra things from the SQL Server team at Microsoft that I wanted to pass on. So this post acts as a kind of update to that one.

I know. I've drawn Columbo before, but it's my blog.

Reproduction

One thing I did was to come up with an application that reproduces the contention on 2:1:103 in the same way as production. I share that here now.

What this reproduction does is launch fifty processes. Each process uses multiple threads to hit a db server with an ad hoc query that uses a TVP.  For the curious here it is: Download AdHocTVPContention.zip and then open readme.txt. This reproduction is a mashup of

  • A simple batch script which launches an executable 50 times at once. A technique I gleaned from Paul Randal in a demo of his.
  • A multi-threaded executable which sends queries asynchronously. A framework I first developed in a post I wrote on concurrency.
  • The ad hoc TVP query itself. It was a bit of a modification of my example at Table Valued Parameters, A Short Complete Example. The only difference is that the command is a query instead of a stored procedure.

What I’ve Learned Since Then

The Microsoft team helped me understand a few things. The main thing is that the particular version of SQL Server (including service pack level) matters a lot:

SQL Server 2005 and Earlier

TVPs weren’t supported then. (This post isn’t for you).

SQL Server 2008 and R2

In my earlier post, I dismissed a fix provided by Microsoft. Fix: Poor performance in SQL Server 2008 R2 when table-valued functions use many table variables. At the time, I dismissed it for a number of reasons:

  • The workaround listed on that kb article page did not help us.
  • The symptoms that were described at that issue did not match my own scenario (i.e. I wasn’t using table-valued functions).
  • The title mentioned R2 and I thought (mistakenly) that this fix was not applicable to SQL Server 2008. (Update April 5, 2013: The fix used to be available as a hotfix for 2008 SP3, but as of March 2013, this fix is now included in a cumulative update. Upgrade to SP3 CU10 if you’re using 2008 and want this fix. Microsoft has also updated the title of the fix to mention 2008)

Microsoft helped me learn that although the symptoms in that fix differ from mine, the improvement in that fix involves a change to code that also improves my scenario. So with the fix applied, on my test environment, I saw that SQL Server could handle at least 6 times the volume of ad hoc queries than it could before (your mileage may vary).

SQL Server 2012

But there’s one other thing I noticed. When I ran the reproduction on a SQL Server 2012 environment, the process just flew through the workload! Temp table creation was no longer a bottleneck; there was no more contention on temp table creation. The Microsoft Engineer I worked with noticed this too and eventually it prompted a CSS blog entry. You can read all about this 2012 performance improvement at Temp table caching improvement for table valued parameters in SQL Server 2012

Partner vs MVP

I’m very lucky to have many investigation options available to me when I’m digging into a particular quirk of SQL Server behaviour. There’s the public options like #sqlhelp or dba.stackexchange or Microsoft support. And for me there’s some private options like the MVP mailing list. But this month, I feel lucky to work as a Microsoft partner. It’s through that connection that I learned so much.

Too Long; Didn’t Read

If you’re experiencing 2:1:103 contention because of too frequent ad hoc TVP queries then:

December 5, 2012

Well That Wasn’t Obvious

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

I’ve discovered that non-intuitive lessons stick in the brain the best. I share three examples that I either learned recently (or at least re-learned recently).

Uniqueifiers on Indexes That I Thought Shouldn’t Need Them

Do you remember what a Uniquifier is? SQL Server uses these hidden values on rows to keep non-clustered indexes in sync with their non-unique clustered indexes. Think of this questionable schema:

create table Actors
(
	ActorId int identity not null,
	LastName nvarchar(100) not null,
	FirstName nvarchar(100) not null,
	AgentId int null,
	primary key nonclustered (ActorId)
)
 
create clustered index IX_Actors
	on Actors(LastName, FirstName)

Then add some data:

insert Actors (LastName, FirstName, AgentId)
values ('Douglas', 'Michael', 3),
       ('Douglas', 'Michael', 4),
       ('Keaton', 'Michael', 5),
       ('Douglas', 'Kirk', 4)

I can demonstrate the uniquifier value here because of the non-unique clustered index:

select sys.fn_PhysLocFormatter (%%physloc%%) as [physical rid], *
from Actors -- in my case, this shows page 231
 
DBCC TRACEON (3604);
DECLARE @dbid int = DB_ID(); 
DBCC PAGE(@dbid, 1, 231, 3); 
 
/* showing (among other things):
...
Slot 1 Offset 0x60 Length 51
UNIQUIFIER = 0                       
LastName = Douglas                   
FirstName = Michael                  
ActorId = 1                          
AgentId = 3                          
KeyHashValue = (16035ff378a3)        
 
Slot 2 Offset 0x93 Length 55
UNIQUIFIER = 1                       
LastName = Douglas                   
FirstName = Michael                  
ActorId = 2                          
AgentId = 4                          
KeyHashValue = (16035e22af06)        
...
*/

Cool! Just like real life (See Michael Douglas (I – XXVI) at IMDB). This uniquifier value behaves very much like the roman numerals you see at IMDB.

Here’s the part that wasn’t obvious… what if I chose a clustered index whose columns include the primary key…

drop index IX_Actors ON Actors;
create clustered INDEX IX_Actors
	ON Actors(LastName, FirstName, ActorId)

That should be unique right? The index includes the primary key column (ActorId) that guarantees it. So SQL Server shouldn’t need the uniquifier right?

But it does! 

SQL Server doesn’t count on the index being unique unless you say so. I actually discovered the above while using Kendra Little’s sp_BlitzIndex, a tool so underexposed it’s pasty.

UPDATE Modifies Column Values Only Once

What does the following code produce?

-- set up test table 
declare @test TABLE (id int, value int);
declare @source TABLE (id int, increment int);
 
insert @test (id, value)
values (1, 0), (2, 0)
 
insert @source (id, increment)
values (1, 10),(1, 20),(1, 30),(2, 100),(2, 100),(2, 100)
 
-- "sum": add each value in the source to the existing value
update @test 
set value = value + increment
from @test t
join @source s
    on s.id = t.id
 
-- check the results
select id, value
from @test
/* The results:
id          value
----------- -----------
1           10
2           100
*/

Surprised? I think this example (or one very much like it) surprises those of us who started our careers as programmers. Many of us follow up this lesson with the related “UPDATE statements don’t have GROUP BY clauses” and then the lesson “How do I use CTEs?”

Indexed Views Don’t Support Max/Min Aggregates

Indexed views support a couple aggregate functions like COUNT_BIG() and SUM(). And with some trickery you can calculate AVG() and STDEV(). But SQL Server restricts the use of MAX() and MIN() in indexed views despite how useful they’d be.

It might help to understand why. SQL Server maintains indexed views as physical database objects and it can maintain aggregate values like COUNT() and SUM() by processing only the changing rows in the base table while safely ignoring the rest of the table. But I can’t say the same for MAX() or MIN(). If SQL Server supported MAX() and MIN() in indexed views, then when you delete the row in the base table that represents the MAX() value, SQL Server would have to search the rest of the base table to find the new MAX() value.

Check out the microsoft.connect feature suggestion Expand aggregate support in indexed views (MIN/MAX). Aaron Bertrand created this Connect suggestion and I like it because it shows how effective constructive feedback can be. I like it because of its description, comments and the useful workaround. The Microsoft team even gave some insight into how they almost included this feature in SQL 2008. This connect item only seems to be lacking an E.T.A. so go and cast your vote!

Bonus Content

I didn’t draw any illustrations this week so I’m including some bonus content (admittedly written by others):

Non-obvious Things From Twitter Friends

Request for Tweets: What thing in sql surprised you? Something not obvious. For me it was how the TRY and CATCH didn't always.

 I love how the log shipping "use default data and log folders" ... doesn't

how about how INSERT <tbl> VALUES (1),(2),(3) limits you to 1000 entries

Rolling time-window filtered index. The cool surprise - building each new index uses the existing one, so it's really quick.

Trivia

  • Did you know Michael Keaton (born Michael Douglas) changed his name to avoid confusion with that other guy? Pretty wise.
  • Jes Schultz Borland reminded me recently that good writers use more active verbs. I took that advice to heart and turned the writing of this article into an exercise. I avoided using words like is, was or are here and I think it turned out pretty well.

October 31, 2012

Triggers On Views? What For?

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

What’s up with triggers on views? What kind of patchwork monster is this?

Happy Halloween!

I’m sharing something that I learned this week. It’s something I learned while talking to a friend at work. He wondered whether SQL Server had any feature similar to the rule system over at Postgres. (I thought no at first, but keep reading).  You know, I continue to learn tons while talking to friends at work even though I’m the in-house SQL Server specialist. I don’t mind at all.

Aside: Have I ever mentioned how amazing the people I work with are? I work for a great company and we’re hiring like crazy.

Back to the post. Here’s what I learned:

  • I knew that insert/update/delete statements could be executed on views if the view was simple enough. If it was simple enough we call the view “updatable”.
  • I thought that triggers couldn’t be defined on views but
  • I was wrong and …
  • I learned that not only are triggers allowed on views but INSTEAD-OF triggers will work on any kind of view, updatable or not.

Example

Here’s what I mean.

create table A (id int primary key)
create table B (id int primary key)
go
 
create view AB as
    select id, 'a' as tablename from A
    union all
    select id, 'b' as tablename from B
go

If I were to try to insert rows into the view, I get this error message:

'UNION ALL view ~ is not updatable because a partitioning column was not found.'

Other views give other errors, but the message is the same: “Your view is too complicated to be updatable”. That’s okay. I never really expected inserts to work on anything but the simplest views so I’m not too disappointed.

But like I said, here’s how you would make this view updatable – or rather insertable – with an INSTEAD OF trigger:

create trigger t_AB
on AB
instead of insert
as
    if exists (select 1 from inserted where isnull(tablename,'') not in ('a','b'))
         raiserror (N'column ''tablename'' must be either ''a'' or ''b''' 16, 1);
    else
    begin
        insert A (id) select id from inserted where tablename = 'a'
        insert B (id) select id from inserted where tablename = 'b' 
    end
GO

Boom! No more this-view-is-not-updatable errors.

But When Would I Use This?

I thought of the perfect use case. This strategy helps with  SCHEMA REFACTORING. Say your schema contains this table:

create table ANIMALS 
(
    Name varchar(100) primary key,
    IsVertebrate bit not null,
    VertebraeCount int
)

and you’ve decided to change it to look like this:

create table ANIMALS_BASE
(
    Name varchar(100) primary key
)
 
create table VERTEBRATES
(
    Name varchar(100) primary key
        references ANIMALS_BASE(Name),
    VertebraeCount int not null
)

After you’ve done that, you can create a view called ANIMALS with triggers to accommodate applications that still expect the old schema. That would look something like:

CREATE VIEW ANIMALS
AS
    SELECT AB.Name, 
        CASE WHEN V.Name IS NULL THEN CAST(0 AS BIT) ELSE CAST(1 AS BIT) END as IsVertebrate,
        V.VertebraeCount
    FROM ANIMALS_BASE AB
    LEFT JOIN VERTEBRATES V
        ON AB.Name = V.Name

Next create the insert, delete and update triggers. I only show the insert trigger here. The delete and update shouldn’t be too hard to write either.

create trigger t_ANIMALS_i
on ANIMALS
instead of insert
as
    INSERT ANIMALS_BASE (Name) SELECT Name FROM inserted;
    INSERT VERTEBRATES (Name, VertebraeCount)
        SELECT Name, VertebraeCount 
        FROM inserted 
        WHERE IsVertebrate = 1;

Here’s a few ways that this helps:

  • You might want to use this strategy when you don’t have control over every application using your database.
  • This helps with migrations because applications are now free to expect the new or old schema. The applications can transition gradually to the new way of doing things.

Even if you believe every application uses only the new schema. You can still make sure gradually by logging whenever someone uses the old schema. Just add some logging statement to the definition of the triggers. You can log or track trigger use using any way you’re comfortable with. One suggestion is to use the user-settable performance counter via sp_increment_counter1 then Poof! You’ve got your own deprecated-features performance counter just like Microsoft.

October 11, 2012

You Can “Do Science” Anywhere

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

Consider the Following

Did you know you can do science anywhere. Science is about learning how Nature behaves. Well what if we replace Mother Nature with SQL Server? SQL Server’s not as pretty, but we can still learn things. It’s 2012! When we wonder about things – even databases – we don’t have to guess.

Consider the following...

The Question

To keep things simple, I’m going to follow the scientific method here and this is step one. The question. All good science starts with a question. The question itself doesn’t matter. I just want to talk about the process.

So I take the question from this DBA Stackexchange question: Why are NULLS sorted first?

Good question! The SQL Standard does not actually prescribe any behavior here (says Wikipedia). So the actual behavior is left up to the RDBMS vendor. The likely answer to the question is that NULLS are sorted that way because Microsoft had to pick something and they probably just made a choice based on what other RDBMS’s were doing or other implementation considerations. Or who knows, maybe they just flipped a coin.

But that leads me to this question: Are they always sorted first? If I reverse the order of a sort by using ORDER BY column DESC do Null values still show up first? If I look at the manual, Microsoft tells me “Null values are treated as the lowest possible values” so that ORDER BY column DESC will place Null values at the end.

But the way the stack overflow question is worded makes me wonder a little and I’d like to find out for sure. So to repeat:

“If I sort the results of a query using ORDER BY column DESC do Null values come last?”

Hypothesis / Prediction

I’m placing my bets with Microsoft’s documentation and so I answer YES.

I predict that any query written with “ORDER BY column DESC” will place the Null values at the end. If I’m wrong, then it would seem like null values always first is the behavior here (Note: NULLS FIRST is actually mentioned in the SQL Standards as an extension to the ORDER BY clause, but it’s not implemented by SQL Server. This also lets me believe I’m not crazy for wondering.)

Test

That’s simple enough:

select *
from (values (1),(2),(3),(null)) as v(value)
order by value desc

The prediction says that the fourth value in the results will be NULL. And tada! The prediction is correct.

Analysis

There’s nothing surprising here, but maybe it raises some new questions. Like:

  • What do other vendors do? (Turns out they do the same. This lends weight to Microsoft picking this behavior because it was common practice)
  • What’s up with this NULLS FIRST keyword?

There’s a publish part here. The default assumption (that Microsoft documentation is often correct) is still valid so there’s nothing surprising to publish. Probably not worth a blog post (despite what you’re reading).

Boiled Down

Normally the thought processes aren’t so structured, but as long as the question comes first and the guess is checked against experiment, it counts.

It seems like common sense. And maybe I could have picked a better example. Science helps satisfy our curiosity. But it also can save us from kidding ourselves. Without this kind of thinking we risk turning into superstitious pigeons. I once saw someone perform a task (release and renew his ip address) three times out of habit “just so it takes”. That way lies madness.

September 27, 2012

My Short Struggle With SQL Agent

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

You know I’ve always thought of SQL Agent as one of the “Agent Programs” from the movie the Matrix:

So what follows is a mini-lesson about the SQL Agent. I recently got into trouble when I created a SQL Agent Job step. The only thing this step needed to do was to move some text files from one network drive to another.

I just wanted to use the dos command:

move "\\server1\somedir\*.txt" "\\server2\somedir"

But if there were no files copied, the step failed because the return code was 1 instead of 0. It succeeded in copying all the files that were there (all zero of them), but apparently that’s unacceptable.

I then tried using robocopy. Robocopy is the “robust copy” command for Windows. It’s got all the bells and whistles… so I tried:

robocopy "\\server1\somedir\" "\\server2\somedir\" *.txt /mov

but the return code for robocopy is very complicated. Even more so than regular move. I’m not the first to deal with this headache… Check out this topic on SQL Server Central’s forum. The solution there uses a batch script to accomplish the task. If it’s successful, it returns a consistent error code. But doesn’t it seem like there should be a better way?

Only One Successful Return Code For Cmd Steps

Yep, the problem stems from the fact that only one return code is allowed for SQL Agent Job steps that are Operating System commands. That’s shown on this screen:

There can be only one ... error code.

Powershell Beats ‘em All

Long story short… I went with a powershell script. It was the simplest:

move-item "\\server1\somedir\*.txt" "\\server2\somedir"

And I don’t worry about return codes here. The powershell script succeeds or it doesn’t. Times when the script fails include when the move would overwrite an existing file or when permissions prevent the file move and that’s what I wanted.

In powershell, simple things are simple. Hard things are possible.

September 26, 2012

A Quick SQL Challenge

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 9:21 pm

So I was recently involved in an issue where a particular Unicode string was causing some problems in an application. The string was coming from a production database. I was asked to help reproduce this. It boiled down to this:

Can you write a SQL Script which produces a string containing every possible unicode character?

That’s every character from 0×0000 (NUL) all the way up to 0xffff (which is not even legal unicode). If you want a crack at it, start with the code here:

declare @allNCHARS nvarchar(max);
 
/*
Your answer would fit here.
*/
 
UPDATE MYTABLE 
SET Value = @allNCHARS
WHERE Id = 1;

My own answer is in the comments Maybe you can come up with something more elegant.
But it’s a handy string. If you’re in QA, it can help exercise a lot of different text issues (code pages, encodings, conversions etc…).

In my own case, the string actually helped me. I was able to reproduce the issue on a local developer machine. A colleague of mine then did more detective work and squashed that bug, so it was a bit of a tag team effort there.

September 12, 2012

When I Use Nested Joins

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Tags: — Michael J. Swart @ 12:00 pm

I want to explain how I avoid RIGHT OUTER joins in favor of LEFT OUTER joins and how I avoid OUTER joins in favor of INNER joins.

There's no place bitchin-er than Kitchener

Inner joins have no direction, but outer joins do so that we have three kinds of joins:

  • INNER JOIN (JOIN for short)
  • LEFT OUTER JOIN (LEFT JOIN for short)
  • RIGHT OUTER JOIN (RIGHT JOIN for short)

I leave out FULL OUTER JOINS for now because I never use them. And in fact RIGHT OUTER JOINS can always be written as LEFT OUTER JOINS, so in practice I only use just the two kinds of joins:

  • INNER JOIN
  • LEFT OUTER JOIN

Now as a rule of thumb, inner joins are more efficient than outer joins so it would be better to write queries that avoid outer joins. Let’s get to an example. My goal here is to write a query that gives me a list of employees and their director (if any) based on this schema.

CREATE TABLE STAFF
(
    Id NVARCHAR(20) NOT NULL PRIMARY KEY,
    Name NVARCHAR(400) NOT NULL,
    Department NVARCHAR(20),
    Role NVARCHAR(20) NOT NULL
)
 
CREATE TABLE BOSSES
(
    EmployeeId NVARCHAR(20)
        REFERENCES STAFF(Id),
    BossId nvarchar(20)
        REFERENCES STAFF(Id),
    PRIMARY KEY (EmployeeId, BossId)
)

BOSSES is a table that contains not just direct reports, but all direct and indirect reports (making it handy for this query).

Using a RIGHT JOIN

The answer is fairly straightforward. I join the BOSSES table with the STAFF table to give me all the directors and their reports:
Two inner joins

select S_Employee.Name, S_Boss.Name as Director
from STAFF S_Boss
join BOSSES B
    on B.BossId = S_Boss.Id
join STAFF S_Employee
	on B.EmployeeId = S_Employee.Id
where S_Boss.Role = 'Director'

But wait, this isn’t a complete list of employees. What about those in the company that don’t report to any director (not even indirectly). This is where the right outer join comes in:
An inner join and a right join

select S_Employee.Name, S_Boss.Name as Director
from STAFF S_Boss
join BOSSES B
    on B.BossId = S_Boss.Id
RIGHT join STAFF S_Employee
    on B.EmployeeId = S_Employee.Id
    and S_Boss.Role = 'Director'

Notice that the S_Boss.Role filter can’t belong to the where clause any more. If it did, we’d lose director-less employees again and we’d be back where we started.

Using LEFT JOINS Only

That works… but for the sake of style, let’s only use left joins. I prefer using only left joins in this case because logically, the results are meant to be the set of employees. So I like to start with that list of employees and then join in the bosses if necessary. That’s why I start with the set of employees as the first table. The other joined tables aren’t the stars of this query; they’re only there to help provide that director attribute. So I have this:
Two left joins

select S_Employee.Name, S_Boss.Name as Director
from STAFF S_Employee
left join BOSSES B
    on B.EmployeeId = S_Employee.Id
left join STAFF S_Boss
    on B.BossId = S_Boss.Id
    and S_Boss.Role = 'Director'

But you notice that I’m now using two left joins… Really I only want the one outer join that I was using in the first example. Turns out I can do that:

Using Nested Joins

Well that just looks like this:
One left join

select S_Employee.Name, S_Boss.Name as Director
from STAFF S_Employee
left join BOSSES B
    join STAFF S_Boss
        on B.BossId = S_Boss.Id
        and S_Boss.Role = 'Director'
    on B.EmployeeId= S_Employee.Id

This is logically equivalent to the right join but it uses left joins instead and only uses one outer join. Great!

A lot about this example is only about style. Maybe you prefer the right join example and think that I’m breaking some grammar rule here. It feels like I’m splitting an infinitive or something. Let me know what your preference is.

Older Posts »

Powered by WordPress