Michael J. Swart

September 18, 2014

SQL Server Ignores Trailing Spaces In Identifiers

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:27 am

Takeaway: According to SQL Server, an identifier with trailing spaces is considered equivalent to the same identifier with those spaces removed. That was unexpected to me because that’s not how other programming languages work. My investigation was interesting and I describe that here.

The First Symptom

Here’s the setting, I work with a tool developed internally that reads metadata from a database (table names, column names, column types and that sort of thing). Recently the tool told me that a table had an unexpected definition. In this case, a column name had an extra trailing space. I expected the column name "Id" (2 characters), but my tool was reporting an actual value of  "Id " (notice the blank at the end, 3 characters). That’s what started my investigation.

But that’s really weird. What would lead to a space accidentally getting tacked on to a column name? I couldn’t think of any reason. I also noticed a couple other things. Redgate SQL Compare was reporting no discrepancies and the database users weren’t complaining at all, they seemed just fine. A bug in the in-house tool seemed most likely. My hunch was that there was a problem with the way we collecting or storing these column names (how did a space sneak in there?).

Where Are Column Names Stored?

I wanted to look at the real name of the column – straight from the source – so I ran:

SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME LIKE 'Id%'

It told me that my tool wasn’t wrong. That the column was actually named "Id " with the space. So maybe Red Gate is getting its metadata from somewhere else? I know of a few places to get column information. Maybe Red Gate is getting it from one of those? Specifically I wanted to look closer at these views:

  • sys.columns
  • sys.syscolumns
  • INFORMATION_SCHEMA.COLUMNS

Because these objects are views, I used sp_helptext to learn that all the column names ultimately come from a system table called sys.syscolpars. But sys.syscolpars is a system table and you can’t look at its contents unless you connect to the database using the dedicated administrator connection. And that’s exactly what I did.

I learned that there is only one version of column names, only one place on disk that sql server persists the name of the column. It’s interesting because this implies that Red Gate’s SQL Compare trims trailing spaces from identifier names.

But Doesn’t SQL Server Care?

Well, there’s one way to check:

CREATE TABLE [MyTest] ( [id ] INT );
INSERT INTO [MyTest] VALUES (1);
 
SELECT [id ], [id] -- one column name with a space, one column name without
FROM [MyTest]; 
-- returns a dataset with column names as specified in the query.
go

Just like Red Gate’s SQL Compare, it seems like SQL Server doesn’t care about trailing spaces in identifiers either.

Google? Stackoverflow? Want to Weigh In?

A quick search led me to the extremely relevant Is SQL Server Naming trailing space insensitive?.

And that question has answers which link to the Books Online page documenting Delimited Identifiers. That page claims that “SQL Server stores the name without the trailing spaces.” Hmmm, they either mean in memory, or the page is inaccurate. I just looked at the system tables a moment ago and the trailing spaces are definitely retained.

The stackoverflow question also led me to a reported defect, the Connect item Trailing space in column names. This item was closed as “by design”. So this behavior is deliberate.

What do other SQL Vendors do?

I want to do experiments on SQL databases from other vendors but my computer doesn’t have a large number of virtual machines or playground environments. But do you know who does? SQL Fiddle
It’s very easy to use this site to see what different database vendors do. I just pick a vendor and I can try out any SQL I want. It took very little effort to be able to compile this table:

RDBMS CREATE TABLE... SELECT "id ", "id"...
MySQL Incorrect column name 'id '
Oracle Success "id": invalid identifier
PostgreSQL Success Column "id" does not exist
SQLite Success could not prepare statement (1 no such column: id)
SQL Server Success
ID ID
1 1

And What Does the ANSI standard say?

Look at the variety of behaviors from each vendor. I wonder what the “standard” implementation should be.

Mmmm... SQL Syntax rules.

I googled “ANSI SQL 92” and found its wikipedia page and that led me to the SQL-92 Standard itself.

ANSI (paraphrased) says that

<delimited identifier> ::= <double quote><one or more characters><double quote>

And it also says explicitly that delimited identifiers can include spaces.

What About String Comparisons In General?

During my experiments on SQL Server I found myself executing this query:

SELECT *
FROM sys.columns
WHERE name = 'Id'

I was surprised to find out that my three-character "Id " column came back in the results. This means that SQL Server ignores trailing spaces for all string comparisons, not just for identifiers.

I changed my google search and looked for “sql server string comparison trailing space”. This is where I found another super-relevant document from Microsoft: INF: How SQL Server Compares Strings with Trailing Spaces.

Microsoft pointed to the ANSI standard again. I mean they explained exactly where to look, they pointed straight to (Section 8.2, , General rules #3) which is the section where ANSI explains how the comparison of two character strings is determined. The ANSI standard says that for string comparisons, the shorter string is effectively padded with trailing spaces so that comparisons can always performed on strings with an equal number of characters. Why? I don’t know.

And that’s where identifier comparisons come in. I found another part of the standard (Syntax rule #11) which tells me that Identifiers are equivalent if they compare as equivalent according to regular string comparison rules. So that’s the link between string comparisons and identifier comparisons.

Summary

There’s a number of things I learned about string comparisons. But does any of this matter? Hardly. No one deliberately chooses to name identifiers using trailing spaces. And I could have decided to sum this whole article up in a single tweet (see the title).

But did you figure out the head fake? This blog post is actually about investigation. The investigation is the interesting thing. This post describes the tools I like to use and how I use them to find things out for myself including:

  • Queries against SQL Server itself, the obvious authority on SQL Server behavior.
    • Made use of sp_helptext
    • Made use of the Dedicated Adminstrator Connection to look at system tables
  • Microsoft’s Books Online (used this twice!)
  • Microsoft Connect
  • Google
  • Stackoverflow
  • SQLFiddle
  • Wikipedia
  • the ANSI Standard

Maybe none of these resources are new or exciting. You’ve likely used many of these in the past. But that’s the point, you can find out about any topic in-depth by being a little curious and a little resourceful. I love to hear about investigation stories. Often how people find things can be at least as interesting as the actual lesson.

10 Comments »

  1. […] SQL Server Ignores Trailing Spaces In Identifiers – Michael J. Swart (Blog|Twitter) […]

    Pingback by (SFTW) SQL Server Links 19/09/14 - John Sansom — September 19, 2014 @ 3:36 am

  2. The way SQL Server handles trailing spaces is really annoying. Because of that I had to change our whole application to strip all trailing spaces from all fields. If you have trailing spaces entered by a user in a field, e.g. customer code that is checked agains the database, SQL Server will find the one without the spaces and you think it’s ok. Then you save it and later compare codes in the application code, they won’t match anymore…

    Comment by Toni — September 20, 2014 @ 12:51 pm

  3. Hey Toni, I wonder if you use a binary collation will that cause SQL Server to be trailing-spaces-sensitive. I’ll have to test it out the next time I’m at a computer.

    Comment by Michael J. Swart — September 20, 2014 @ 2:50 pm

  4. Nope, binary collation pads spaces too:

    SELECT 
      CASE 
        WHEN N'ID ' COLLATE Latin1_General_Bin = N'ID' THEN 'Equivalent'
        ELSE 'Not Equivalent'
      END AS result
    -- returns 'Equivalent'

    Comment by Michael J. Swart — September 22, 2014 @ 2:21 pm

  5. It’s not collation, it’s data type:

     SELECT
       CASE
         WHEN CAST( 'ID ' as varchar( 3 ) ) = CAST( 'ID' as varchar( 3 ) ) THEN 'Equivalent'
         ELSE 'Not Equivalent'
      END AS varchar_result ,
       CASE
         WHEN CAST( 'ID ' as char( 3 ) ) = CAST( 'ID' as char( 3 ) ) THEN 'Equivalent'
         ELSE 'Not Equivalent'
      END AS char_result

    Comment by William Clardy — October 27, 2014 @ 7:32 am

  6. Hey William,
    Your query returns

    varchar_result char_result
    -------------- -----------
    Equivalent     Equivalent
    

    I’m not sure what you wanted to point out.

    Comment by Michael J. Swart — October 27, 2014 @ 9:05 am

  7. I need to remember to give the caffeine IV time to take effect before publicly embarassing myself in the morning — I ran that query on my personal system at home, and I could have sworn it came back with “Not Equivalent” and “Equivalent”.

    That said, the explanation behind the behavior is spelled out in Knowledgebase article 316626: “SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2, , General rules #3) on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them… The only exception to this rule is the LIKE predicate. When the right side of a LIKE predicate expression features a value with a trailing space, SQL Server does not pad the two values to the same length before the comparison occurs.”

    The actual language of General Rule 3)a) is, “If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD attribute, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a .”

    Verifying that the behavior described in KB316626 still applies (at least through SQL2008R2), I ran this query:

    --=====================================================
    DECLARE @var1 varchar( 3 ) = 'ID' ,
             @var2 varchar( 3 ) = 'ID ' ,
             @var3 char( 3 ) = 'ID' ,
             @var4 char( 2 ) = 'ID '
     
     SELECT
       '"' + @var1 + '"' AS [@var1] ,
       '"' + @var2 + '"' AS [@var2] ,
       '"' + @var3 + '"' AS [@var3] ,
       '"' + @var4 + '"' AS [@var4] ,
       CASE
         WHEN @var1 LIKE @var2 THEN 'Equivalent'
         ELSE 'Not Equivalent'
      END AS varchar_result ,
       CASE
         WHEN @var3 LIKE @var4 THEN 'Equivalent'
         ELSE 'Not Equivalent'
      END AS char_result
    --=====================================================

    and got back:

    --=====================================================
    @var1 @var2 @var3 @var4 varchar_result char_result
    ----- ----- ----- ----- -------------- --------------
    "ID"  "ID " "ID " "ID"  Not Equivalent Equivalent
    --=====================================================
    

    [updated formatting – ed]

    Comment by William Clardy — October 27, 2014 @ 12:48 pm

  8. Hi William,
    I think you reinforce what I say when I said “The investigation is the interesting thing”.
    Thanks for the extra insight. It’s a crazy topic.

    Comment by Michael J. Swart — October 27, 2014 @ 1:14 pm

  9. Trailing spaces definitely *do* matter in database names, since they make maintenance plan backups barf when you tell it to make a sub-directory for each DB in the instance.

    Comment by Ron — March 3, 2016 @ 8:17 pm

  10. Thanks for this article. Just came across a strange scenario where I noticed that the column I was inserted into was called [ColumnName ] with an extra space. But the objects in c# and using BulkInsert had no such space. I couldn’t understand how this was working, and whoever created the table must have fat-fingered the name. But this article explained it concisely, and now I can sleep better (and use this for fun in an interview question).

    Comment by Henry — December 20, 2018 @ 10:12 am

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress