Takeaway: According to SQL Server, an identifier with trailing spaces is considered equivalent to the same identifier with those spaces removed. That was unexpected to me because that’s not how other programming languages work. My investigation was interesting and I describe that here.
The First Symptom
Here’s the setting, I work with a tool developed internally that reads metadata from a database (table names, column names, column types and that sort of thing). Recently the tool told me that a table had an unexpected definition. In this case, a column name had an extra trailing space. I expected the column name
"Id" (2 characters), but my tool was reporting an actual value of
"Id " (notice the blank at the end, 3 characters). That’s what started my investigation.
But that’s really weird. What would lead to a space accidentally getting tacked on to a column name? I couldn’t think of any reason. I also noticed a couple other things. Redgate SQL Compare was reporting no discrepancies and the database users weren’t complaining at all, they seemed just fine. A bug in the in-house tool seemed most likely. My hunch was that there was a problem with the way we collecting or storing these column names (how did a space sneak in there?).
Where Are Column Names Stored?
I wanted to look at the real name of the column – straight from the source – so I ran:
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE COLUMN_NAME LIKE 'Id%'
It told me that my tool wasn’t wrong. That the column was actually named
"Id " with the space. So maybe Red Gate is getting its metadata from somewhere else? I know of a few places to get column information. Maybe Red Gate is getting it from one of those? Specifically I wanted to look closer at these views:
Because these objects are views, I used sp_helptext to learn that all the column names ultimately come from a system table called
sys.syscolpars is a system table and you can’t look at its contents unless you connect to the database using the dedicated administrator connection. And that’s exactly what I did.
I learned that there is only one version of column names, only one place on disk that sql server persists the name of the column. It’s interesting because this implies that Red Gate’s SQL Compare trims trailing spaces from identifier names.
But Doesn’t SQL Server Care?
Well, there’s one way to check:
CREATE TABLE [MyTest] ( [id ] INT ); INSERT INTO [MyTest] VALUES (1); SELECT [id ], [id] -- one column name with a space, one column name without FROM [MyTest]; -- returns a dataset with column names as specified in the query. go
Just like Red Gate’s SQL Compare, it seems like SQL Server doesn’t care about trailing spaces in identifiers either.
Google? Stackoverflow? Want to Weigh In?
A quick search led me to the extremely relevant Is SQL Server Naming trailing space insensitive?.
And that question has answers which link to the Books Online page documenting Delimited Identifiers. That page claims that “SQL Server stores the name without the trailing spaces.” Hmmm, they either mean in memory, or the page is inaccurate. I just looked at the system tables a moment ago and the trailing spaces are definitely retained.
The stackoverflow question also led me to a reported defect, the Connect item Trailing space in column names. This item was closed as “by design”. So this behavior is deliberate.
What do other SQL Vendors do?
I want to do experiments on SQL databases from other vendors but my computer doesn’t have a large number of virtual machines or playground environments. But do you know who does? SQL Fiddle
It’s very easy to use this site to see what different database vendors do. I just pick a vendor and I can try out any SQL I want. It took very little effort to be able to compile this table:
And What Does the ANSI standard say?
Look at the variety of behaviors from each vendor. I wonder what the “standard” implementation should be.
ANSI (paraphrased) says that
<delimited identifier> ::= <double quote><one or more characters><double quote>
And it also says explicitly that delimited identifiers can include spaces.
What About String Comparisons In General?
During my experiments on SQL Server I found myself executing this query:
SELECT * FROM sys.columns WHERE name = 'Id'
I was surprised to find out that my three-character
"Id " column came back in the results. This means that SQL Server ignores trailing spaces for all string comparisons, not just for identifiers.
I changed my google search and looked for “sql server string comparison trailing space”. This is where I found another super-relevant document from Microsoft: INF: How SQL Server Compares Strings with Trailing Spaces.
Microsoft pointed to the ANSI standard again. I mean they explained exactly where to look, they pointed straight to (Section 8.2, , General rules #3) which is the section where ANSI explains how the comparison of two character strings is determined. The ANSI standard says that for string comparisons, the shorter string is effectively padded with trailing spaces so that comparisons can always performed on strings with an equal number of characters. Why? I don’t know.
And that’s where identifier comparisons come in. I found another part of the standard (Syntax rule #11) which tells me that Identifiers are equivalent if they compare as equivalent according to regular string comparison rules. So that’s the link between string comparisons and identifier comparisons.
There’s a number of things I learned about string comparisons. But does any of this matter? Hardly. No one deliberately chooses to name identifiers using trailing spaces. And I could have decided to sum this whole article up in a single tweet (see the title).
But did you figure out the head fake? This blog post is actually about investigation. The investigation is the interesting thing. This post describes the tools I like to use and how I use them to find things out for myself including:
- Queries against SQL Server itself, the obvious authority on SQL Server behavior.
- Made use of sp_helptext
- Made use of the Dedicated Adminstrator Connection to look at system tables
- Microsoft’s Books Online (used this twice!)
- Microsoft Connect
- the ANSI Standard
Maybe none of these resources are new or exciting. You’ve likely used many of these in the past. But that’s the point, you can find out about any topic in-depth by being a little curious and a little resourceful. I love to hear about investigation stories. Often how people find things can be at least as interesting as the actual lesson.