Careful Hashing

February 15, 2012

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", encodings, hash functions, hashbytes, md5, sha1 — Michael J. Swart @ 12:00 pm

Cryptographic hashing functions like MD5 or SHA-1 are examples of one-way functions that take any data you give it and return a fixed set of bytes (for example, 16 bytes for MD5 and 20 bytes for SHA-1).

I recently had trouble coming up with the same hash value for what I thought were the exact same inputs. Compare the following two examples:

// C# example
byte[] byteArray = Encoding.Unicode.GetBytes("I'm a lumberjack and I'm okay.");
 
SHA1 sha = new SHA1CryptoServiceProvider();
byte[] hashedPasswordBytes = sha.ComputeHash(byteArray);
 
Console.WriteLine(BitConverter.ToString(hashedPasswordBytes));
//30-F1-CE-3E-7E-1F-17-27-A7-B5-97-39-44-D0-FE-EA-DA-69-53-2B

That’s different than this:

-- SQL example
select HASHBYTES('SHA1', 'I''m a lumberjack and I''m okay.')
--0x4D6936CEDD0DE794AC86F9A4099DCBB4EFED8E1F

So what gives? Before reading any further, can you spot the problem? In both examples above, two things are happening. First a string of characters is being converted into a string of bytes and then that string of bytes gets hashed. I’ve learned that it’s not the hash function that’s the problem, it’s converting the character string into binary that is inconsistent.

String Conversions

Maybe that’s not surprising, but still, it’s one more thing to be careful about: converting strings to bytes. SQL Server’s wide characters (NCHARs and NVARCHARs) are equivalent to .Net’s Encoding.Unicode and SQL Server’s single non-unicode characters (CHAR and VARCHAR) (single-byte characters) are equivalent to .net’s Encoding.ASCII.

So for completeness sake, these should be equivalent:

-- wide characters (remember unicode string literals are prefixed with 'N')
select HASHBYTES('SHA1', N'I''m a lumberjack and I''m okay.')
--0x30F1CE3E7E1F1727A7B5973944D0FEEADA69532B
 
-- non-unicode characters:
select HASHBYTES('SHA1', 'I''m a lumberjack and I''m okay.')
--0x4D6936CEDD0DE794AC86F9A4099DCBB4EFED8E1F

And

// Unicode
byte[] byteArray = Encoding.Unicode.GetBytes("I'm a lumberjack and I'm okay.");
SHA1 sha = new SHA1CryptoServiceProvider();
byte[] hashedPasswordBytes = sha.ComputeHash(byteArray);
Console.WriteLine(BitConverter.ToString(hashedPasswordBytes));
//30-F1-CE-3E-7E-1F-17-27-A7-B5-97-39-44-D0-FE-EA-DA-69-53-2B
 
//ASCII
byteArray = Encoding.ASCII.GetBytes("I'm a lumberjack and I'm okay.");
hashedPasswordBytes = sha.ComputeHash(byteArray);
Console.WriteLine(BitConverter.ToString(hashedPasswordBytes));
//4D-69-36-CE-DD-0D-E7-94-AC-86-F9-A4-09-9D-CB-B4-EF-ED-8E-1F

This means that if you’re going to hash strings in SQL Server and compare them to strings that are hashed in C#, you want to use

Encoding.Unicode for NCHARS and NVARCHARS
Encoding.ASCII for CHARS and VARCHARS

This also means that for this use case, I wouldn’t feel safe using other encodings (such as Encoding.UTF8) because SQL Server can’t duplicate that string to binary conversion.

Caveats

Don’t trust me and test it out for yourself!
Collations matter a little (sort order is not relevant here). The code page for your SQL Server collation should match the code page of the .net encoding when dealing with single byte characters. Unicode strings should not care about collation.
If you do find any counter-examples for anything I’ve written here, let me know in the comments.

-- Comments (4)

4 Comments »

As the person who kept bringing Collation when we talked about this it is interesting that it has little to no effect. Thanks for yet another excellent post.

Comment by Scott MacLellan — February 15, 2012 @ 2:16 pm
[…] Careful Hashing | Michael J. Swart […]

Pingback by SQLQuill – Link Round Up – February 2012 Edition « SQL Feather and Quill — February 16, 2012 @ 10:03 am
[…] Careful Hashing – A cautionary tale this week from Michael J. Swart (Blog|Twitter). […]

Pingback by Something for the Weekend – SQL Server Links 17/02/12 — February 17, 2012 @ 9:13 am
Thank you. Very clear and helpful !

Comment by Rick Willemain — February 16, 2015 @ 12:35 pm

RSS feed for comments on this post. TrackBack URL

Michael J. Swart

February 15, 2012

Careful Hashing

String Conversions

Caveats

4 Comments »

Leave a comment