Not too long ago, I wrote an article on the aggregate function CHECKSUM_AGGREGATE. This function works very well in conjunction with CHECKSUM or BINARY_CHECKSUM.
When evaluating the functions HASHBYTES, BINARY_CHECKSUM and CHECKSUM, I discovered a few limitations that were useful to know when deciding on a solution.
The most obvious limitation to these checksum functions is well documented in Books on-line. For example, the documentation for BINARY_CHECKSUM says that BINARY_CHECKSUM(*) return different values for most but not all changes to a row.
The documentation says that these functions don’t support non-comparable database types such as text, ntext, image, cursor, and xml. What’s interesting is that NVARCHAR(max) columns are supported, but I discovered some interesting limitations using a script similar to the following:
WITH numbers AS ( ... ) SELECT num, CHECKSUM(REPLICATE(N'a', num)) FROM numbers WHERE num < 5000
Here’s what I discovered:
This works, but only up to 255 n-characters (0xff n-characters). This was the most surprising result. 255 seems awfully short. I’ll have to doublecheck my tests and then I plan to create a microsoft.connect suggestion that BINARY_CHECKSUM supports longer strings.
CHECKSUM also works but it truncates strings to 4000 n-characters. Update Mar. 6, 2008: As it turns out, CHECKSUM doesn’t truncate strings, it’s the REPLICATE function that truncates strings to 8000 bytes!
I tested this with ‘MD5’. HASHBYTES works as expected except that like CHECKSUM, it truncates characters to 4000 n-chars before applying the function. Update Mar. 6, 2008: HASHBYTES doesn’t truncate strings to 8000 bytes, but it does raise an error if the string exceeds that length. The error is String or binary data would be truncated.
It looks like for 2008, Books On-Line has added a remarks section to HASHBYTES. The remarks explain that the input is limited to 8000 bytes. It also turns out that there is a microsoft.connect issue (issue 273429) that suggests that HASHBYTES should support longer strings. This issue has a few workarounds that get around the limitations. It’s one of the few issues on Microsoft.Connect that actually contain useful information in the workaround section. It’s good to see.
So there you are. I’m confident that the existing behaviour is good enough for most uses. But it would be dangerous to recommend solutions based on these functions ithout understanding the limitations of the procedures.