Michael J. Swart

August 22, 2024

Large Uniquifier Values

Hey what’s up? It’s been a while.

If you define a clustered index that’s not unique, SQL Server will add a hidden 4-byte column called UNIQUIFIER. You can’t see it directly but it’s there. When you add a row whose key is a duplicate of an existing row, the new row gets a new unique value for it’s uniqueifier. If you add over 2.1 billion rows with the same key, the uniquifier value exceeds the limit and you will see error 666.

A while ago, we nearly got into trouble because of a bad choice for clustering key that went undetected for so long.

Is your database nearing trouble?

Here’s a script that might help you find out.

select top 100
  OBJECT_SCHEMA_NAME(t.object_id) as [schema],
  t.name as [table],
  i.name as [index],
  i.type_desc,
  i.is_unique,
  s.range_high_key, 
  s.equal_rows
from 
  sys.indexes i
inner join
  sys.tables t
  on i.object_id = t.object_id
cross apply 
  sys.dm_db_stats_histogram(i.object_id, i.index_id) s
left join
  sys.index_columns ic
  on ic.object_id = i.object_id
  and ic.index_id = i.index_id
  and ic.index_column_id = 2
where 
  i.index_id = 1
  and i.is_unique = 0  
  and ic.index_id is null
order by s.equal_rows desc

This query looks at clustered indexes that are not unique. It looks through the stats histogram and reports any with a high “EQUAL” range count.
If any of the equal_rows values are nearing 2 billion, then look out.

Kimberly Tripp always advocated uniqueness as a property of a good clustering index. That implies that equal_rows in the results here should be closer to 1 – nowhere near 2 billion!. So Microsoft’s advice holds true: “review tables that rely on uniqueifiers and proactively work to improve its design”.

If you find instances where any of the equal_rows are above 10000, you might choose to look closer.

Update: A careful look at the script will tell you that I’m excluding clustered indexes with more than one key column from the results. It’s still possible to get into trouble with multi-column clustered indexes, but because the stats histogram only focuses on the first column, this script can’t (easily) warn you about those times. See the comment section for lots more discussion.

Powered by WordPress