Takeaway: I explain an interesting problem I encountered. I explain how I maintain data for a table that stores hierarchy info. The problem might not be relevant to you dear reader, but the technique might.
So say that you’re keeping track of a hierarchy in a particular table and to make queries faster, you’ve decided to store the a key to the root level node as a column in each row.
Constraints: For whatever reason, the data in this column is untrustworthy and it’s our job to fix it! It’s a huge table, most of the data is okay, and we’d like to fix everything without blocking too much of the table.
- You’re keeping track of a hierarchies in a table using a foreign key to itself
- Parent nodes are indicated using a ParentId column (see table below):
- Root nodes are indicated when ParentId is NULL.
- BaseObjectId indicates the root node of hiearchies (i.e. this equals ObjectId when ParentObjectId is NULL)
- You’re using SQL 2005 and you can’t use 2008’s new hierarchy data type (or you don’t want to).
The Table Diagram would look something like this:
And the code to create this table would look like:
CREATE TABLE HierarchyExample ( ObjectId int NOT NULL IDENTITY(1,1), ParentObjectId int NULL, BaseObjectId int NULL, OtherDataGoesHere nvarchar(max) NOT NULL, CONSTRAINT PK_HierarchyExample PRIMARY KEY (ObjectId), CONSTRAINT FK_HierarchyExample FOREIGN KEY (ParentObjectId) REFERENCES HierarchyExample (ObjectId) )
Example: You might store this kind of data useful if you were really interested in finding the head or root node of the hierarchy. Like with vampires:
declare @rc int; declare @batch int; select @rc = 1, @batch = 1000; while @rc > 0 begin UPDATE TOP (@batch) HE SET BaseObjectId = HE_Parent.BaseObjectId FROM HierarchyExample HE JOIN HierarchyExample AS HE_Parent ON HE.ParentObjectId = HE_Parent.ObjectId WHERE HE.BaseObjectId <> HE_Parent.BaseObjectId; SET @rc = @@ROWCOUNT end
This script updates the BaseObjectId a thousand rows at a time which keeps the query relatively quick so it doesn’t hold on to locks too long (at least not x-locks). It also handles hierarchies with multiple levels.
Index for Efficiency: The best indexes for this table are the obvious nonclustered covering ones and really turn this query into something fast (using a merge join no less!):
- HierarchyExample(ParentObjectId, BaseObjectId, ObjectId)
- HierarchyExample(ObjectId) Include (BaseObjectId)
The original solution I tried used recursive CTEs where the performance was less than ideal.
This is the kind of thing I like working on and I thought I’d share. So what about you? What kind of interesting T-SQL challenges have you come across recently? Leave something in the comments or blog about it and post back here.