hierarchy | Michael J. Swart Michael J. Swart

June 3, 2010

Keeping Track Of Root Nodes in a Hierarchy

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", hierarchy, t-sql — Michael J. Swart @ 2:53 pm

Takeaway: I explain an interesting problem I encountered. I explain how I maintain data for a table that stores hierarchy info. The problem might not be relevant to you dear reader, but the technique might.

The Problem

So say that you’re keeping track of a hierarchy in a particular table and to make queries faster, you’ve decided to store the a key to the root level node as a column in each row.

Constraints: For whatever reason, the data in this column is untrustworthy and it’s our job to fix it! It’s a huge table, most of the data is okay, and we’d like to fix everything without blocking too much of the table.

Table Description

You’re keeping track of a hierarchies in a table using a foreign key to itself
Parent nodes are indicated using a ParentId column (see table below):
Root nodes are indicated when ParentId is NULL.
BaseObjectId indicates the root node of hiearchies (i.e. this equals ObjectId when ParentObjectId is NULL)
You’re using SQL 2005 and you can’t use 2008’s new hierarchy data type (or you don’t want to).

The Table Diagram would look something like this:

Hierarchy Example

And the code to create this table would look like:

CREATE TABLE HierarchyExample (
	ObjectId int NOT NULL IDENTITY(1,1),
	ParentObjectId int NULL,
	BaseObjectId int NULL,
	OtherDataGoesHere nvarchar(max) NOT NULL,
	CONSTRAINT PK_HierarchyExample
		PRIMARY KEY (ObjectId),
	CONSTRAINT FK_HierarchyExample
		FOREIGN KEY (ParentObjectId)
		REFERENCES HierarchyExample (ObjectId)
)

Example: You might store this kind of data useful if you were really interested in finding the head or root node of the hierarchy. Like with vampires:

Vampire Hierarchy

A Solution

The query:

declare @rc int;
declare @batch int;
select @rc = 1, @batch = 1000;
 
while @rc > 0
begin
      UPDATE TOP (@batch) HE
      SET BaseObjectId  = HE_Parent.BaseObjectId
      FROM HierarchyExample HE
      JOIN HierarchyExample AS HE_Parent
		ON HE.ParentObjectId = HE_Parent.ObjectId
      WHERE HE.BaseObjectId <> HE_Parent.BaseObjectId;
 
      SET @rc = @@ROWCOUNT
end

This script updates the BaseObjectId a thousand rows at a time which keeps the query relatively quick so it doesn’t hold on to locks too long (at least not x-locks). It also handles hierarchies with multiple levels.

Index for Efficiency: The best indexes for this table are the obvious nonclustered covering ones and really turn this query into something fast (using a merge join no less!):

HierarchyExample(ParentObjectId, BaseObjectId, ObjectId)
HierarchyExample(ObjectId) Include (BaseObjectId)

The original solution I tried used recursive CTEs where the performance was less than ideal.

What now?

This is the kind of thing I like working on and I thought I’d share. So what about you? What kind of interesting T-SQL challenges have you come across recently? Leave something in the comments or blog about it and post back here.

-- Comments (2)