Michael J. Swart

June 3, 2010

Keeping Track Of Root Nodes in a Hierarchy

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , — Michael J. Swart @ 2:53 pm

Takeaway: I explain an interesting problem I encountered. I explain how I maintain data for a table that stores hierarchy info. The problem might not be relevant to you dear reader, but the technique might.

The Problem

So say that you’re keeping track of a hierarchy in a particular table and to make queries faster, you’ve decided to store the a key to the root level node as a column in each row.

Constraints: For whatever reason, the data in this column is untrustworthy and it’s our job to fix it! It’s a huge table, most of the data is okay, and we’d like to fix everything without blocking too much of the table.

Table Description

  • You’re keeping track of a hierarchies in a table using a foreign key to itself
  • Parent nodes are indicated using a ParentId column (see table below):
  • Root nodes are indicated when ParentId is NULL.
  • BaseObjectId indicates the root node of hiearchies (i.e. this equals ObjectId when ParentObjectId is NULL)
  • You’re using SQL 2005 and you can’t use 2008’s new hierarchy data type (or you don’t want to).

The Table Diagram would look something like this:

Hierarchy Example

Hierarchy Example

And the code to create this table would look like:

CREATE TABLE HierarchyExample (
	ObjectId int NOT NULL IDENTITY(1,1),
	ParentObjectId int NULL,
	BaseObjectId int NULL,
	OtherDataGoesHere nvarchar(max) NOT NULL,
	CONSTRAINT PK_HierarchyExample
		PRIMARY KEY (ObjectId),
	CONSTRAINT FK_HierarchyExample
		FOREIGN KEY (ParentObjectId)
		REFERENCES HierarchyExample (ObjectId)

Example: You might store this kind of data useful if you were really interested in finding the head or root node of the hierarchy. Like with vampires:

Vampire Hierarchy

Vampire Hierarchy

A Solution

The query:

declare @rc int;
declare @batch int;
select @rc = 1, @batch = 1000;
while @rc > 0
      UPDATE TOP (@batch) HE
      SET BaseObjectId  = HE_Parent.BaseObjectId
      FROM HierarchyExample HE
      JOIN HierarchyExample AS HE_Parent
		ON HE.ParentObjectId = HE_Parent.ObjectId
      WHERE HE.BaseObjectId <> HE_Parent.BaseObjectId;
      SET @rc = @@ROWCOUNT

This script updates the BaseObjectId a thousand rows at a time which keeps the query relatively quick so it doesn’t hold on to locks too long (at least not x-locks). It also handles hierarchies with multiple levels.

Index for Efficiency: The best indexes for this table are the obvious nonclustered covering ones and really turn this query into something fast (using a merge join no less!):

  • HierarchyExample(ParentObjectId, BaseObjectId, ObjectId)
  • HierarchyExample(ObjectId) Include (BaseObjectId)

The original solution I tried used recursive CTEs where the performance  was less than ideal.

What now?

This is the kind of thing I like working on and I thought I’d share. So what about you? What kind of interesting T-SQL challenges have you come across recently? ¬†Leave something in the comments or blog about it and post back here.

Powered by WordPress