Indexing Foreign Keys?

April 20, 2009

Filed under: SQL Scripts,Technical Articles — Michael J. Swart @ 12:11 pm

No, this article should be titled Indexing for Foreign Keys (or even Indexing for Forty Foreign Keys)

So your application is humming along and your database has been playing nicely with the app. But the database is getting a little large and it’s time to manage that. There are a lot of different ways to approach this common issue including short term solutions like:

Buying more disk space to host all this extra data.
Using compression to make the most of the disk space you’ve got.

And of course, there’s the archive/purge/restore approach.

At some point during the purge process, you may find yourself issuing DELETE statements. And it’s almost inevitable if you’re not doing something like dropping or truncating shards or partitions. When deleting rows from a table, in order to maintain data integrity SQL Server will always check foreign keys to make sure that no other row is referring to the row you’re deleting. This is, after all, the whole point of a foreign key and it’s the R in RDBMS. This check can cause performance problems for your delete statement if the database is not indexed properly. Such performance issues may have gone unnoticed if tables have rarely had any rows deleted in the past.

I wrote a query against some the DMVs available in 2005 and 2008 that can report on the tables that might cause scans when deleting rows. If the numbers are large enough, then you’re going to have trouble deleting many rows from these tables without adding an index (hence the title Indexing for Foreign Keys).

WITH my_foreign_key_list AS (
   SELECT 
       fk.name AS foreign_key, 
       fk.referenced_object_id,
       OBJECT_NAME(fk.referenced_object_id) AS referenced_table,
       (
           SELECT CAST(c.name AS NVARCHAR(MAX)) + N',' AS [text()]
           FROM sys.foreign_key_columns AS fkc
           JOIN sys.columns c ON c.OBJECT_ID = fkc.referenced_object_id
               AND c.column_id = fkc.referenced_column_id
           WHERE fkc.parent_object_id = fk.parent_object_id
               AND fk.OBJECT_ID = fkc.constraint_object_id
           ORDER BY fkc.constraint_column_id
           FOR XML PATH('')
       ) AS referenced_columns,
       fk.parent_object_id,
       OBJECT_NAME(fk.parent_object_id) AS parent_table,
       (
           SELECT CAST(c.name AS NVARCHAR(MAX)) + N',' AS [text()]
           FROM sys.foreign_key_columns AS fkc
           JOIN sys.columns c ON c.OBJECT_ID = fkc.parent_object_id
               AND c.column_id = fkc.parent_column_id
           WHERE fkc.parent_object_id = fk.parent_object_id
               AND fk.OBJECT_ID = fkc.constraint_object_id
           ORDER BY fkc.constraint_column_id
           FOR XML PATH('')
       ) AS parent_columns
   FROM sys.foreign_keys fk
),
my_index_list AS 
(
   SELECT t.OBJECT_ID, t.name AS tablename,
       i.name AS indexname,
       (
           SELECT c.name + N',' AS [text()]
           FROM sys.index_columns AS ic
           JOIN sys.columns c ON c.OBJECT_ID = ic.OBJECT_ID 
               AND c.column_id = ic.column_id
           WHERE ic.OBJECT_ID = i.OBJECT_ID
               AND ic.index_id = i.index_id
               AND ic.is_included_column = 0
           ORDER BY ic.key_ordinal
           FOR XML PATH('')
       ) AS index_cols
   FROM sys.indexes AS i
   JOIN sys.tables t ON i.OBJECT_ID = t.OBJECT_ID
   WHERE i.index_id > 0
)
SELECT mfkl.referenced_table AS [When deleting a row from this table...],
   SUM(p.rows) AS [the number of rows that must be scanned is...]
FROM my_foreign_key_list mfkl
LEFT JOIN my_index_list missing_parent_indexes
   ON mfkl.parent_table = missing_parent_indexes.tablename
   AND missing_parent_indexes.index_cols LIKE mfkl.parent_columns + N'%' -- index covers the parent columns
JOIN sys.partitions p -- just to get rows
   ON p.OBJECT_ID = mfkl.parent_object_id
   AND p.index_id IN (1,0)
WHERE missing_parent_indexes.tablename IS NULL -- with the LEFT JOIN, this is the 
                                               -- "missing" part of missing_parent_indexes
GROUP BY mfkl.referenced_table
ORDER BY SUM(p.rows) DESC

You’ll get results that give you a sense of what it’s going to cost to delete a row from a given table.

As an example from AdventureWorks, the query points to Sales.SpecialOfferProduct as the largest potential problem, and sure enough a delete on that table causes a scan on Sales.SalesOrderDetail

-- Comments (5)

5 Comments »

Nice script! Thanks for sharing. 🙂

Comment by Michelle Ufford — April 20, 2009 @ 1:49 pm
[…] you’ve created the new foreign keys, don’t forget to read my other article about Indexing for Foreign Keys Comments […]

Pingback by Finding your Missing Keys | Michael J. Swart — December 7, 2009 @ 2:49 pm
Nice Assist. Thanks for placing in public domain.

Daniel Adeniji

Comment by Daniel Adeniji — September 5, 2014 @ 6:41 pm
This is a very good script, presenting the data in a way especially helpful when explaining the issue to non-DBAs. One useful enhancement would be for it to ignore key column order so it would recognize an index on Column2, Column1 as a match for a foreign key on Column1, Column2. I found that circumstance in a production database recently. The inconsistency was insufficient justification to make a change because everything was functioning as intended.

Comment by Mark Freeman — February 24, 2021 @ 10:30 am
I’ve worked with a number of systems that implement an IsDeleted BIT column in every significant table and then periodically run a job with a bunch of “DELETE…WHERE IsDeleted = 1” statements outside normal work hours. (In some cases the WHERE clauses were more complicated, taking into account status values for matching parent table rows or date criteria.) This gets them speed for the deletes during normal operating hours, the ability to “undelete” rows painlessly until some amount of time had passes, and dramatically reduced the cost penalty for missing indexes that should support the FKs. But in general, I agree that ensuring you have an index matching your foreign keys is very important to performance.

Comment by Mark Freeman — February 24, 2021 @ 10:40 am

RSS feed for comments on this post. TrackBack URL

Michael J. Swart

April 20, 2009

Indexing Foreign Keys?

5 Comments »

Leave a comment