No, this article should be titled Indexing for Foreign Keys (or even Indexing for Forty Foreign Keys)
So your application is humming along and your database has been playing nicely with the app. But the database is getting a little large and it’s time to manage that. There are a lot of different ways to approach this common issue including short term solutions like:
- Buying more disk space to host all this extra data.
- Using compression to make the most of the disk space you’ve got.
And of course, there’s the archive/purge/restore approach.
At some point during the purge process, you may find yourself issuing DELETE statements. And it’s almost inevitable if you’re not doing something like dropping or truncating shards or partitions. When deleting rows from a table, in order to maintain data integrity SQL Server will always check foreign keys to make sure that no other row is referring to the row you’re deleting. This is, after all, the whole point of a foreign key and it’s the R in RDBMS. This check can cause performance problems for your delete statement if the database is not indexed properly. Such performance issues may have gone unnoticed if tables have rarely had any rows deleted in the past.
I wrote a query against some the DMVs available in 2005 and 2008 that can report on the tables that might cause scans when deleting rows. If the numbers are large enough, then you’re going to have trouble deleting many rows from these tables without adding an index (hence the title Indexing for Foreign Keys).
WITH my_foreign_key_list AS (
SELECT
fk.name AS foreign_key,
fk.referenced_object_id,
OBJECT_NAME(fk.referenced_object_id) AS referenced_table,
(
SELECT CAST(c.name AS NVARCHAR(MAX)) + N',' AS [text()]
FROM sys.foreign_key_columns AS fkc
JOIN sys.columns c ON c.OBJECT_ID = fkc.referenced_object_id
AND c.column_id = fkc.referenced_column_id
WHERE fkc.parent_object_id = fk.parent_object_id
AND fk.OBJECT_ID = fkc.constraint_object_id
ORDER BY fkc.constraint_column_id
FOR XML PATH('')
) AS referenced_columns,
fk.parent_object_id,
OBJECT_NAME(fk.parent_object_id) AS parent_table,
(
SELECT CAST(c.name AS NVARCHAR(MAX)) + N',' AS [text()]
FROM sys.foreign_key_columns AS fkc
JOIN sys.columns c ON c.OBJECT_ID = fkc.parent_object_id
AND c.column_id = fkc.parent_column_id
WHERE fkc.parent_object_id = fk.parent_object_id
AND fk.OBJECT_ID = fkc.constraint_object_id
ORDER BY fkc.constraint_column_id
FOR XML PATH('')
) AS parent_columns
FROM sys.foreign_keys fk
),
my_index_list AS
(
SELECT t.OBJECT_ID, t.name AS tablename,
i.name AS indexname,
(
SELECT c.name + N',' AS [text()]
FROM sys.index_columns AS ic
JOIN sys.columns c ON c.OBJECT_ID = ic.OBJECT_ID
AND c.column_id = ic.column_id
WHERE ic.OBJECT_ID = i.OBJECT_ID
AND ic.index_id = i.index_id
AND ic.is_included_column = 0
ORDER BY ic.key_ordinal
FOR XML PATH('')
) AS index_cols
FROM sys.indexes AS i
JOIN sys.tables t ON i.OBJECT_ID = t.OBJECT_ID
WHERE i.index_id > 0
)
SELECT mfkl.referenced_table AS [When deleting a row from this table...],
SUM(p.rows) AS [the number of rows that must be scanned is...]
FROM my_foreign_key_list mfkl
LEFT JOIN my_index_list missing_parent_indexes
ON mfkl.parent_table = missing_parent_indexes.tablename
AND missing_parent_indexes.index_cols LIKE mfkl.parent_columns + N'%' -- index covers the parent columns
JOIN sys.partitions p -- just to get rows
ON p.OBJECT_ID = mfkl.parent_object_id
AND p.index_id IN (1,0)
WHERE missing_parent_indexes.tablename IS NULL -- with the LEFT JOIN, this is the
-- "missing" part of missing_parent_indexes
GROUP BY mfkl.referenced_table
ORDER BY SUM(p.rows) DESC
You’ll get results that give you a sense of what it’s going to cost to delete a row from a given table.
Nice script! Thanks for sharing. 🙂
Comment by Michelle Ufford — April 20, 2009 @ 1:49 pm
[…] you’ve created the new foreign keys, don’t forget to read my other article about Indexing for Foreign Keys Comments […]
Pingback by Finding your Missing Keys | Michael J. Swart — December 7, 2009 @ 2:49 pm
Nice Assist. Thanks for placing in public domain.
Daniel Adeniji
Comment by Daniel Adeniji — September 5, 2014 @ 6:41 pm
This is a very good script, presenting the data in a way especially helpful when explaining the issue to non-DBAs. One useful enhancement would be for it to ignore key column order so it would recognize an index on Column2, Column1 as a match for a foreign key on Column1, Column2. I found that circumstance in a production database recently. The inconsistency was insufficient justification to make a change because everything was functioning as intended.
Comment by Mark Freeman — February 24, 2021 @ 10:30 am
I’ve worked with a number of systems that implement an IsDeleted BIT column in every significant table and then periodically run a job with a bunch of “DELETE…WHERE IsDeleted = 1” statements outside normal work hours. (In some cases the WHERE clauses were more complicated, taking into account status values for matching parent table rows or date criteria.) This gets them speed for the deletes during normal operating hours, the ability to “undelete” rows painlessly until some amount of time had passes, and dramatically reduced the cost penalty for missing indexes that should support the FKs. But in general, I agree that ensuring you have an index matching your foreign keys is very important to performance.
Comment by Mark Freeman — February 24, 2021 @ 10:40 am