No, this article should be titled Indexing for Foreign Keys (or even Indexing for Forty Foreign Keys)
So your application is humming along and your database has been playing nicely with the app. But the database is getting a little large and it’s time to manage that. There are a lot of different ways to approach this common issue including short term solutions like:
- Buying more disk space to host all this extra data.
- Using compression to make the most of the disk space you’ve got.
And of course, there’s the archive/purge/restore approach.
At some point during the purge process, you may find yourself issuing DELETE statements. And it’s almost inevitable if you’re not doing something like dropping or truncating shards or partitions. When deleting rows from a table, in order to maintain data integrity SQL Server will always check foreign keys to make sure that no other row is referring to the row you’re deleting. This is, after all, the whole point of a foreign key and it’s the R in RDBMS. This check can cause performance problems for your delete statement if the database is not indexed properly. Such performance issues may have gone unnoticed if tables have rarely had any rows deleted in the past.
I wrote a query against some the DMVs available in 2005 and 2008 that can report on the tables that might cause scans when deleting rows. If the numbers are large enough, then you’re going to have trouble deleting many rows from these tables without adding an index (hence the title Indexing for Foreign Keys).
WITH my_foreign_key_list AS ( SELECT fk.name AS foreign_key, fk.referenced_object_id, OBJECT_NAME(fk.referenced_object_id) AS referenced_table, ( SELECT CAST(c.name AS NVARCHAR(MAX)) + N',' AS [text()] FROM sys.foreign_key_columns AS fkc JOIN sys.columns c ON c.OBJECT_ID = fkc.referenced_object_id AND c.column_id = fkc.referenced_column_id WHERE fkc.parent_object_id = fk.parent_object_id AND fk.OBJECT_ID = fkc.constraint_object_id ORDER BY fkc.constraint_column_id FOR XML PATH('') ) AS referenced_columns, fk.parent_object_id, OBJECT_NAME(fk.parent_object_id) AS parent_table, ( SELECT CAST(c.name AS NVARCHAR(MAX)) + N',' AS [text()] FROM sys.foreign_key_columns AS fkc JOIN sys.columns c ON c.OBJECT_ID = fkc.parent_object_id AND c.column_id = fkc.parent_column_id WHERE fkc.parent_object_id = fk.parent_object_id AND fk.OBJECT_ID = fkc.constraint_object_id ORDER BY fkc.constraint_column_id FOR XML PATH('') ) AS parent_columns FROM sys.foreign_keys fk ), my_index_list AS ( SELECT t.OBJECT_ID, t.name AS tablename, i.name AS indexname, ( SELECT c.name + N',' AS [text()] FROM sys.index_columns AS ic JOIN sys.columns c ON c.OBJECT_ID = ic.OBJECT_ID AND c.column_id = ic.column_id WHERE ic.OBJECT_ID = i.OBJECT_ID AND ic.index_id = i.index_id AND ic.is_included_column = 0 ORDER BY ic.key_ordinal FOR XML PATH('') ) AS index_cols FROM sys.indexes AS i JOIN sys.tables t ON i.OBJECT_ID = t.OBJECT_ID WHERE i.index_id > 0 ) SELECT mfkl.referenced_table AS [When deleting a row from this table...], SUM(p.rows) AS [the number of rows that must be scanned is...] FROM my_foreign_key_list mfkl LEFT JOIN my_index_list missing_parent_indexes ON mfkl.parent_table = missing_parent_indexes.tablename AND missing_parent_indexes.index_cols LIKE mfkl.parent_columns + N'%' -- index covers the parent columns JOIN sys.partitions p -- just to get rows ON p.OBJECT_ID = mfkl.parent_object_id AND p.index_id IN (1,0) WHERE missing_parent_indexes.tablename IS NULL -- with the LEFT JOIN, this is the -- "missing" part of missing_parent_indexes GROUP BY mfkl.referenced_table ORDER BY SUM(p.rows) DESC
You’ll get results that give you a sense of what it’s going to cost to delete a row from a given table.