I wrote the following query to give me a sense of which identity columns are running out of room. The query finds the identity columns in the current database and reports the ones that are “closest” to having values that are too big for the datatype. The query does this by reporting the percentage of room used so-far.
;WITH maxValues AS
(
SELECT user_type_id, system_type_id, name,
CASE name
WHEN 'bigint' THEN CAST(0x7FFFFFFFFFFFFFFF AS bigint)
WHEN 'int' THEN CAST(0x7FFFFFFF AS bigint)
WHEN 'smallint' THEN CAST(0x7FFF AS bigint)
WHEN 'tinyint' THEN CAST(0xFF AS bigint)
END AS maxValue
FROM sys.types t
)
SELECT TOP 20
t.name AS tableName,
c.name AS identityColumnName,
maxValues.name AS typeName,
ISNULL(IDENT_CURRENT(t.name),0) AS currentIdentity,
100.0 * ISNULL(IDENT_CURRENT(t.name),0) / maxValues.maxValue AS percentConsumed
FROM sys.columns c
JOIN maxValues
ON maxValues.system_type_id = c.system_type_id
AND maxValues.user_type_id = c.user_type_id
JOIN sys.tables t
ON t.OBJECT_ID = c.OBJECT_ID
WHERE c.is_identity = 1
ORDER BY percentConsumed DESC
So say that you ran the query, and there’s a few numbers you’re looking at. How do you know if you’re in danger of running out of room? Well, there’s no way to know how fast each table is growing based on the schema alone. A static data table that uses 15% of the room might be acceptable. But an active table that has used 5% of the room in a months time is not. An understanding of the business domain is the only way to make a good decision here.
So now what happens now that you’ve decided there’s an identity column that’s in danger of running out of room? The only thing that comes to mind is to increase the data type (e.g. make int a bigint). This usually means a ton of regression testing. A large amount of booked downtime for maintenance.
I’m sure there are other suggestions, I may save those for another post. If you have any ideas, leave them in the comments.
Here’s a few other interesting ones that I forgot to mention:
SProc : Short for stored procedure. I like this one, it’s fun to say out loud. The term is pretty ubiquitous, but I still make a conscious effort to say stored procedure when dealing with folks who might not know the term.
(S)GAM: These stand for (Shared) Global Allocation Map. You’ll come across these only if you’re really really interested in the guts of SQL Server. Thankfully, I’ve never felt the need to try to prounounce it.
CRUD: Stands for Create/Read/Update/Delete. These terms are analogous to INSERT/SELECT/UPDATE/DELETE respectively. I kind of like using the term, especially on grumpy days, but I use the DML acronym more often.
LOB (or better BLOB): This stands for (Binary) Long OBject. I can’t imagine how I missed this one the first go around. I love this acronym. Mostly because a blob is a great metaphor for what a Binary Long Object is.
No, this article should be titled Indexing for Foreign Keys (or even Indexing for Forty Foreign Keys)
So your application is humming along and your database has been playing nicely with the app. But the database is getting a little large and it’s time to manage that. There are a lot of different ways to approach this common issue including short term solutions like:
Buying more disk space to host all this extra data.
Using compression to make the most of the disk space you’ve got.
And of course, there’s the archive/purge/restore approach.
At some point during the purge process, you may find yourself issuing DELETE statements. And it’s almost inevitable if you’re not doing something like dropping or truncating shards or partitions. When deleting rows from a table, in order to maintain data integrity SQL Server will always check foreign keys to make sure that no other row is referring to the row you’re deleting. This is, after all, the whole point of a foreign key and it’s the R in RDBMS. This check can cause performance problems for your delete statement if the database is not indexed properly. Such performance issues may have gone unnoticed if tables have rarely had any rows deleted in the past.
I wrote a query against some the DMVs available in 2005 and 2008 that can report on the tables that might cause scans when deleting rows. If the numbers are large enough, then you’re going to have trouble deleting many rows from these tables without adding an index (hence the title Indexing for Foreign Keys).
WITH my_foreign_key_list AS (
SELECT
fk.name AS foreign_key,
fk.referenced_object_id,
OBJECT_NAME(fk.referenced_object_id) AS referenced_table,
(
SELECT CAST(c.name AS NVARCHAR(MAX)) + N',' AS [text()]
FROM sys.foreign_key_columns AS fkc
JOIN sys.columns c ON c.OBJECT_ID = fkc.referenced_object_id
AND c.column_id = fkc.referenced_column_id
WHERE fkc.parent_object_id = fk.parent_object_id
AND fk.OBJECT_ID = fkc.constraint_object_id
ORDER BY fkc.constraint_column_id
FOR XML PATH('')
) AS referenced_columns,
fk.parent_object_id,
OBJECT_NAME(fk.parent_object_id) AS parent_table,
(
SELECT CAST(c.name AS NVARCHAR(MAX)) + N',' AS [text()]
FROM sys.foreign_key_columns AS fkc
JOIN sys.columns c ON c.OBJECT_ID = fkc.parent_object_id
AND c.column_id = fkc.parent_column_id
WHERE fkc.parent_object_id = fk.parent_object_id
AND fk.OBJECT_ID = fkc.constraint_object_id
ORDER BY fkc.constraint_column_id
FOR XML PATH('')
) AS parent_columns
FROM sys.foreign_keys fk
),
my_index_list AS
(
SELECT t.OBJECT_ID, t.name AS tablename,
i.name AS indexname,
(
SELECT c.name + N',' AS [text()]
FROM sys.index_columns AS ic
JOIN sys.columns c ON c.OBJECT_ID = ic.OBJECT_ID
AND c.column_id = ic.column_id
WHERE ic.OBJECT_ID = i.OBJECT_ID
AND ic.index_id = i.index_id
AND ic.is_included_column = 0
ORDER BY ic.key_ordinal
FOR XML PATH('')
) AS index_cols
FROM sys.indexes AS i
JOIN sys.tables t ON i.OBJECT_ID = t.OBJECT_ID
WHERE i.index_id > 0
)
SELECT mfkl.referenced_table AS [When deleting a row from this table...],
SUM(p.rows) AS [the number of rows that must be scanned is...]
FROM my_foreign_key_list mfkl
LEFT JOIN my_index_list missing_parent_indexes
ON mfkl.parent_table = missing_parent_indexes.tablename
AND missing_parent_indexes.index_cols LIKE mfkl.parent_columns + N'%' -- index covers the parent columns
JOIN sys.partitions p -- just to get rows
ON p.OBJECT_ID = mfkl.parent_object_id
AND p.index_id IN (1,0)
WHERE missing_parent_indexes.tablename IS NULL -- with the LEFT JOIN, this is the
-- "missing" part of missing_parent_indexes
GROUP BY mfkl.referenced_table
ORDER BY SUM(p.rows) DESC
You’ll get results that give you a sense of what it’s going to cost to delete a row from a given table.
As an example from AdventureWorks, the query points to Sales.SpecialOfferProduct as the largest potential problem, and sure enough a delete on that table causes a scan on Sales.SalesOrderDetail
I explain a little about what partitioned tables are in SQL Server and then I give a recap of various methods for managing partitioned tables.
What are Partitioned Tables? (2005)
SQL Server partitioned tables were introduced with SQL Server 2005 Enterprise Edition. Partitions are used to improve maintenance and management of large tables. I’m not going to explain the what and the why of partitions. That would be like reinventing the wheel. Partitions are best explained with Kimberly Tripp’s white paper Partitioned Tables and Indexes in SQL Server 2005. Whether you’re new to partitioning, or whether you’re the MS employee who designed partitioning, I highly recommend reading the article it’s got something for everybody.
What’s new with 2008?
See above. There hasn’t been much that’s changed in terms of basic concepts. However, there are a couple features that have been introduced:
SQL Server’s locking mechanism is more aware of partitions than it was in 2005. For syntax, see the LOCK_ESCALATION option for tables in books on-line and for details/examples/commentary see this article by Paul Randal.
Update April 8, 2009. See the comment section for a couple 2008 features that I missed.
Great, so now you know all there is about partitioning, sliding window scenarios, partition switching, aligned vs. unaligned indexes. So now what’s the best way to go about managing these partitions? There are a few options that I’ve come across that each have their benefits and drawbacks. I’d like to list them out here.
T-SQL (of course):
Using straight T-SQL scripts does everything you need without the handholding.
According to their website, SQL Partition Manager “automates partition management tasks”. It does a very good job of enabling the DBA to perform all tasks that deal with SQL Server partitioning. The price tag starts at $199 which is a small fraction of the price of SQL Server Enterprise Edition.
The biggest advantage that this software has over the other options is its user interface. You can browse partitioned tables faster and easier.
It’s nice and well worth trying the limited time demo version. There is one thing that I wish for it. That it has a “view script” feature that you can use before implementing a change. I mean, it’s always possible to capture everything using a server side trace with SQL Profiler, but that reminds me of the old Saturday Night Live joke: “For a transcript of today’s episode, write down everything I’m saying”
SQL Server Partition Management
A codeplex project by Stuart Ozer from Microsoft. This program is a command line tool that allows people to manage sliding window scenarios. It can also create staging tables to be able to switch data in or out. In fact that is the only focus of this program; to prepare staging tables and to switch partitions with such partitions. It is an excellent alternative to maintaining T-SQL scripts to be used during weekly or monthly partition maintenance tasks. But – as with T-SQL scripts – there is no hand-holding here. Also, there is no output-as-script feature available either.
SSMS wizards:
As mentioned earlier, these wizards helps create, manage and modify partitions. They are very script-able and easy to use.
The Create Partition Wizard can be used to partition an existing table. If needed, the wizard can help create a partition scheme and partition function.
The Manage Partition Wizard can create a staging table for partition switching or it can switch in or switch out data.
In short these wizards accomplish what the command line tool SQL Server Partition Management does except through a UI.
Toad for SQL Server (Up and Coming version 4.5)
I also understand that Toad for SQL Server version 4.5 will have partitioning features and partition management in Capacity Manager as well. (Thanks Brent!)
Summary
For automated management I recommend T-SQL plus the command line tool SQL Server Partition Management.
For day to day and manual management I recommend the windows application SQL Partition Manager.