Michael J. Swart

March 26, 2010

Listing Tables Ordered By Dependency

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 4:47 pm

Short post for the Weekend.

There’s a number of use cases that may require you to see a list of database tables ordered by dependency.

For example, you may want to know in which order to create tables if you’re dealing with a scripting project. (On the flip side, if you’re looking to delete tables, reverse the order).

I came up with a pretty script that lists the tables you care about in order of how they can be created (i.e. create tables at level zero before moving on to level one tables).

WITH myTableList AS
(
	select object_id, name
	from sys.tables
	--where name like 'stuffICareAbout%'
),
myForeignKeyList AS
(
	select fk.parent_object_id, fk.referenced_object_id
	from sys.foreign_keys fk
	join myTableList mtl_parent
		on mtl_parent.object_id = fk.parent_object_id
	join myTableList mtl_referenced
		on mtl_referenced.object_id = fk.referenced_object_id
),
TablesDependencies AS
(
	SELECT name AS TableName,
		object_id AS TableID,
		0 AS level
	FROM myTableList 
 
	UNION ALL
 
	SELECT mtl.name, mtl.object_id, td.level + 1
	FROM myTableList mtl
	JOIN myForeignKeyList mfk
		ON mfk.parent_object_id = mtl.object_id
		AND NOT mfk.parent_object_id = mfk.referenced_object_id
	JOIN TablesDependencies td
		ON mfk.referenced_object_id = td.TableID
)
SELECT TableName, MAX(level) as level
FROM TablesDependencies
GROUP BY TableName
ORDER BY level, TableName

Modify this to suit your needs, i.e:

  • Modify the where clause to filter only the tables you care about.
  • Modify the fields to report on schema if it’s something you care about

Cheers

March 8, 2010

I/O Operations, Arguably the Only Performance Metric You Need

Measuring Performance

Measuring Performance

Mike Walsh is hosting T-SQL Tuesday this month. And the topic is Input and Output (I/O).

The thing about I/O is that it’s very much a hardware topic and so the category is very very suited for database administrators. The topic is maybe a little more suited to them than us database developers.

So it’s natural to ask: What is it (if anything) about I/O that db developers should know?

I/O is THE Metric For Measuring Performance

Well, here’s my thesis:  I/O is the principal metric to watch. Any other metric – such as duration, CPU cycles, network lag etc… – don’t matter nearly as much as I/O.

Here’s why:

  • (Solid State Drives excepted) Disk operations are one of the few things databases do that rely on moving parts. As such they are slooow (relatively speaking). Save the I/O, save the world.
  • In theory, OLTP databases should get by with minimal I/O. If particular queries are performing many reads or writes chances are that the query can be improved.
  • Is it really a memory issue? You’re worried about memory. Low page life expectancy is usually a symptom of too little memory. But if this value nosedives often, it could be in conjunction with a I/O-intensive query.
  • What about measuring duration? Duration should be treated like a symptom, not a cause. Arguably, excessive I/O is a symptom too, but it’s often one step closer to the cause than duration.
  • Ease off on the CPU-heavy stuff. I’m a big fan of letting the database server serve data and of letting the application do the thinking. I work in a very developer-centric environment and it’s great: Everyone is comfortable with letting the app do any heavy CPU work.
  • I’ve found it convenient to deal with only one ruler. I’ve used I/O as my only metric for a few years now and I’ve been extremely happy with the results. (Well, not quite my only metric, but my TOP 20 I/O queries and my TOP 20 CPU queries have about 18 queries in common).
  • I plan to re-evaluate things if/when Solid State Drives have their day in the sun. It’s not too far into the (exciting) future.

As it turns out, as I write this, the first I/O articles are coming in and it seems that Rob Farley seems to have written a very thorough article about I/O as it relates to the cost of a query. He points out that we shouldn’t ignore other metrics in favor of I/O alone. I may be lucky, but I’ve never seen a performance problem in production that did not have the symptom of high I/O.

Bonus Post Script: My Top 20 I/O Query

Others have created these queries in the past. You can find them everywhere. This one’s mine. It’s a tool that’s been on my belt for a number of years.

DECLARE @SystemIO FLOAT
SELECT @SystemIO = SUM(total_logical_reads + total_logical_writes)
FROM sys.dm_exec_query_stats;
 
SELECT TOP 20 [Row Number] = ROW_NUMBER() OVER (ORDER BY total_logical_reads + total_logical_writes DESC),
    [Query Text] = CASE
        WHEN [sql_handle] IS NULL THEN ' '
        ELSE (SUBSTRING(ST.TEXT,(QS.statement_start_offset + 2) / 2,
            (CASE
                WHEN QS.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(MAX),ST.text)) * 2
                ELSE QS.statement_end_offset
                END - QS.statement_start_offset) / 2))
        END,
    [Execution Count] = execution_count,
    [Total IO] = total_logical_reads + total_logical_writes,
    [Average IO] = (total_logical_reads + total_logical_writes) / (execution_count + 0.0),
    [System Percentage] = 100 * (total_logical_reads + total_logical_writes) / @SystemIO,
    [Object Name] = OBJECT_NAME(ST.objectid),
    [Total System IO] = @SystemIO,
    [SQL Handle] = [sql_handle]
FROM sys.dm_exec_query_stats QS
CROSS APPLY sys.dm_exec_sql_text ([sql_handle]) ST
WHERE total_logical_reads + total_logical_writes > 0
ORDER BY [Total IO] DESC

February 2, 2010

Looking at Page Contents

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Tags: , — Michael J. Swart @ 10:06 am

If you want to dig deep into SQL Server internals, you’re eventually going to want to look at the physical page structure of a data page. As such I’ve bookmarked Anatomy of a Page by Paul Randal.

The syntax for all this stuff is hard for me to remember, so I’ve made a template for myself. And I’m sharing that here:

-- looking at page contents.
 
-- T3604 to output to console
dbcc traceon (3604)
 
-- select rows from the table of interest
-- (as well as their physical location)
select top (10) t.*,
	pl.file_id, pl.page_id, pl.slot_id
from <tablename,sysname,Production.Product> as t
cross apply sys.fn_PhysLocCracker(t.%%physloc%%) as pl;
 
declare @dbname sysname;
set @dbname = DB_NAME();
dbcc page(@dbname, --db_name
	1, --file_id
	136, --page_id
	1) --results style

Ctrl+Shift+M will let you provide the table name. I’ve also made this script into a code-snippet called page.

Note that the script makes use of sys.fn_PhysLocCracker which is SQL 2008 and later.

By the way, I highly recommend playing with this stuff. Not having to guess at the width of records is really helpful. I plan to write a post soon which shows how it helped an investigation I did.

December 21, 2009

Top 8 Queries to Improve DB Design

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 2:24 pm
Top Eight Queries to Improve DB Design.

Top Eight Queries to Improve DB Design.

There is a disease that most television networks suffer from. The disease was more prevalent in the eighties and nineties than it is today, but every sit-com that lasts longer than two seasons seems destined to suffer from it: The cheesy clip show. The writers of such sitcoms probably thought “Whew! I need a break.” That thought was probably quickly followed by. “I betcha the viewers would like to re-watch Alex P. Keaton attempting to handle that feisty kangaroo”.

Well this article is like that. It’s a time where I link to old articles of mine (and some from the SQLServerPedia wiki!) and pass it off as something novel. Thanks for bearing with me and Happy Holidays. Without further ado:

The Eight Queries

These queries can be run right now! And can help identify areas for improvement in DB design. These queries are focused on DB design (so queries that identify top wait types or top queries by resource usage are skipped).

In no particular order:

  1. Find Missing SQL Dependencies If a view, function or stored procedure refers to a nonexistent object (be it a table, view etc…) This query can help you identify those.
  2. Finding your Missing Keys If a column with a name like ‘%id’ does not point to a table, you just might need a foreign key.
  3. Indexing Foreign Keys? To minimize the amount of maintenance time required to maintain data integrity required by foreign keys. (The query needs a database with data in it. The query is useless against empty databases).
  4. Which identity column is running out of room? Find columns which might be in danger of running out of room. (Run against a database with data in it).
  5. Find Missing Indexes This is courtesy SQLServerPedia. The Microsoft documentation for the missing index views is here. But the SQLServerPedia article pulls it all together. (Must be run against an active database).
  6. Find Queries in the Plan Cache That Are Missing an Index Another query from SQLServerPedia. Be careful with this one. It can use a lot of resources.
  7. Find Indexes Not In Use Again, from SSP, use against an active database.
  8. Find Tables Without Primary Keys Looking for heaps of trouble? This last query is also from SSP. It doesn’t need to be an active DB. Any old development DB will do.

What? Still here?

You still want more meta information about your databases. Be careful what you wish for. I recommend looking at what Jason Strate has done. That’ll keep you busy.

December 17, 2009

Find Missing SQL Dependencies

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:01 pm

A short post today.

I use the following script to find (broken) stored procedures or views that refer to sprocs or tables that no longer exist… or that don’t exist yet.

This is a quick sanity check that can help identify broken or obsolete stored procedures.

SELECT OBJECT_SCHEMA_NAME(referencing_id) + '.' + 
    OBJECT_NAME(referencing_id) AS [referencer],
    referenced_entity_name AS [referenced]
FROM sys.sql_expression_dependencies
WHERE is_ambiguous = 0
    AND OBJECT_ID(ISNULL(referenced_schema_name, 'dbo') + '.' + referenced_entity_name) IS NULL
    AND OBJECT_ID(ISNULL(referenced_schema_name, OBJECT_SCHEMA_NAME(referencing_id)) + '.' + referenced_entity_name) IS NULL
    AND referenced_entity_name NOT IN (SELECT Name FROM sys.types WHERE is_user_defined = 1) -- avoid type false positives
    AND referenced_entity_name not in ('deleted', 'inserted') -- avoid trigger false positives
    AND referenced_database_name is null
ORDER BY OBJECT_NAME(referencing_id), referenced_entity_name

December 7, 2009

Finding your Missing Keys

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 2:47 pm

If you love the R in RDBMS the way I do, you know how useful Foreign Keys are when enforcing relationships. (Update: Actually, I’ve since learned that the R in RDBMS refer to Relations which tables are an example of, not relationships between tables)

So if there’s a column that is supposed to participate in such a relationship (either as a surrogate key, or as a referring column), then you expect it to show up in the guts of some foreign key.

I wrote a script (which may or may not apply to your db) that queries the system views to tell me where I might be missing such foreign keys.

The Query:

SELECT t.name, c.name
FROM sys.columns c
INNER JOIN sys.tables t
	ON t.object_id = c.object_id
INNER JOIN sys.indexes i
	ON i.object_id = t.object_id
LEFT JOIN sys.foreign_key_columns fkc_Parent
	ON fkc_Parent.parent_column_id = c.column_id
	AND fkc_Parent.parent_object_id = c.object_id
LEFT JOIN sys.foreign_key_columns fkc_Referenced
	ON fkc_Referenced.Referenced_column_id = c.column_id
	AND fkc_Referenced.Referenced_object_id = c.object_id
LEFT JOIN sys.index_columns ic
	ON ic.index_id = i.index_id
	AND ic.object_id = t.object_id
	AND ic.column_id = c.column_id
WHERE fkc_Referenced.constraint_object_id IS NULL
	AND fkc_Parent.constraint_column_id IS NULL
	AND ic.index_column_id IS NULL
	AND c.name LIKE '%id'
	AND i.is_primary_key = 1
ORDER BY t.name, c.name

This query returns a candidate list of columns that hint at a potential table relationship with no corresponding foreign key. Specifically, this list is comprised of columns that are

  • made up of columns with a name ending in %id. So this query applies to surrogate columns only for databases that follow this naming convention.
  • non-referencing,
  • non-referenced,
  • not part of any primary key …
  • … but are on tables that have primary keys

The whole point of this list is to make db designers think twice and ask themselves: “Do I need a foreign key here?”

I tried it out on AdventureWorks and there seems to be a missing key from Production.TransactionHistoryArchive(ProductID) to Production.Product(ProductID). As an archive table, maybe the missing foreign key is intentional. But again, it’s good to think twice about it.

Further Reading

Once you’ve created the new foreign keys, don’t forget to read my other article about Indexing for Foreign Keys

September 28, 2009

Detecting Loops using Recursive CTEs

Filed under: SQL Scripts,Technical Articles — Tags: , , , , — Michael J. Swart @ 5:29 am

I want to write (for posterity) a solution to a problem I encountered last week. I was asked a question about the following error message:

The statement terminated. The maximum recursion 100 has
been exhausted before statement completion.

I was dealing with a table that represented a (0..*) to (0..*) relationship between objects I’ll call THINGS1:

CREATE TABLE PARENTS
(
   ParentId INT NOT NULL REFERENCES THING(Id),
   ChildId INT NOT NULL REFERENCES THING(Id),
   PRIMARY KEY (ParentId, ChildId)
)

We were attempting to populate a similar table called ANCESTORS using a recursive CTE. I suspected that the recursive CTE was not terminating because of a loop in the PARENTS table. That is to say, the data in the PARENTS table implied that some THING was its own ancestor. (Think of it this way, if you are your own parent, that also means you’re your own grandparent, great-grandparent etc…)

To help fix the data I needed to write a query that returned a list of all THINGs that were (somehow) their own ancestor. This is what I came up with. It’s inefficient, but it did the trick.

;WITH ANCESTORS_CTE
AS (
     SELECT ParentId AS AncestorId, ChildId AS DescendantId, 1  AS Depth
     FROM PARENTS
     UNION ALL
     SELECT ANCESTORS_CTE.AncestorId, PARENTS.ChildId, ANCESTORS_CTE.Depth + 1 AS Depth
     FROM ANCESTORS_CTE
     JOIN PARENTS
          ON ANCESTORS_CTE.DescendantId = PARENTS.ParentId
     WHERE ANCESTORS_CTE.Depth < 10
)
SELECT DISTINCT AncestorId
FROM ANCESTORS_CTE
WHERE AncestorId = DescendantId

1 Mathematically speaking, the PARENTS table represents the adjacency matrix of a directed acyclic graph.

May 22, 2009

How old is the procedure cache?

Filed under: SQL Scripts,Technical Articles — Michael J. Swart @ 4:55 am

I like to use DMVs a lot. There’s a lot of good information in there. Especially powerful is combining sys.dm_exec_cached_plans with sys.dm_exec_query_plan with a cross apply.

I like looking at the procedure cache because it’s a really good sample of typical activity on a server. It’s useful, but you have to take that information with a grain of salt. The procedure cache might be missing interesting queries that have not been sent since the last server restart. It makes intuitive sense: The longer the procedure cache has to monitor and cache queries, the more comprehensive the procedure cache, and the closer to a good representation of typical server activity.

So it makes sense to ask “How old is the procedure cache?”.

I’ve written a script that quickly tells me when the last server restart was, or when the last DBCC FREEPROCCACHE was issued: (Update June 2, 2009: The comments for this post is a must-see for a much simpler query that returns the same information as the query I post here!)

DECLARE @log TABLE (LogDate datetime, ProcessInfo NVARCHAR(100), [Text] NVARCHAR(MAX));
DECLARE @ProcCacheStart datetime
 
-- get last server restart
SELECT @ProcCacheStart = login_time
FROM sys.sysprocesses
WHERE spid = 1
 
-- get last freeproccache (if necessary)
INSERT @log
EXEC sp_readerrorlog 0;
 
SELECT TOP 1 @ProcCacheStart = LogDate
FROM @log
WHERE [Text] LIKE '%SQL Plans%'
    AND [Text] LIKE '%cachestore flush%'
ORDER BY LogDate DESC
 
SELECT @ProcCacheStart

This gives a pretty good idea of how old your procedure cache is. There are a couple caveats though. For one, sp_readerrorlog is an undocumented feature. Also, the info could be inaccurate if the error log is cycled with sp_cycle_errorlog.

May 7, 2009

When do two timespans overlap?

Filed under: SQL Scripts,Technical Articles — Michael J. Swart @ 12:14 pm

This post is about comparing two timespans and determining whether they overlap at all. Suppose you have a table of timespans similar to the following abstract example:

CREATE TABLE timespans
(
   id INT IDENTITY PRIMARY KEY,
   starttime DATETIME NOT NULL,
   endtime DATETIME NOT NULL
)

A common task is to determine what expression determines whether two given timespans overlap. In this example, say @id1 and @id2 are the two timespans.

A straightforward approach might lead you to consider a number of different scenarios to check. I’ve seen the following checks a couple times:
  • timespan @id1 contains the start time of timespan @id2
  • timespan @id1 contains the end time of timespan @id2
  • timespan @id2 contains the start time of timespan @id1
  • timespan @id2 contains the end time of timespan @id1
These are four checks which result in 8 comparisons. It turns out that the last check is extraneous and so only 6 comparisons are needed.
I find it helpful to first determine whether the timespans do not overlap by doing the following checks:
  • timespan @id1 starts after timespan @id2 ends
  • timespan @id2 starts after timespan @id1 ends

This only needs two checks but each check is one comparison for a total of 2 comparisons. This leads to queries that contain expressions like the one used here:

SELECT 1
FROM timespans AS timespan1,
   timespans AS timespan2
WHERE timespan1.id = @id1
   AND timespan2.id = @id2
   AND NOT timespan1.endtime < timespan2.starttime
   AND NOT timespan2.endtime < timespan1.starttime

This query will return 1 if the two timespans @id1 and @id2 overlap in some way.

April 28, 2009

Which identity column is running out of room?

Filed under: SQL Scripts,Technical Articles — Michael J. Swart @ 10:16 am

I wrote the following query to give me a sense of which identity columns are running out of room. The query finds the identity columns in the current database and reports the ones that are “closest” to having values that are too big for the datatype. The query does this by reporting the percentage of room used so-far.


;WITH maxValues AS
(
   SELECT user_type_idsystem_type_idname, 
   CASE name 
       WHEN 'bigint' THEN CAST(0x7FFFFFFFFFFFFFFF AS bigint)
       WHEN 'int' THEN CAST(0x7FFFFFFF AS bigint)
       WHEN 'smallint' THEN CAST(0x7FFF AS bigint)
       WHEN 'tinyint' THEN CAST(0xFF AS bigint)
   END AS maxValue
   FROM sys.types t
)
SELECT TOP 20 
   t.name AS tableName, 
   c.name AS identityColumnName, 
   maxValues.name AS typeName, 
   ISNULL(IDENT_CURRENT(t.name),0AS currentIdentity,
   100.0 ISNULL(IDENT_CURRENT(t.name),0) / maxValues.maxValue AS percentConsumed
FROM sys.columns c
JOIN maxValues 
   ON maxValues.system_type_id c.system_type_id
   AND maxValues.user_type_id c.user_type_id
JOIN sys.tables t 
   ON t.OBJECT_ID c.OBJECT_ID
WHERE c.is_identity 1
ORDER BY percentConsumed DESC

So say that you ran the query, and there’s a few numbers you’re looking at. How do you know if you’re in danger of running out of room? Well, there’s no way to know how fast each table is growing based on the schema alone. A static data table that uses 15% of the room might be acceptable. But an active table that has used 5% of the room in a months time is not. An understanding of the business domain is the only way to make a good decision here.

So now what happens now that you’ve decided there’s an identity column that’s in danger of running out of room? The only thing that comes to mind is to increase the data type (e.g. make int a bigint). This usually means a ton of regression testing. A large amount of booked downtime for maintenance.

I’m sure there are other suggestions, I may save those for another post. If you have any ideas, leave them in the comments.

« Newer PostsOlder Posts »

Powered by WordPress