Michael J. Swart

December 21, 2009

Top 8 Queries to Improve DB Design

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 2:24 pm
Top Eight Queries to Improve DB Design.

There is a disease that most television networks suffer from. The disease was more prevalent in the eighties and nineties than it is today, but every sitcom that lasts longer than two seasons seems destined to suffer from it: the cheesy clip show. The writers of such sitcoms probably thought, “Whew! I need a break.” That thought was probably quickly followed by, “I betcha the viewers would like to re-watch Alex P. Keaton attempting to handle that feisty kangaroo.”

Well, this article is like that: a post where I link to old articles of mine (and some from the SQLServerPedia wiki!) and pass them off as something novel. Thanks for bearing with me, and Happy Holidays. Without further ado:

The Eight Queries

These queries can be run right now, and they can help identify areas for improvement in DB design. They focus on design, so queries that identify top wait types or top queries by resource usage are skipped.

In no particular order:

  1. Find Missing SQL Dependencies. If a view, function, or stored procedure refers to a nonexistent object (a table, view, etc.), this query can help you identify it.
  2. Finding your Missing Keys. If a column with a name like ‘%id’ does not point to a table, you just might need a foreign key.
  3. Indexing Foreign Keys? Helps minimize the maintenance time needed to enforce the data integrity that foreign keys provide. (The query needs a database with data in it; it’s useless against empty databases.)
  4. Which identity column is running out of room? Finds columns that might be in danger of running out of room. (Run against a database with data in it.)
  5. Find Missing Indexes. This is courtesy of SQLServerPedia. The Microsoft documentation for the missing index views is here, but the SQLServerPedia article pulls it all together. (Must be run against an active database.)
  6. Find Queries in the Plan Cache That Are Missing an Index. Another query from SQLServerPedia. Be careful with this one; it can use a lot of resources.
  7. Find Indexes Not In Use. Again from SQLServerPedia; use against an active database.
  8. Find Tables Without Primary Keys. Looking for heaps of trouble? This last query is also from SQLServerPedia. It doesn’t need an active DB; any old development DB will do.

What? Still here?

You still want more meta information about your databases? Be careful what you wish for. I recommend looking at what Jason Strate has done. That’ll keep you busy.

December 17, 2009

Find Missing SQL Dependencies

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:01 pm

A short post today.

I use the following script to find (broken) stored procedures or views that refer to sprocs or tables that no longer exist… or that don’t exist yet.

This is a quick sanity check that can help identify broken or obsolete stored procedures.

-- SQL Server 2008 and later only (uses sys.sql_expression_dependencies)
SELECT
    OBJECT_NAME(referencing_id) AS [this sproc or view...],
    referenced_entity_name AS [... depends on this missing entity name]
FROM sys.sql_expression_dependencies
WHERE is_ambiguous = 0
    AND OBJECT_ID(referenced_entity_name) IS NULL
    AND referenced_entity_name NOT IN
        (SELECT Name FROM sys.types WHERE is_user_defined = 1)
ORDER BY OBJECT_NAME(referencing_id), referenced_entity_name

December 16, 2009

Gleeful Calamity

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication — Michael J. Swart @ 11:21 am
Uhhh Ooooo

“Gleeful Calamity”

… is a phrase that I’ve been giving thought to for the past few days. I’m quoting a phrase that Gever Tulley used in a TED talk he gave about his tinkering school.

In many situations it’s valuable to have the freedom to fail. Especially when the goal is learning.

When learning new things about SQL Server, I couldn’t take in half as much knowledge if I didn’t have a development database called sandbox. The sandbox db sits on my local machine, and like its namesake, it’s a safe place to play. It’s a place where starting from scratch from time to time is normal and expected. I expect that tempdb fills this role for many of you.

Being Creative (Idea Mash-ups)

Having the freedom to fail is also valuable when you want to be creative. Later on in this post I’ll explain a failure of mine and I’ll go through what went wrong and why it doesn’t matter.

Being creative often entails tweaking an existing idea or taking two separate ideas and putting them together.

Sometimes the Idea Mash-ups Work…

… and Sometimes They Don’t

Here’s one failed idea I had. A long time ago, I worked with a database that had 800 tables in one schema in one database. By comparison, the AdventureWorks database has about 70 tables in five schemas. I asked myself: is there an automatic way to group the tables into candidate schemas while staying agnostic about the business rules? Here was my thinking:

  • I can infer table relationships using defined foreign keys, sys.sysdepends and other clues based on the naming conventions we used.
  • Tables and their relationships remind me of vertices and edges in graph theory.
  • I can rely on theories of dead mathematicians to do my work for me!

… in theory, anyway. In practice, as I found out, there’s no substitute for digging into the business rules of a domain, sigh. But the process was fun and didn’t take too much time. While I was investigating this idea, I thought: this is what creativity feels like.
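For the curious, the starting point for that edge list might have looked something like this sketch (vertices are tables; each foreign key is an edge):

-- A sketch only: list each foreign key as an edge
-- from a referencing table to a referenced table.
SELECT
    OBJECT_NAME(fk.parent_object_id) AS referencing_table,
    OBJECT_NAME(fk.referenced_object_id) AS referenced_table
FROM sys.foreign_keys fk
ORDER BY referencing_table, referenced_table;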

Meta-Calamity

When I started writing this post, I had a different example of a personal failure in mind:

In my sandbox, I once explored a (seemingly) novel idea and, after a while, I decided it sucked. As it turns out, Google tells me that my idea was not so novel after all and was written up in a blog as a good idea by someone I respect. So for that and other reasons, I picked another failure. Lesson: Google seemingly novel ideas before beginning a blog post about them.

(Image is courtesy of Anthony Cain on Flickr… Check out his photostream!)

December 11, 2009

Tracking Down Elusive Query Timeouts

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 3:00 am

Andy Leonard (fine fellow) recently wrote:

If I solve the same problem twice, I blog about it… for me!

So taking a page from his book I want to talk about a problem I recently had. I’ll explain the symptoms and the root cause. (BTW, the format I’m going for is very similar to the SQL CSS team’s blog. They have good case studies there too.)

The Symptoms

  • Timeouts on one particular type of ad-hoc query were reported.
  • Our custom error logging framework told us that these timeouts were coming in stretches, and always on the same type of query.
  • These stretches of timeout errors were not tied to any particular hour of the day or day of the week.
  • But these stretches were elusive; most of the time the queries were sub-second.
  • These ad-hoc queries were not heavy hitters. They never returned more than 50 rows.
  • We ruled out blocking and bad query plans. The query read data from one table only… no self joins, no scans.
  • However, this table was the largest in the db (both by rows and by space).
  • The query itself filtered on a number of columns, including a particular ntext column. (Now we’re getting somewhere.)
  • We finally caught the queries in a “bad stretch”:
    • The queries were waiting on Page IO.
    • The estimated query plan wouldn’t return.
    • And finally, the server trace showed millions of reads when updating statistics on the ntext column.

So there you have it. That last point gives away what’s going on: SQL Server was auto-updating statistics but couldn’t complete before the timeout. This meant stats weren’t updated, setting up the same situation for the next sucker query to come along.
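If you suspect something similar, a query like this sketch can show which statistics exist on text or ntext columns and when they were last refreshed:

-- A sketch: statistics that include text/ntext columns,
-- with the date they were last updated.
SELECT
    OBJECT_NAME(s.object_id) AS table_name,
    s.name AS stats_name,
    c.name AS column_name,
    STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats s
INNER JOIN sys.stats_columns sc
    ON sc.object_id = s.object_id AND sc.stats_id = s.stats_id
INNER JOIN sys.columns c
    ON c.object_id = sc.object_id AND c.column_id = sc.column_id
INNER JOIN sys.types t
    ON t.user_type_id = c.user_type_id
WHERE t.name IN ('text', 'ntext');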

Our Setup, the Perfect Storm

Our particular database was:

  • an OLTP database that slowly grew into its huge size (i.e. no large loads),
  • using an older schema that hadn’t been refactored in a while (hence the ntext columns),
  • running queries that filtered on these ntext columns (causing the optimizer to create and maintain stats on them; it didn’t matter that the majority of the ntext values were less than 30 characters),
  • a db that had auto-create and auto-update stats on,
  • a db that had auto-update-stats async off.

The Solution

Turn the database setting AUTO_UPDATE_STATISTICS_ASYNC to ON. Even though we had nightly maintenance jobs that updated these stats, SQL Server sometimes decided that it needed newer ones.

In our case this is the answer. You have to be careful with this setting, especially in Data Warehouse scenarios as Adam Machanic found out (via Chad Boyd).
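For reference, the change itself is a one-liner (substitute your own database name):

-- Substitute your own database name.
ALTER DATABASE YourDatabase
SET AUTO_UPDATE_STATISTICS_ASYNC ON;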

One More Thing To Learn

SQL Server has a really hard time updating stats on text or ntext columns. I mean a really, really hard time. I wrote the following script to illustrate the difference between ntext and nvarchar(max) in this respect.

--first we create a large table:
create table big
(
	id uniqueidentifier default (newid()) not null primary key,
	nvc_max_value nvarchar(max) null,
	ntext_value ntext null
);
 
set nocount on
go
 
-- this next part takes a while (what with the page-splitting)
insert big default values
go 1000000
 
update big
set nvc_max_value = CAST(id as nvarchar(max)),
	ntext_value = CAST(CAST(id as nvarchar(max)) as ntext);
 
alter index all on big rebuild;
 
-- start profiler here (with appropriate filters):
dbcc dropcleanbuffers
select COUNT(1) from big where nvc_max_value LIKE N'%abcd%'
dbcc dropcleanbuffers
select COUNT(1) from big where ntext_value LIKE N'%abcd%'
/*
                      CPU  Duration    Reads  Writes
-------------------  ----  --------  -------  ------
nvarchar(max) stats   656      2082     2799      29
ntext stats          2906     12265  3319065      32
*/

Updating the ntext stats requires over 1,000 times the reads of the corresponding nvarchar(max)! And even though many of these reads were not physical reads, enough of them were that the ntext case had duration and CPU values around five times larger than the nvarchar(max) case.

This is likely because ntext values are not stored in-row but in LOB pages. Even so, I didn’t expect that big a difference.
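As an aside, if you’re stuck with ntext for a while, the ‘text in row’ table option can keep small values on the data page instead of on LOB pages. A sketch, using the big table from above:

-- Keep text/ntext values up to 256 bytes in-row instead of on LOB pages.
EXEC sp_tableoption 'big', 'text in row', '256';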

Is that it?

Not entirely, but the other details aren’t relevant to you, dear blog reader. Until next time…

December 7, 2009

Finding your Missing Keys

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 2:47 pm

If you love the R in RDBMS the way I do, you know how useful foreign keys are when enforcing relationships. (Update: Actually, I’ve since learned that the R in RDBMS refers to relations, which tables are an example of, not relationships between tables.)

So if there’s a column that is supposed to participate in such a relationship (either as a surrogate key, or as a referring column), then you expect it to show up in the guts of some foreign key.

I wrote a script (which may or may not apply to your db) that queries the system views to tell me where I might be missing such foreign keys.

The Query:

SELECT t.name, c.name
FROM sys.columns c
INNER JOIN sys.tables t
	ON t.object_id = c.object_id
INNER JOIN sys.indexes i
	ON i.object_id = t.object_id
LEFT JOIN sys.foreign_key_columns fkc_Parent
	ON fkc_Parent.parent_column_id = c.column_id
	AND fkc_Parent.parent_object_id = c.object_id
LEFT JOIN sys.foreign_key_columns fkc_Referenced
	ON fkc_Referenced.Referenced_column_id = c.column_id
	AND fkc_Referenced.Referenced_object_id = c.object_id
LEFT JOIN sys.index_columns ic
	ON ic.index_id = i.index_id
	AND ic.object_id = t.object_id
	AND ic.column_id = c.column_id
WHERE fkc_Referenced.constraint_object_id IS NULL
	AND fkc_Parent.constraint_column_id IS NULL
	AND ic.index_column_id IS NULL
	AND c.name LIKE '%id'
	AND i.is_primary_key = 1
ORDER BY t.name, c.name

This query returns a candidate list of columns that hint at a potential table relationship with no corresponding foreign key. Specifically, the list comprises columns that are

  • named ending in ‘id’ (so this query applies to surrogate columns only, for databases that follow this naming convention),
  • non-referencing,
  • non-referenced,
  • not part of any primary key …
  • … but are on tables that have primary keys.

The whole point of this list is to make db designers think twice and ask themselves: “Do I need a foreign key here?”

I tried it out on AdventureWorks, and there seems to be a missing key from Production.TransactionHistoryArchive(ProductID) to Production.Product(ProductID). Since it’s an archive table, maybe the missing foreign key is intentional. But again, it’s good to think twice about it.
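If you decide the relationship should be enforced, the fix is a one-liner. Here’s a sketch against AdventureWorks (the constraint name is my own invention; verify the existing data first):

-- The constraint name is made up; check that existing rows
-- satisfy the relationship before adding the key.
ALTER TABLE Production.TransactionHistoryArchive
ADD CONSTRAINT FK_TransactionHistoryArchive_Product
FOREIGN KEY (ProductID) REFERENCES Production.Product (ProductID);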

Further Reading

Once you’ve created the new foreign keys, don’t forget to read my other article about Indexing for Foreign Keys.

December 3, 2009

Drag a Column List From Object Explorer

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 1:00 am

Until today, I did not know you could do this, so this article explains a tip that might be new to others like me.

In Management Studio you can drag the Columns folder from Object Explorer into a query window and voila! You’ve got a nice comma delimited list of a table’s columns. Another drag-and-drop win!

10 seconds of Drag and Drop awesomeness

The way I used to do this was to press ALT+F1 on a table name to get its sp_help results. Then, after copy-pasting the column list, I formatted it to include commas and to take out whitespace.

The only downside to this drag-and-drop method is that you have to have Object Explorer expanded to the table you want to use.

But try it out. Play with it. See what other objects in Object Explorer are drag-able. This is not a new idea; a quick search shows that Aaron Bertrand posted the same tip about two months ago. But it’s probably an idea that deserves more exposure.

December 1, 2009

Comparison: Switching Tables vs. sp_rename

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication — Tags: , , , — Michael J. Swart @ 1:00 am

Takeaway: Stick with sp_rename and leave the ALTER TABLE SWITCH statement for partitions.

Earlier in the month I was at a talk given by Don Vilen called “To Partition or Not To Partition: The Questions … and the Answers”. In the talk he gave an interesting tip about switching partitions. (Remember that switching partitions is a method used to implement the sliding window strategy. For more info, search for the keyword SWITCH on this page.)

Here’s the tip: notice that the partition arguments in the ALTER TABLE … SWITCH command are optional. This means that you can swap the data of any table with another; it doesn’t have to be partitioned. But the staging table must still be empty.

So as I thought about how to get around the empty-staging-table restriction, I had an idea (Great Scott!). This is the diagram I made in my notes:

Not a flux capacitor

Moments later, Don Vilen summed up this idea: “The staging table must be empty. But there’s nothing stopping you from taking a third table with data and switching that into the newly emptied table.” I’m paraphrasing. He said it much better than that. Maybe an example would help:

use tempdb;
 
create table primary_table (i int primary key);
create table staging_table_with_data (i int primary key);
create table empty_staging_table (i int primary key);
insert primary_table values (1), (2), (3)
insert staging_table_with_data values (4), (5), (6)
 
-- the following does not work:
alter table primary_table switch to staging_table_with_data;
/* this fails with:
    ALTER TABLE SWITCH statement failed. The target table
    'tempdb.dbo.staging_table_with_data' must be empty.
*/
 
-- but this does work:
alter table primary_table switch to empty_staging_table;
alter table staging_table_with_data switch to primary_table;

When would you use this?

The original reason that the SWITCH statement was implemented was to help transfer data efficiently by using partition switching. But I think we can use this feature on whole non-partitioned tables in order to quickly transform data.

A great example is an update statement that applies to a large number of rows and has to be done quickly. I’ve come across a situation where we were converting local times to UTC and needed speed; a sketch of that scenario follows.
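Here’s a sketch of that scenario (the table and column names are hypothetical, and I’ve assumed a fixed offset for brevity):

-- Hypothetical tables: the definitions must match for SWITCH to work.
create table events (id int primary key, event_time datetime not null);
create table events_staging (id int primary key, event_time datetime not null);
create table events_old (id int primary key, event_time datetime not null);
 
-- do the heavy lifting off to the side, against the staging table
insert events_staging (id, event_time)
select id, dateadd(hour, 5, event_time) -- e.g. a fixed EST-to-UTC offset
from events;
 
-- the swap itself is a quick metadata operation
alter table events switch to events_old;
alter table events_staging switch to events;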

Is this any better than sp_rename?

In the domain of non-partitioned tables, the answer is no, not at all. In fact, the code above can be rewritten like this:

use tempdb;
 
create table primary_table (i int primary key);
create table staging_table (i int primary key);
 
insert primary_table values (1), (2), (3)
insert staging_table values (4), (5), (6)
 
exec sp_rename @objname='primary_table', @newname = 'old_data', @objtype = 'OBJECT'
exec sp_rename @objname='staging_table', @newname = 'primary_table', @objtype = 'OBJECT'

You probably have seen this kind of technique used by tools that generate scripts to change table definitions. In fact, this is the only technique you can use to add an identity column to (or remove one from) a table; see the sketch below.
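For example, adding an identity column might look like the following sketch (names are hypothetical):

create table widgets (name nvarchar(50) not null);
insert widgets values (N'sprocket'), (N'cog');
 
-- build a replacement table that includes the identity column ...
create table widgets_new
(
	id int identity(1,1) not null primary key,
	name nvarchar(50) not null
);
insert widgets_new (name) select name from widgets;
 
-- ... then swap the names
exec sp_rename @objname = 'widgets', @newname = 'widgets_old', @objtype = 'OBJECT';
exec sp_rename @objname = 'widgets_new', @newname = 'widgets', @objtype = 'OBJECT';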

So then what’s the difference between them? Why would you choose one method over the other? Hopefully you can use the rest of the article to make an informed decision.

Restrictions

When renaming tables using sp_rename, there are very few restrictions; in fact, I’m pretty sure there are only four:

  • Neither the primary nor the staging table can be temporary.
  • The table can’t be published (via replication).
  • The new name can’t clash with an existing object name.
  • Enforced dependencies – like schema-bound views – can’t exist.

In fact, aside from syntax errors, I’ve never ever seen sp_rename fail. However, when swapping non-partitioned tables using the SWITCH keyword there are more pitfalls to watch out for:

  • The columns, types, indexes and constraints of the primary and staging tables must match.
  • too many others to list

To see what could go wrong, read the manual here or check out the error list with:

--roughly 50 rows here:
select text from sys.messages
where text like '%ALTER TABLE SWITCH statement%'
and text not like '%partition%'

But the best approach would be to test specific scenarios on a test machine, just to find out what’s possible.

Benefits of one over the other

So if you’re still with me (and congratulations and thank you for bearing with me) you might get the distinct impression that sp_rename is the better choice. You’re probably right. But I’ll still try to sum up the benefits and drawbacks of each method here:

sp_rename:

  • Almost never fails.
  • Different table definitions are possible.
  • Indexed views (which rely on schema binding) are not allowed on the table being renamed.
  • Foreign keys and indexes must be recreated before or after the rename.

SWITCH

  • You don’t have to recreate indexed views. But the staging table must have matching views before the switch.
  • Other foreign keys, indexes and constraints do not need to be redefined. But as before, the staging table must be prepared properly.
  • Minimal or zero downtime.
  • The big drawback is the rule that requires the staging and the primary tables to have matching definitions.

So, like I said in the beginning: For the most part, stick with sp_rename and leave the ALTER TABLE SWITCH statement for partitions.
