Michael J. Swart

August 28, 2012

Sharpshooting Query Plans From The Cache

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Tags: — Michael J. Swart @ 12:00 pm

Takeaway: I talk a bit about DBCC FREEPROCCACHE, using the plan_handle parameter to remove exactly one query plan from cache. I also explain why and when you might want to.

This is another post where I share a script of mine. It’s one of my favorite kinds of posts so I’ve created a new category for it: SQL Scripts

The Situation

  • Crisis mode: The database server is slow (all of a sudden) and the performance is unacceptable.
  • You used your troubleshooting checklist and you see CPU is suddenly pegged on the database server.
  • And sp_whoisactive (or other monitoring software) tells you that there’s a particular query that is running frequently. In this example, the query starts with “SELECT TOP (@ResultLimit) M.Name FROM MyTable M JOIN … etc“.
  • The query is called in exactly one place in your application (Oh man! A fix will take time to recompile, QA and deploy).
  • The query usually runs for a few minutes and using up almost all the CPU.
  • You’ve collected the query plan and the query text.
  • But, if you run the query in SQL Server Management Studio, the query completes quickly.

Some Analysis

  • The query plan looks different when run in Management Studio
  • The production database must have picked a plan based on unlucky compile-time parameters.
  • Confirmed! When running the query in Management Studio using the OPTIMIZE FOR clause with similar parameters, you can reproduce the bad plan.
  • To buy some time, it would be good to have SQL Server recompile that plan.

This is what I want to explore in this post. What is the best way to get SQL Server to recompile that plan?

Before SQL Server 2008

How do we get SQL Server to recompile this plan? Before SQL Server 2008, there were a couple things I could do:

  • DBCC FREEPROCCACHE: Holy cow! That’s a bit extreme. We ask SQL Server to recompile all of its plans. (Missed opportunity: Illustration featuring Darth Sidious saying “Wipe them out. All of them.”)
  • UPDATE STATISTICS [tablename]: Find a table that is used in the query and have SQL Server update statistics on it. All plans using the table will be recompiled afterwards (including our troubled plan!) We might not actually need updated statistics; it’s the side effect of recompiled plans that we want here. I talked a bit about that in Updating Statistics Helps, But Not For The Reasons You Think.
  • EXEC sp_recompile [tablename]: This technique is similar to UPDATE STATISTICS in that plans are dropped. The benefit is that we don’t have all the overhead and time spent creating the statistics. It’s still not ideal though. It takes a schema modification lock and based on the server’s current busy state, that will mean at least a couple minutes of blocked processes.

Targeting a Single Query Plan

SQL Server version 2008 and later allows you to take out a single query plan from cache using the FREEPROCCACHE command with a plan handle. This is a sharp-shooter technique compared to the other techniques.

I like this technique because it takes no important locks. Existing calls to this query will continue to execute and complete (using the old bad plan). But in the meantime, new calls to this query will use a recompiled plan based on the different (and hopefully better) parameters. If so you’ve just bought yourself time to find and fix this query correctly.

But you can’t just use FREEPROCCACHE without some preparation. Finding the plan handle is not straightforward. So I’ve got a script that only asks that you identify a query based on its text.

The Script

First find out how bad your query is. I’m arbitrarily defining more than 10 active queries as bad. Change the query text here. Use enough text to uniquely identify the problem query. In my case I used “TOP (@ResultLimit)”.

-- Part 1: 
-- Use this query to look for bad performing queries using "TOP (@ResultLimit)"
select 
       count(1) as [count of running queries, should be less than 10], 
       max(datediff(second, start_time, getdate())) as [longest running such query in seconds],
       min(datediff(second, start_time, getdate())) as [shortest running such query in seconds]
from sys.dm_exec_requests er
cross apply sys.dm_exec_sql_text(er.sql_handle) st
where st.text like '%TOP (@ResultLimit)%'
    and st.text not like '%80B82FDD-0297-4057-9AEB-D811F3B5C7DB%' -- filter self
GO

Then run this part to knock out that one query (remembering again to adjust the query text).

-- Part 2:
-- If the above indicates poor performance (i.e. many active MyTable queries), 
-- use the following to clear bad plan from cache.
declare @count int = 0;
declare @planhandle varbinary(64)
 
select 
    @count = count(1), 
    @planhandle = max(plan_handle) -- any arbitrary plan
from sys.dm_exec_requests er
cross apply sys.dm_exec_sql_text(er.sql_handle) st
where st.text like '%TOP (@ResultLimit)%'
    and st.text not like '%80B82FDD-0297-4057-9AEB-D811F3B5C7DB%' -- filter self
 
-- this query recompiles the one identified plan.
if (@count > 10)
    --working
    dbcc freeproccache (@planhandle);
GO

Hopefully You’ve Bought Yourself Time

The query is recompiled with a better plan and you can sit back (for now). Your query is still vulnerable to bad plans and you’ve got to fix that, but you’ve got time.

In practice I’ve used this script maybe three or four times successfully. Your mileage may vary.

How Risky Is This Script?

Well, What’s the worst thing that could happen?

To be honest, I can’t think of much that could go wrong other than it doesn’t solve your problem. In the worst case scenario:

  • your analysis was wrong, the plan is recompiled and still performs poorly.
  • Or perhaps somehow the script picks the wrong plan to recompile, that’s not bad news either, it’s just news. I certainly can’t say the same for the other techniques that were available before version 2008.
  • Maybe expectations were set incorrectly. You propose this change, It doesn’t fix anything and now it may look like your guessing (This isn’t a technical issue, but it’s something to keep in mind).

By the way, don’t take my word for it. You need to tell others that the risk is small and you have to back this claim up without my help. So you should understand what’s going on here.

 

August 17, 2012

Be Careful with the Merge Statement

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , — Michael J. Swart @ 8:38 am

So there’s a lot of good documentation provided by Microsoft for SQL Server. Microsoft in general is the best model I know of for how technical documentation should be provided to the public (If you’re laughing, it’s time to take another look).

But to the unfamiliar, it can still be hard to judge good docs from bad (or good docs from better). If you knew better, then you probably wouldn’t need the docs right? Wouldn’t it be great to have some third party point to a particular article and say “Read this; this is important.”

For example consider this article: Optimizing MERGE Statement Performance. It’s well written and it explains how performance gains are possible using the MERGE statement because source and target datasets can be processed only once. The alternative to the MERGE statement is to process the datasets multiple times by using more than one INSERT, UPDATE or DELETE statements. But then Microsoft goes on to say “… performance gains depend on having correct indexes, joins, and other considerations in place.” That is an understatement dear readers and I’d like to call your attention to it.

I’m not going to repeat Microsoft’s advice (it is like I said, an excellent article), but I am going to add to it by describing some of the dangers of not following their advice.

Why Is MERGE Dangerous?

Here’s why. It’s because there is a risk that if you don’t get the “indexing, joins and other considerations” right, then it is possible to not only lose the performance benefits you hoped for, but to suffer from performance problems much much worse than if you had written the straight-forward INSERT, UPDATE and/or DELETE statements in the first place. Because of their complexity, I believe MERGE statements seem more vulnerable to bad query plans than other DML statements.

My Own Experience

I’ve been bitten by this problem twice in the past month the most recent time being yesterday. So for the curious, you could say that this lesson counts as the latest thing about SQL Server I learned the hard way.

Experience #1

The first time was with a merge statement that was implementing the standard logic of updating an existing set of table rows with using a dataset sent from an application:

  • Insert rows that are new
  • Update rows that have changed
  • Delete rows that are gone

But the logic of the whole thing required a fussy “WHEN NOT MATCHED BY SOURCE AND …” and the resulting query plan was doing something like a full-outer join. The query processed the entire target table even though it didn’t need to. If you’re really really curious, look for more details look at this similar question I found on Stack Overflow: Sql Server Delete and Merge performance.

I was about to rewrite the whole thing using only UPDATE/INSERT/DELETE statements, but then my colleague found a way to make the merge statement work. Woo hoo.

Experience #2

And then just yesterday, I had another problem with a MERGE statement. Some of the facts:

  • The source query was a CTE and produced about 4000 rows max. That CTE when run on its own never took more than 3 seconds in either on test or production databases.
  • The MERGE’s search condition – the ON [merge_search_condition] clause – used columns that matched a unique non-clustered index. Microsoft recommends a clustered index here, but what can you do?
  • Testing with a target table of several thousand rows went just fine (in hindsight, a test db with millions of rows would have saved me).

But testing is testing and production is production. The optimizer crossed a threshold somewhere and said “Whelp, that’s it. I guess a non-clustered index isn’t going to cut it here any more. Let’s scan the whole thing.” Now most of the time the optimizer comes up with a really good plan. But in this case, it just didn’t.

I plan to rewrite the query as separate INSERT and UPDATE statements and early tests are promising.

Where Do We Go Now

Well, as a veteran problem-avoider, I now feel distrustful and wary of the MERGE statement. And maybe I don’t mind the multiple INSERT/UPDATE statements so much any more. If my MERGE statement takes one second to run and my INSERT/UPDATE statements takes two, then maybe two seconds is acceptable when the alternative takes so much extra effort to properly test. What do you think?

August 10, 2012

Avoiding Problems

Filed under: SQLServerPedia Syndication,Technical Articles,Tongue In Cheek — Michael J. Swart @ 8:00 am

Takeaway: None really, I just wanted to tell a few stories and draw a bit

A long time ago, I had an interview and I was asked a question that (I’m guessing) was designed to evaluate how I approached problem solving. It went something like this:

Interviewer: “Say you’re a farmer who has a cow in a fenced field. But the fence is broken and you need a nail to fix it. The nail you need is inside the barn and if you retrieve it you leave the cow free to escape. What do you do?”
Me: “I think I would just go fetch the nail. I don’t think the cow will get out”
Interviewer: “But you can’t leave the cow alone”
Me: “I don’t know. I’ve been around cows and they’re pretty stupid. There’s a good chance they don’t know the fence has a hole in it. It will probably take me under two minutes to get the nail.”
Interviewer: “Ohh… You’re a problem-avoider.”
Me: !!!

“Problem Avoider”? That label took me by surprise. And even though it wasn’t meant as a criticism, maybe I did feel a little slighted by it. The implication is maybe that I don’t face problems that ought to be faced.

Since then, when I face a technical problem I often ask myself: “Am I a problem avoider?” “Am I refusing to deal with a particular issue, hoping it will go away?” I’d like to believe that the answer to those questions is yes and no respectively: I am a problem-avoider but I do not have my head buried in the sand.

Unapologetic Problem Avoider

Sometimes taking a step back to try another approach makes the problem go away. I was asked something like this recently:

Friend: For this set of data, how do I get the min, max, median values. As well as the value at the 25th percentile and the value at the 75th percentile.
Me: Oooh, I know this one…

I googled “NTILE” and was soon looking at the books online topic wondering if this was appropriate. After some thinking I decided to maybe avoid the problem:

Me: How many rows are we talking about?
Friend: Umm a couple hundred… maybe up to a thousand max.
Me: You can do this in C# right? Just load the entire dataset, sort it and then do index look ups after doing some arithmetic.
Friend: I was thinking of that, but I wanted to know your opinion.
Me: I think that’s the way to go. SQL Server already has to do the reads. The only thing we’re not saving is the network bandwidth. And that’s not much any way.

Problem avoided. But I think that the important part here is that the problem is still being faced; it’s just being faced in the right place and by the right person.

Taking Responsibility

But there are times when you can’t avoid a problem and you have to take responsibility. I’m not going to preach against procrastination. Mostly because I think Nike already has the best advice.

But I want to talk about procrastination’s second cousin: Doing-fun-work-first. Man, I do that a lot. Probably too much. I still struggle, but lately, my approach has been

  • to prioritize things and then be as objectively honest about the priorities as possible (no kidding).
  • Or deal with crummy tasks first in order to “get them out of the way”.

If you got any tips of your own, send them this way…

July 26, 2012

Data Modelling: Composite Relationships

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

Takeaway: In this article, I recommend that you use composite primary keys for child tables in composite relationships. If a child record is part of its parent record, then the parent’s primary key should be part of its child’s primary key.

That means that this:

create table CD
(
    CDId int identity,
    Name nvarchar(max),
    Artist nvarchar(max),
    Genre nvarchar(max),
    primary key (CDId)
)
 
create table Track
(
    CDId int references CD(CDId),
    TrackId int,
    Name nvarchar(max),
    primary key (CDId, TrackId)
)

… is better than this:

create table CD
(
    CDId int identity,
    Name nvarchar(max),
    Artist nvarchar(max),
    Genre nvarchar(max),
    primary key (CDId)
)
 
create table Track
(
    TrackId int identity,
    CDId int references CD(CDId),
    TrackNumber int,
    Name nvarchar(max),
    primary key (TrackId)
)

A CD of Milli Vanilli

First some definitions, and then I’ll briefly describe the benefits of this practice.

Composite Relationship

I’m using the UML definitions when I talk about these relationships. Composite relationships between two tables occur when a child points to a parent and when the child is considered part of the parent. A quick test for this relationship is to ask whether records are deleted when the parent is. Or you can ask whether an object gets copied when the parent does. Track-CD is a good example of an composite relationship. Other examples might be Department-Company or SalesOrderDetail-SalesOrderHeader .

Relationships That Are Not Composite

For contrast, some relationships that are not composite include aggregationassociation or dependency relationships. For example, the driver-vehicle relationship is not an composite relationship. It’s a “has a” relationship. Both vehicles and drivers have their own life cycle and it’s common for vehicles to change owners.

Benefits

Since the child records live and die (and get copied) with the parent record, it makes sense to have the parent PK column be the first column of a child’s primary key. The values of the child’s primary key almost never change. Some other benefits:

  • Index lookups and index scans become more efficient.
  • When you extend this practice to children of children, it’s even better. You have the opportunity for less joins on many queries.
  • The SQL you have to write turns out so much cleaner. I hinted about that here: My Favourite Data Modelling Example.
  • In some cases business rules can be enforced that couldn’t before.
  • Delete statements and purge strategies become simpler.

Is This Obvious?

Maybe when you read the Track-CD example above, it was obvious to you that the first example is better. But I see the second example way too often. So often that I felt it deserved a blog post.

If you want, you can look for this stuff in your own tables to double check. Try this out a couple times:

  • Find a foreign key somewhere in your database.
  • Is the relationship enforced by the foreign key an composite relationship? In other words, is one record part of the record it refers to? (e.g. Tracks are a part of their CDs)
  • Then are the foreign key columns part of the primary key? I hope so.

July 17, 2012

T-SQL Tuesday #32 – A Day in the Life

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , — Michael J. Swart @ 8:00 am

T-SQL Tuesday Logo

So it’s T-SQL Tuesday time again! This month it’s hosted by Erin Stellato. The topic this month is interesting. It’s got a bit of a twist. Participating bloggers are asked to keep a diary of sorts for what they did last Wednesday July, 11th 2011. I came into work last Wednesday feeling a slightly more interested in the day ahead. The day seemed a little more significant than other ordinary days. Would it be good, bad or average? It reminded me of a movie (Groundhog Day) about another day with a lot of scrutiny:

Anyway, it’s always interesting to compare someone’s job title with their actual activities and I can’t wait to read Erin’s round up. Here we go:

Michael J. Swart: Senior Database Developer

Morning

Some normal morning Activities.  I took a new way into work this morning. I came in at 8:00 and got a large regular coffee from Tim Horton’s. The first thing I did after logging into my computer is open up OneNote for taking notes today. And now I start answering email:

  • Our company announced support for our product on SQL Server 2012 recently. I answered a few questions about what that means for developers exactly.
  • Looked at db statistics from the previous day. Decided to follow up on an un-patched database that was hogging the resources on its server. There was a bit of stress here. I have responsibility for something that I don’t have enough control over and I had to sort that out.
  • I launched some tests earlier this week. These tests exercise a utility I wrote that deletes data. Because the tests last a few days, I checked up on their progress to see whether they’re still running (They are).
  • Stand up time! (Scrum) Where I learn what others on my team are doing. 9:30

Coffee #2

  • Had a conversation with colleague about implementing a locking mechanism (similar to mutexes) at the database level.
  • Tuning a two line query (which used a view in a view in a view) which is more like an 80 line query. Did I tell you I dislike views? They mask complexity and stuff like this happens. Just now I wrote a quick and dirty program to help me expand view definitions inline. I couldn’t quickly find a free utility online. It’s a cool simple program that I would share if I hadn’t written it during work hours.
  • Woohoo! My first interruption of the day. There’s a particular third party app we use that doesn’t have a client timeout setting. Long running queries cause impatient users to hit refresh and now there’s two long running queries, then three, then four… With the help of sp_whoiscactive, we went from panic to understanding everything relevant. And the brought the server from unresponsive to responsive. The whole process lasted less than two minutes (Thanks again to Adam Machanic and his sp_whoisactive sproc!). Now I have to spend some extra time following up on this. I’ll probably take the next 15 minutes working with the issue tracking software we use to log this incident and make it possible for follow up activities.
  • Interruption 2! A mercifully short question: No you can’t restore a 2008 R2 backup onto a 2008 server.

Lunch time

Went to a used book to pick up Drawing on the Right Side of the Brain. My blog readers voted for Developing on the Right Side of the Brain to be my next post (published last week here) and I don’t have a well-thought-out idea for what the post should be about. The book should evoke some ideas. We’ll see how it goes.

Afternoon

  • More time looking at the poor performing queries I mentioned earlier. Finally got to the bottom of it but with no easy fixes, it was time to discuss other options. I had a meeting about that. Some hard choices here. Not fun. Let’s move on.

3:00 Coffee #3

  • Started on a new project just now. I have to create a utility that takes a database and makes everything anonymous (no personally identifiable information). Easier said than done I think!
  • Woohoo! Another interruption. And another interesting discussion about shrinking databases!

5:00 Home Time

Pasting this OneNote doc into wordpress to be edited later. Overall this was an average day I think.

Some More Thoughts

All this goes to say that there’s a lot of variety in this day in my job. And overall, I think that this day was pretty typical for myself. There was some overlap into DBA territory (production issues) and there was some overlap into Business Intelligence territory too. About 60% of my day was spent on activities I had planned in advance which and 40% then on surprises.

I’m really keen to see what the rest of the SQL Server bloggers did this day! Thanks Erin for hosting and the great topic.

June 13, 2012

Visualizing Transaction Isolations For SQL Server

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , — Michael J. Swart @ 12:00 pm

The four standard SQL Server isolation levels that SQL Server provides are READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ and SERIALIZABLE. (I leave aside the row-versioning levels for another day). SQL Server implements these isolation levels using locks.

In fact I was listening to Kimberly Tripp’s MCM Preparation Video “Snapshot Isolation“. She gives a recap of isolation levels and describes the locks taken during an index scan using READ COMMITTED. She says:

“(For readers) READ COMMITTED uses shared locks … the shared locks are only held for the life of the resource being read. You can almost think of shared locks kind of trickling through the table but not being kept.”

She invites us to visualize this scenario and so I did,  literally. Enjoy!

Visualizing Isolation LevelsLocks taken by SQL Server for the standard isolation levels

(Update June 15, 2012: Remember that with every isolation level (and with NOLOCK in particular) schema stability locks are taken and held. This affects concurrency of DDL statements like CREATE INDEX or ALTER TABLE. See Brent’s comment in the comment section for more)

Other reading

May 30, 2012

MERGE Statement Generator

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Tags: , — Michael J. Swart @ 12:00 pm

So what I've got here below is a web form. It takes a source query and target table and spits out a merge statement for the standard insert/update use case. I made it for myself but you might find it useful too. I'm counting on this to be a real time saver (at least for me).

Clustered Index Merge Showplan Operator

Some Neat Things

  • This webform doesn't post to any server. It's all javascript that gets executed in your browser. This means you don't have to worry about anyone stealing your code and I don't have to worry about this form going viral (ha ha).
  • SQL Snippets (either kind) don't quite handle the dynamic list of columns here. Otherwise, you would be reading a different post today.
  • The MERGE statement is a lot more versatile than what I show here. I'm just handling the most common use case.

Things to Watch For

  • Concurrency. If this is important to you, remember to use appropriate locks (usually UPDLOCK) on the target table.
  • Make sure the column list of the source query matches that of the target table
  • You're going to have to clean up the script if your target table has columns that are rowversion, identity, computed etc...
  • User input is being used to generate code here. So my SQL-injection spidey-sense starts to tingle. But it's okay in this case because I'm not running anything. I'm just displaying it. You're the one who's running this stuff so it's up to you to vouch for any generated code. Take care.

The Form


Put in your own data (or just use the sample here). When you're ready, hit MERGE!
Source Query
Target Table
Key Columns (one column per line)
Other Columns (one column per line)

Your Merge Query

May 23, 2012

The View of Foreign Keys That I Wanted

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Tags: , — Michael J. Swart @ 8:47 pm

tropical sunset

So I’ve been working on an automation project that makes frequent use of foreign key metadata. I find myself writing queries for this data but I discovered that there’s no super-easy out-of-the-box view of foreign keys for me to use. Here are the ones I considered.

INFORMATION_SCHEMA views

INFORMATION_SCHEMA views that give me foreign key information include

  • INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE for the foreign key info
  • INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS for the referring columns,
  • INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE for the referenced columns

The first view here gives a list of foreign keys and you have to join to the other two tables in order to find the column names. But it’s a crummy solution. First of all, if the foreign key has multiple columns, there’s no real way to match the referring columns to the referenced columns.

The second thing is that we don’t see foreign keys that point to unique Keys (as pointed out by Aaron Bertrand in his post The case against INFORMATION_SCHEMA views.

So that’s out. What else have we got in?

Microsoft SQL Server system views

These views include

  • sys.foreign_keys
  • sys.foreign_key_columns
  • with support from sys.columns and sys.tables

These are the views I deserve, but not the views I need right now. The joins are just too annoying to remember and type each time.

Besides, the word “parent” used here changes with context. The parent table in a foreign key relationship owns the foreign key and does the pointing. But say I’m modeling a hierarchy. In the context of the data model, children records point to their parent records. The mental effort needed to keep these straight is not difficult, but it’s annoying.

My Own Views

So I’ve created my own, the goal is to simplify typing and minimize joins. I skip the word “parent” all together and use “referrer” and “referrenced”. Feel free to use and build on these.

Update June 7, 2012: I added columns ReferrerColumnCanBeNull to each view. I found I wanted it, so I added it here.

FOREIGN_KEYS

CREATE VIEW dbo.FOREIGN_KEYS
AS
    SELECT  SCHEMA_NAME(fk.schema_id) AS FKSchema ,
            fk.name AS FK ,
            SCHEMA_NAME(p.schema_id) AS ReferrerSchema ,
            p.name AS Referrer ,
            STUFF(CAST(( SELECT ',' + c.name
                         FROM   sys.foreign_key_columns fkc
                                JOIN sys.columns c ON fkc.parent_object_id = c.object_id
                                                      AND fkc.parent_column_id = c.column_id
                         WHERE  fkc.constraint_object_id = fk.object_id
                         ORDER BY fkc.constraint_column_id ASC
                       FOR
                         XML PATH('') ,
                             TYPE
                       ) AS NVARCHAR(MAX)), 1, 1, '') AS ReferrerColumns ,
            ISNULL(( SELECT TOP 1
                            1
                     FROM   sys.foreign_key_columns fkc
                            JOIN sys.columns c ON fkc.parent_object_id = c.object_id
                                                  AND fkc.parent_column_id = c.column_id
                     WHERE  fkc.constraint_object_id = fk.object_id
                            AND c.is_nullable = 1
                   ), 0) AS ReferrerColumnsCanBeNull ,
            SCHEMA_NAME(r.schema_id) AS ReferencedSchema ,
            r.name AS Referenced ,
            STUFF(CAST(( SELECT ',' + c.name
                         FROM   sys.foreign_key_columns fkc
                                JOIN sys.columns c ON fkc.referenced_object_id = c.object_id
                                                      AND fkc.referenced_column_id = c.column_id
                         WHERE  fkc.constraint_object_id = fk.object_id
                         ORDER BY fkc.constraint_column_id ASC
                       FOR
                         XML PATH('') ,
                             TYPE
                       ) AS NVARCHAR(MAX)), 1, 1, '') AS ReferencedColumns ,
            fk.delete_referential_action_desc AS deleteAction ,
            fk.update_referential_action_desc AS updateAction ,
            fk.object_id AS FKId ,
            p.object_id AS ReferrerId ,
            r.object_id AS ReferencedId
    FROM    sys.foreign_keys fk
            JOIN sys.tables p ON p.object_id = fk.parent_object_id
            JOIN sys.tables r ON r.object_id = fk.referenced_object_id
GO

FOREIGN_KEY_COLUMNS

CREATE VIEW dbo.FOREIGN_KEY_COLUMNS
AS
SELECT  SCHEMA_NAME(fk.schema_id) AS FKSchema ,
        fk.name AS FK ,
        SCHEMA_NAME(p.schema_id) AS ReferrerSchema ,
        p.name AS Referrer ,
        pc.name AS ReferrerColumn ,
		pc.is_nullable AS ReferrerColumnCanBeNull ,
        SCHEMA_NAME(r.schema_id) AS ReferencedSchema ,
        r.name AS Referenced,
        rc.name AS ReferencedColumn ,
        fk.object_id AS FKId ,
        fkc.constraint_column_id AS FKColumnId ,
        p.object_id AS ReferrerId ,
        r.object_id AS ReferencedId
FROM    sys.foreign_keys fk
        JOIN sys.foreign_key_columns fkc ON fkc.constraint_object_id = fk.object_id
        JOIN sys.tables p ON p.object_id = fk.parent_object_id
        JOIN sys.columns pc ON fkc.parent_object_id = pc.object_id
                               AND fkc.parent_column_id = pc.column_id
        JOIN sys.tables r ON r.object_id = fk.referenced_object_id
        JOIN sys.columns rc ON fkc.referenced_object_id = rc.object_id
                               AND fkc.referenced_column_id = rc.column_id
GO

April 27, 2012

Modifying Tables Online – Part 5: Just One More Thing

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , — Michael J. Swart @ 10:50 am

SERIES: Modifying Tables Online

In the next few days I want to describe a way to modify the definition of very large tables while keeping the table available for other queries. Some table changes are already online operations. But offline changes need extra work to keep the table available. I explore one way to do that in this series.

This article used to be a single blog post, but I broke it into four parts because it seemed too long. Read Part 1, and understand the strategy I’m using. Treat Parts 2-4 as appendixes. When you want to use this plan (or one like it), come back and use them as examples. You won’t be able to make use of the examples by cutting and pasting directly (unless you happen to be running an bike equipment store called Adventureworks), but modifications shouldn’t be too hard.

Just One More Thing

So Rob Volk commented on Part 3 of my blog. The comment prompted me to write a part five (which you’re reading now). So you can thank him for this bonus section.

I’m going to quote Rob directly. He wrote:

One question/suggestion, have you considered using a new schema (e.g. Sales_New) and creating the new table in that schema? The benefit is that all the defaults, keys, triggers, and indexes can retain the same names as the original, and the switch becomes a single ALTER SCHEMA…TRANSFER operation. (or two, one for the old table and one for the new)

He’s absolutely right. The switch step becomes much much simpler and simpler is almost always better (as it is in this case). My migration would start by creating some extra schemas and then creating the staging table. In the Adventureworks example I’ve been using, that looks something like this:

use AdventureWorks2012
go
create schema staging;
go
create schema obsolete;
go
CREATE TABLE staging.SalesOrderHeader(
-- etc...

So now our DB is prepared and looks like this:

Copy the data over to the staging table the same way. In my SalesOrderHeader example, that code doesn’t change at all except that I replace “Sales.SalesOrderHeader_new” with “Staging.SalesOrderHeader”.

When ready, the switch now includes code like this:

ALTER SCHEMA obsolete TRANSFER Sales.SalesOrderHeader;
ALTER SCHEMA Sales TRANSFER staging.SalesOrderHeader;
 
drop trigger obsolete.t_i_SalesOrderHeader;
drop trigger obsolete.t_u_SalesOrderHeader;
drop trigger obsolete.t_d_SalesOrderHeader;

Looks simple right! It is. Simple is better. The schema now looks something like this:

A few things to be careful of:

  • Foreign keys: Although no renaming is necessary, they still have to dropped from and to the obsolete table. And foreign keys pointing to the new table still need to be added.
  • Existing triggers: They should recreated on the new table.

That’s It

So dear reader, that’s the series. It was fun for me to explore in depth an intermediate topic. Thanks for bearing with me. I know the series was a little dry, but I figured that I wanted to use this walk-through for myself as a template for future migration projects and that if I thought it was useful, maybe you would think so too.

April 26, 2012

Modifying Tables Online – Part 4: Testing

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , — Michael J. Swart @ 12:00 pm

SERIES: Modifying Tables Online

In the next few days I want to describe a way to modify the definition of very large tables while keeping the table available for other queries. Some table changes are already online operations. But offline changes need extra work to keep the table available. I explore one way to do that in this series.

This article used to be a single blog post, but I broke it into four parts because it seemed too long. Read Part 1, and understand the strategy I’m using. Treat Parts 2-4 as appendixes. When you want to use this plan (or one like it), come back and use them as examples. You won’t be able to make use of the examples by cutting and pasting directly (unless you happen to be running an bike equipment store called Adventureworks), but modifications shouldn’t be too hard.

This is part 4 of the series (you can consider this Appendix 3, The main article is found here).

So my testing is also pretty straightforward. I

  1. Backup the Adventureworks2012 database on my dev machine
  2. Start sending a whack of queries to the database to simulate concurrent activity
  3. Make Adventureworks2012.Sales.SalesOrderHeader larger by adding lots of rows
  4. Migrate the database using the script from part 3
  5. Look for any errors, fix them, restore the db and repeat the process if necessary
  6. Otherwise … profit?

You should be able to handle each of these steps. Except that I want to explain exactly how I personally handle number 2: I use an application to send queries to the database many times. Sounds easy right? The trick is that I want to call the stored procedure many times at once.

Activity Generator

This is a C# program which I compile into a .net application (I can’t wait for .net 4.5′s async and await)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.SqlClient;
using System.Threading;
 
namespace ConsoleApplication1 {
    class Program {
        static int counter = 0;
        static void Main( string[] args ) {
            SqlConnectionStringBuilder cs = new SqlConnectionStringBuilder();
            cs.DataSource = @".";
            cs.InitialCatalog = "Adventureworks2012";
            cs.IntegratedSecurity = true;
            cs.AsynchronousProcessing = true;
            string connectionString = cs.ToString();
            string sql = @"s_DoSomething";
 
            for( int i = 0; i < 100000; i++ ) {
                SqlConnection conn = new SqlConnection( connectionString );
                conn.Open();
                SqlCommand cmd = new SqlCommand( sql, conn );
                cmd.CommandType = System.Data.CommandType.StoredProcedure;
                cmd.BeginExecuteNonQuery( new AsyncCallback( EndExecution ), cmd );
            }
            Console.WriteLine( string.Format( @"Error count: {0}", counter ) );
            Console.ReadLine();
        }
 
        static void EndExecution( IAsyncResult c ) {
            SqlCommand endCmd = ( c.AsyncState as SqlCommand );
            try {
                endCmd.EndExecuteNonQuery( c );
            } catch( Exception ex ) {
                //counter++;
                Interlocked.Increment( ref counter );
                Console.WriteLine( ex.Message );
            } finally {
                endCmd.Connection.Close();
            }
        }
    }
}

You might have noticed that this program is extremely boring. It’s just calling one stored procedure: s_DoSomething over and over as fast as it can. The advantage is that I’m free to modify s_DoSomething (which is for me a lot easier than modifying this application).

So what does this procedure do? I’m glad you asked:

s_DoSomething

This procedure executes at random one task chosen from a set of typical OLTP tasks that use the table Sales.SalesOrderHeader. Here’s the one I used:

create procedure s_DoSomething 
as
begin
 
    declare @choice int = 1 + RAND() * 6
    declare @SalesOrderId int;
 
    if (@choice = 1)
        insert Sales.SalesOrderHeader(DueDate, OrderDate, CustomerID, BillToAddressID, ShipToAddressID, ShipMethodID)
        select top 1000 DueDate, DATEADD(day, -1, DueDate), CustomerID, BillToAddressID, ShipToAddressID, ShipMethodID
        from Sales.SalesOrderHeader
    else if (@choice = 2)
        delete top (100) Sales.SalesOrderHeader
    else if (@choice = 3)
        begin
            declare @personid int;
 
            select top 1 @personid = BusinessEntityID
            from Sales.SalesPerson
            order by newid();
 
            select * 
            from sales.SalesOrderHeader
            where SalesPersonID = @personid;
        end;
    else if (@choice IN (4,5,6))
        begin
            select top 1 @SalesOrderId = SalesOrderId
            from Sales.SalesOrderHeader with (nolock);
 
            select top 100 * 
            from Sales.SalesOrderHeader 
            where SalesOrderId > @SalesOrderId
        end
 
end
« Newer PostsOlder Posts »

Powered by WordPress