Michael J. Swart

September 12, 2012

When I Use Nested Joins


I want to explain how I avoid RIGHT OUTER joins in favor of LEFT OUTER joins and how I avoid OUTER joins in favor of INNER joins.

There's no place bitchin-er than Kitchener

Inner joins have no direction, but outer joins do, so we have three kinds of joins:

  • INNER JOIN (JOIN for short)
  • LEFT OUTER JOIN (LEFT JOIN for short)
  • RIGHT OUTER JOIN (RIGHT JOIN for short)

I leave out FULL OUTER JOINS for now because I never use them. And in fact RIGHT OUTER JOINS can always be rewritten as LEFT OUTER JOINS, so in practice I use just two kinds of joins:

  • INNER JOIN
  • LEFT OUTER JOIN

Now, as a rule of thumb, inner joins are more efficient than outer joins, so it's better to write queries that avoid outer joins when possible. Let’s get to an example. My goal here is to write a query that gives me a list of employees and their director (if any) based on this schema.

CREATE TABLE STAFF
(
    Id NVARCHAR(20) NOT NULL PRIMARY KEY,
    Name NVARCHAR(400) NOT NULL,
    Department NVARCHAR(20),
    Role NVARCHAR(20) NOT NULL
)
 
CREATE TABLE BOSSES
(
    EmployeeId NVARCHAR(20)
        REFERENCES STAFF(Id),
    BossId nvarchar(20)
        REFERENCES STAFF(Id),
    PRIMARY KEY (EmployeeId, BossId)
)

BOSSES is a table that contains not just direct reports, but all direct and indirect reports (making it handy for this query).
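
To make that concrete, here's a small set of hypothetical sample rows (the names are mine, not part of the original schema). Alice is a Director, Bob reports to Alice, and Carol reports to Bob, so BOSSES carries Carol's indirect link to Alice as well:

INSERT STAFF (Id, Name, Department, Role)
VALUES ('alice', 'Alice', 'Sales', 'Director'),
       ('bob',   'Bob',   'Sales', 'Manager'),
       ('carol', 'Carol', 'Sales', 'Rep');
 
INSERT BOSSES (EmployeeId, BossId)
VALUES ('bob',   'alice'), -- direct report
       ('carol', 'bob'),   -- direct report
       ('carol', 'alice'); -- indirect report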

Using a RIGHT JOIN

The answer is fairly straightforward. I join the BOSSES table with the STAFF table to give me all the directors and their reports:
Two inner joins

select S_Employee.Name, S_Boss.Name as Director
from STAFF S_Boss
join BOSSES B
    on B.BossId = S_Boss.Id
join STAFF S_Employee
    on B.EmployeeId = S_Employee.Id
where S_Boss.Role = 'Director'

But wait, this isn’t a complete list of employees. What about those in the company who don’t report to any director (not even indirectly)? This is where the right outer join comes in:
An inner join and a right join

select S_Employee.Name, S_Boss.Name as Director
from STAFF S_Boss
join BOSSES B
    on B.BossId = S_Boss.Id
RIGHT join STAFF S_Employee
    on B.EmployeeId = S_Employee.Id
    and S_Boss.Role = 'Director'

Notice that the S_Boss.Role filter can’t belong to the where clause any more. If it did, we’d lose director-less employees again and we’d be back where we started.

Using LEFT JOINS Only

That works… but for the sake of style, let’s only use left joins. I prefer using only left joins in this case because logically, the results are meant to be the set of employees. So I like to start with that list of employees and then join in the bosses if necessary. That’s why I start with the set of employees as the first table. The other joined tables aren’t the stars of this query; they’re only there to help provide that director attribute. So I have this:
Two left joins

select S_Employee.Name, S_Boss.Name as Director
from STAFF S_Employee
left join BOSSES B
    on B.EmployeeId = S_Employee.Id
left join STAFF S_Boss
    on B.BossId = S_Boss.Id
    and S_Boss.Role = 'Director'

But notice that I’m now using two left joins… Really I only want the one outer join that I was using in the first example. It turns out I can do that:

Using Nested Joins

Well that just looks like this:
One left join

select S_Employee.Name, S_Boss.Name as Director
from STAFF S_Employee
left join BOSSES B
    join STAFF S_Boss
        on B.BossId = S_Boss.Id
        and S_Boss.Role = 'Director'
    on B.EmployeeId = S_Employee.Id

This is logically equivalent to the right join but it uses left joins instead and only uses one outer join. Great!

A lot of this example is only about style. Maybe you prefer the right join example and think that I’m breaking some grammar rule here. It feels like I’m splitting an infinitive or something. Let me know what your preference is.

August 28, 2012

Sharpshooting Query Plans From The Cache


Takeaway: I talk a bit about DBCC FREEPROCCACHE, using the plan_handle parameter to remove exactly one query plan from cache. I also explain why and when you might want to.

This is another post where I share a script of mine. It’s one of my favorite kinds of posts, so I’ve created a new category for it: SQL Scripts.

The Situation

  • Crisis mode: The database server is slow (all of a sudden) and the performance is unacceptable.
  • You used your troubleshooting checklist and you see CPU is suddenly pegged on the database server.
  • And sp_whoisactive (or other monitoring software) tells you that there’s a particular query that is running frequently. In this example, the query starts with “SELECT TOP (@ResultLimit) M.Name FROM MyTable M JOIN … etc”.
  • The query is called in exactly one place in your application (Oh man! A fix will take time to recompile, QA and deploy).
  • The query usually runs for a few minutes and uses up almost all the CPU.
  • You’ve collected the query plan and the query text.
  • But, if you run the query in SQL Server Management Studio, the query completes quickly.

Some Analysis

  • The query plan looks different when run in Management Studio
  • The production database must have picked a plan based on unlucky compile-time parameters.
  • Confirmed! When running the query in Management Studio using the OPTIMIZE FOR hint with similar parameters, you can reproduce the bad plan (see the sketch after this list).
  • To buy some time, it would be good to have SQL Server recompile that plan.
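
Here's the kind of repro I mean, as a minimal sketch (the table, the omitted joins, and the parameter values are hypothetical stand-ins for the real query):

-- compile the query as though @ResultLimit held the unlucky compile-time value
DECLARE @ResultLimit INT = 10;
SELECT TOP (@ResultLimit) M.Name
FROM MyTable M -- joins omitted
OPTION (OPTIMIZE FOR (@ResultLimit = 1)); -- substitute the suspected unlucky value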

This is what I want to explore in this post. What is the best way to get SQL Server to recompile that plan?

Before SQL Server 2008

How do we get SQL Server to recompile this plan? Before SQL Server 2008, there were a couple things I could do:

  • DBCC FREEPROCCACHE: Holy cow! That’s a bit extreme. We ask SQL Server to recompile all of its plans. (Missed opportunity: Illustration featuring Darth Sidious saying “Wipe them out. All of them.”)
  • UPDATE STATISTICS [tablename]: Find a table that is used in the query and have SQL Server update statistics on it. All plans using the table will be recompiled afterwards (including our troubled plan!) We might not actually need updated statistics; it’s the side effect of recompiled plans that we want here. I talked a bit about that in Updating Statistics Helps, But Not For The Reasons You Think.
  • EXEC sp_recompile [tablename]: This technique is similar to UPDATE STATISTICS in that plans are dropped. The benefit is that we don’t have all the overhead and time spent creating the statistics. It’s still not ideal though. It takes a schema modification lock, and depending on how busy the server is, that can mean a couple minutes of blocked processes.

Targeting a Single Query Plan

SQL Server 2008 and later let you remove a single query plan from the cache by passing a plan handle to DBCC FREEPROCCACHE. It’s a sharpshooter technique compared to the other techniques.

I like this technique because it takes no important locks. Existing calls to this query will continue to execute and complete (using the old bad plan). But in the meantime, new calls to this query will use a recompiled plan based on the different (and hopefully better) parameters. If so, you’ve just bought yourself time to find and fix this query properly.

But you can’t just use FREEPROCCACHE without some preparation. Finding the plan handle is not straightforward. So I’ve got a script that only asks that you identify a query based on its text.

The Script

First find out how bad your query is. I’m arbitrarily defining more than 10 active queries as bad. Change the query text here. Use enough text to uniquely identify the problem query. In my case I used “TOP (@ResultLimit)”.

-- Part 1: 
-- Use this query to look for bad performing queries using "TOP (@ResultLimit)"
select 
       count(1) as [count of running queries, should be less than 10], 
       max(datediff(second, start_time, getdate())) as [longest running such query in seconds],
       min(datediff(second, start_time, getdate())) as [shortest running such query in seconds]
from sys.dm_exec_requests er
cross apply sys.dm_exec_sql_text(er.sql_handle) st
where st.text like '%TOP (@ResultLimit)%'
    and st.text not like '%80B82FDD-0297-4057-9AEB-D811F3B5C7DB%' -- filter self
GO

Then run this part to knock out that one query (remembering again to adjust the query text).

-- Part 2:
-- If the above indicates poor performance (i.e. many active MyTable queries), 
-- use the following to clear bad plan from cache.
declare @count int = 0;
declare @planhandle varbinary(64)
 
select 
    @count = count(1), 
    @planhandle = max(plan_handle) -- any arbitrary plan
from sys.dm_exec_requests er
cross apply sys.dm_exec_sql_text(er.sql_handle) st
where st.text like '%TOP (@ResultLimit)%'
    and st.text not like '%80B82FDD-0297-4057-9AEB-D811F3B5C7DB%' -- filter self
 
-- if enough of these queries are active, remove the one identified plan from cache
-- (the next call to the query recompiles a fresh plan)
if (@count > 10)
    dbcc freeproccache (@planhandle);
GO

Hopefully You’ve Bought Yourself Time

The query is recompiled with a better plan and you can sit back (for now). Your query is still vulnerable to bad plans and you’ve got to fix that, but you’ve got time.

In practice I’ve used this script maybe three or four times successfully. Your mileage may vary.

How Risky Is This Script?

Well, what’s the worst thing that could happen?

To be honest, I can’t think of much that could go wrong other than it doesn’t solve your problem. In the worst case scenario:

  • Your analysis was wrong: the plan is recompiled and still performs poorly.
  • Or perhaps somehow the script picks the wrong plan to recompile. That’s not bad news either; it’s just news. I certainly can’t say the same for the other techniques that were available before version 2008.
  • Maybe expectations were set incorrectly. You propose this change, it doesn’t fix anything, and now it may look like you’re guessing. (This isn’t a technical issue, but it’s something to keep in mind.)

By the way, don’t take my word for it. You need to tell others that the risk is small, and you have to back that claim up without my help. So make sure you understand what’s going on here.

 

May 23, 2012

The View of Foreign Keys That I Wanted



So I’ve been working on an automation project that makes frequent use of foreign key metadata. I find myself writing queries for this data but I discovered that there’s no super-easy out-of-the-box view of foreign keys for me to use. Here are the ones I considered.

INFORMATION_SCHEMA views

INFORMATION_SCHEMA views that give me foreign key information include

  • INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS for the foreign key info
  • INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE for the referring columns
  • INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE for the referenced columns

The first view here gives a list of foreign keys and you have to join to the other two tables in order to find the column names. But it’s a crummy solution. First of all, if the foreign key has multiple columns, there’s no real way to match the referring columns to the referenced columns.

The second thing is that we don’t see foreign keys that point to unique keys (as pointed out by Aaron Bertrand in his post The case against INFORMATION_SCHEMA views).

So that’s out. What else have we got?

Microsoft SQL Server system views

These views include

  • sys.foreign_keys
  • sys.foreign_key_columns
  • with support from sys.columns and sys.tables

These are the views I deserve, but not the views I need right now. The joins are just too annoying to remember and type each time.

Besides, the word “parent” used here changes with context. The parent table in a foreign key relationship owns the foreign key and does the pointing. But say I’m modeling a hierarchy. In the context of the data model, child records point to their parent records. The mental effort needed to keep these straight is not difficult, but it’s annoying.

My Own Views

So I’ve created my own views. The goal is to simplify typing and minimize joins. I skip the word “parent” altogether and use “referrer” and “referenced”. Feel free to use and build on these.

Update June 7, 2012: I added a ReferrerColumnCanBeNull column to each view. I found I wanted it, so I added it here.

FOREIGN_KEYS

CREATE VIEW dbo.FOREIGN_KEYS
AS
    SELECT  SCHEMA_NAME(fk.schema_id) AS FKSchema ,
            fk.name AS FK ,
            SCHEMA_NAME(p.schema_id) AS ReferrerSchema ,
            p.name AS Referrer ,
            STUFF(CAST(( SELECT ',' + c.name
                         FROM   sys.foreign_key_columns fkc
                                JOIN sys.columns c ON fkc.parent_object_id = c.object_id
                                                      AND fkc.parent_column_id = c.column_id
                         WHERE  fkc.constraint_object_id = fk.object_id
                         ORDER BY fkc.constraint_column_id ASC
                       FOR
                         XML PATH('') ,
                             TYPE
                       ) AS NVARCHAR(MAX)), 1, 1, '') AS ReferrerColumns ,
            ISNULL(( SELECT TOP 1
                            1
                     FROM   sys.foreign_key_columns fkc
                            JOIN sys.columns c ON fkc.parent_object_id = c.object_id
                                                  AND fkc.parent_column_id = c.column_id
                     WHERE  fkc.constraint_object_id = fk.object_id
                            AND c.is_nullable = 1
                   ), 0) AS ReferrerColumnsCanBeNull ,
            SCHEMA_NAME(r.schema_id) AS ReferencedSchema ,
            r.name AS Referenced ,
            STUFF(CAST(( SELECT ',' + c.name
                         FROM   sys.foreign_key_columns fkc
                                JOIN sys.columns c ON fkc.referenced_object_id = c.object_id
                                                      AND fkc.referenced_column_id = c.column_id
                         WHERE  fkc.constraint_object_id = fk.object_id
                         ORDER BY fkc.constraint_column_id ASC
                       FOR
                         XML PATH('') ,
                             TYPE
                       ) AS NVARCHAR(MAX)), 1, 1, '') AS ReferencedColumns ,
            fk.delete_referential_action_desc AS deleteAction ,
            fk.update_referential_action_desc AS updateAction ,
            fk.object_id AS FKId ,
            p.object_id AS ReferrerId ,
            r.object_id AS ReferencedId
    FROM    sys.foreign_keys fk
            JOIN sys.tables p ON p.object_id = fk.parent_object_id
            JOIN sys.tables r ON r.object_id = fk.referenced_object_id
GO

FOREIGN_KEY_COLUMNS

CREATE VIEW dbo.FOREIGN_KEY_COLUMNS
AS
SELECT  SCHEMA_NAME(fk.schema_id) AS FKSchema ,
        fk.name AS FK ,
        SCHEMA_NAME(p.schema_id) AS ReferrerSchema ,
        p.name AS Referrer ,
        pc.name AS ReferrerColumn ,
        pc.is_nullable AS ReferrerColumnCanBeNull ,
        SCHEMA_NAME(r.schema_id) AS ReferencedSchema ,
        r.name AS Referenced,
        rc.name AS ReferencedColumn ,
        fk.object_id AS FKId ,
        fkc.constraint_column_id AS FKColumnId ,
        p.object_id AS ReferrerId ,
        r.object_id AS ReferencedId
FROM    sys.foreign_keys fk
        JOIN sys.foreign_key_columns fkc ON fkc.constraint_object_id = fk.object_id
        JOIN sys.tables p ON p.object_id = fk.parent_object_id
        JOIN sys.columns pc ON fkc.parent_object_id = pc.object_id
                               AND fkc.parent_column_id = pc.column_id
        JOIN sys.tables r ON r.object_id = fk.referenced_object_id
        JOIN sys.columns rc ON fkc.referenced_object_id = rc.object_id
                               AND fkc.referenced_column_id = rc.column_id
GO
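
With the views in place, the queries I find myself writing become one-liners. A quick usage sketch (the table name is a placeholder, not from a real schema):

-- which foreign keys point at the Product table, and with which columns?
SELECT FK, Referrer, ReferrerColumns, ReferencedColumns
FROM dbo.FOREIGN_KEYS
WHERE Referenced = 'Product';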

January 25, 2012

Rebuild Your Indexes Online (When You Can)

So I was recently reading the SQL Server team’s blog, specifically the article Customers using SQL Server 2012 today!, and some brave businesses are already using and enjoying SQL Server 2012 features that help their databases stay available. I’m excited about the new features too. For example, index rebuilds have been improved. We can now rebuild indexes online even when they include “blob” columns (like nvarchar(max), image, etc…). This means that (almost) every index can be rebuilt without requiring a table lock for the entire operation. And that’s good news for availability!

This wasn’t the case in earlier versions. Before SQL Server 2012, you couldn’t rebuild an index online if it included columns that were large strings. If you tried, you would get this message:

/*
An online operation cannot be performed for index 'ix_t1_1' because the 
index contains column '<column name>' of data type text, ntext, image, varchar(max), 
nvarchar(max), varbinary(max), xml, or large CLR type. For a non-clustered 
index, the column could be an include column of the index. For a clustered 
index, the column could be any column of the table. If DROP_EXISTING is 
used, the column could be part of a new or old index. The operation must be 
performed offline. 
*/
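
For example, on a pre-2012 server, a rebuild like this one would fail with that message if ix_t1_1 covered a “blob” column (the table dbo.t1 here is a hypothetical stand-in):

-- fails before SQL Server 2012 when the index covers an nvarchar(max) column
ALTER INDEX ix_t1_1 ON dbo.t1 REBUILD WITH (ONLINE = ON);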

Logic to determine when ONLINE=ON is supported

So that means there’s one more thing to check when finding out whether you can rebuild an index online. The logic now goes like this. You can rebuild an index online if:

  • You’re using Enterprise Edition or higher
  • And you’re using SQL Server 2012 or later
  • Or you’re using SQL Server 2008 or earlier
  • And your index is clustered and the table contains no blob columns
  • Or your index is non-clustered and the index includes no blob columns

Hmm… You can’t really put parentheses in a bulleted list. Here, try this flow chart:

Clear as mud

Hmm… that really doesn’t clear things up too well either. And I haven’t even mentioned partitioned indexes.

Just Try It

Maybe you’re like me, you may have to deal with multiple versions, multiple editions, and multiple tables and their indexes. Instead of wading through that logic above, just TRY it and let SQL Server figure it out. Here’s a sproc which takes a table name and index name and tries to rebuild the index online. If it can’t, it builds it offline.

create procedure s_TryRebuildOnlineOtherwiseOffline
(
	@schema sysname = 'dbo',
	@tablename sysname,
	@indexname sysname
)
as
begin
	set @schema = QUOTENAME(@schema);
	set @tablename = QUOTENAME(@tablename);
	set @indexname = QUOTENAME(@indexname);
 
	declare @sqlRebuild nvarchar(max)
	set @sqlRebuild = N'ALTER INDEX ' + @indexname + ' ON ' + @schema + '.' + @tablename + ' REBUILD';
	declare @sqlRebuildOnline nvarchar(max)
	set @sqlRebuildOnline = @sqlRebuild + ' WITH (ONLINE=ON)';
 
	begin try
		EXEC sp_executesql @sqlRebuildOnline;
		print @sqlRebuildOnline;
	end try
	begin catch
		EXEC sp_executesql @sqlRebuild;
		print @sqlRebuild;
	end catch
end
go
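
And calling it is as simple as this (the table and index names are placeholders):

EXEC dbo.s_TryRebuildOnlineOtherwiseOffline
    @schema = 'dbo',
    @tablename = 'MyTable',
    @indexname = 'IX_MyTable_Name';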

Alternatively, you can adjust the script to REORGANIZE an index when it can’t rebuild it online.

By the way, if you use Ola Hallengren’s maintenance scripts, he’s already taken all this logic into account! If you write your own maintenance scripts, feel free to incorporate what I have here.

December 8, 2011

CXPACKET? What’s That? And What’s Next?


Takeaway: Is the wait type CXPACKET bothering you? It means parallelism. Here’s how to find the queries that are giving you this grief.

The Wait Type CXPACKET

Since 2005, SQL Server has provided dynamic management views (DMVs). These views help monitor database health and they’re the first place I go when troubleshooting problems. For example, the view sys.dm_os_wait_stats gives you an aggregated way to tell where the waits are. Often you’ll come across the wait type CXPACKET. Microsoft explains:

“[This wait type] Occurs when trying to synchronize the query processor exchange iterator.”

Hmmm… I know what those words mean individually… Anyway, if you keep searching, you’ll see others explain: “CXPACKET means parallelism”. We’re getting a bit closer now. If a query is run with multiple threads, and one thread completes before the others, then SQL Server reports the wait type CXPACKET for that thread.

It’s like several chefs making soup. If one chef’s only job is to add the cilantro, there’s going to be some waiting involved. So is the extra help worth the trouble?

If SQL Server thought it could use more cooks for a particular “recipe”, maybe it’s time to make simpler recipes (the answer is almost always yes, especially for OLTP systems). SQL Server doesn’t use parallelism often. It only comes up with a multi-threaded query plan when it needs to, i.e. when the plan is estimated to be costly enough. It’s usually those queries with the high estimated cost that need to be fixed; the CXPACKET wait type is a symptom of that.

Those High Cost Queries

So which queries are those? Again, I like to go to the DMVs. The one I like to look at is sys.dm_exec_query_stats, except that it doesn’t have a flag called uses_parallelism. For the definitive way to find parallel queries, you have to scan and parse all the cached query plans using sys.dm_exec_query_plan.

But that takes forever, so I don’t recommend it. Recently I found out that we don’t really need it: parallel queries are interesting because they’re one of the only kinds of queries that can use more CPU time than actual elapsed time. If that’s true, then we just need to examine these queries:

select * from sys.dm_exec_query_stats 
where total_elapsed_time < total_worker_time
	-- total_worker_time is measured to the microsecond, but is accurate to the millisecond so:
	and ( ( total_worker_time - total_elapsed_time ) / execution_count ) * 1.0 > 1000 -- avg difference is at least 1 ms

But it might not be true. For example, a parallel query might suffer from blocking, long enough that the elapsed time ends up greater than the total amount of CPU time. That raises an interesting question: how good an indication of parallelism is the ratio between elapsed time and CPU time?

(total_elapsed_time < total_cpu_time) Vs. uses_parallelism

I’m lucky. I’ve got a large set of query data that can help me.

This chart needs a little explanation. Pick a dot in the chart above. It represents a set of queries whose ratio of cpu to elapsed time is the same. Look at the value on the x-axis to see what that common ratio is. Say it’s 1.5: that dot represents a set of queries whose elapsed time is exactly 50% greater than their cpu time. Now look at its value on the y-axis. You might see 2%. That means that 2% of those queries use parallelism.

I expected a slope, not a waterfall, and here’s what that means to me: knowing that a query’s worker time is greater than its elapsed time is a great indication of parallelism, and knowing by how much doesn’t change that.

CXPACKET? Here’s What’s Next

This query lists the top 20 worst parallel queries cached in your database.

-- TOP 20 parallel queries (by CPU)
SELECT TOP (20)
    [Total CPU] = total_worker_time,
	[Total Elapsed Time] = total_elapsed_time,
	[Execution Count] = execution_count,
    [Average CPU in microseconds] = cast(total_worker_time / (execution_count + 0.0) as money),
    [DB Name] = DB_NAME(ST.dbid),
    [Object Name] = OBJECT_NAME(ST.objectid, ST.dbid),
    [Query Text] = (SELECT [processing-instruction(q)] = CASE 
            WHEN [sql_handle] IS NULL THEN ' '
            ELSE (SUBSTRING(ST.TEXT,(QS.statement_start_offset + 2) / 2,
                (CASE 
                        WHEN QS.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX),ST.text)) * 2
                        ELSE QS.statement_end_offset
                        END - QS.statement_start_offset) / 2))
            END
			FOR XML PATH(''), type),
    [Query Plan] = qp.query_plan
FROM sys.dm_exec_query_stats QS
CROSS APPLY sys.dm_exec_sql_text([sql_handle]) ST
CROSS APPLY sys.dm_exec_query_plan ([plan_handle]) QP
WHERE total_elapsed_time < total_worker_time
	AND ( ( total_worker_time - total_elapsed_time ) / execution_count ) * 1.0 > 1000 -- average difference is more than a millisecond
ORDER BY total_worker_time DESC

Some interesting things about this query:

  • I’ve written it for SQL Server Management Studio. In SSMS, you can click on the query and the query plan to explore these queries further.
  • I’ve added a filter to weed out queries where the average difference between cpu time and elapsed time is less than a millisecond. I’m sure you won’t miss those stats. Microsoft reports CPU time in microseconds, but it’s only accurate to the millisecond. So you can get false positives if you don’t ignore the blazing-fast queries. That’s why I added that filter.

November 28, 2011

SQL Fingertips

They Might Be Giants (TMBG) released an album called Apollo 18. It has 38 tracks and it runs for 42 minutes. It’s an album that will always remind me of my friend and first University roommate (Hey, what’s up Chris!). So the average track is about a minute long and this is due to a set of 21 tracks collectively called “fingertips”. Most of these fingertips are five to fifteen seconds long. Some are a bit longer. These fingertips are probably not good enough to be put on an album on their own. But collectively, they make a really cool album (especially on shuffle mode).

The List

So when I found myself with a collection of tiny tips that might not qualify as blog posts on their own, I thought of that album and, a microsecond later, we had SQL Fingertips!

So here they are, in no particular order.

  • Stackoverflow tip: Use the bounty feature without being afraid of the hit to your reputation. A bounty offered on a decent question usually attracts enough attention to get your question up-votes. And unless it’s a crummy question, those up-votes will earn you back the reputation points you spent.
  • T-SQL tip: You can use the following script as a template to be able to run a single query on multiple databases and have it return a single result set:
    DECLARE @temp TABLE (
    	ServerName sysname,
    	DBName sysname,
    	[Row Count] int
    );
     
    declare @schemaname sysname = 'Production';
    declare @tablename sysname = 'Product';
    declare @sql nvarchar(max) = N'use [?];
    	if exists (select 1 from sys.tables where name = ''' + @tablename + N''' and SCHEMA_NAME(schema_id) = ''' + @schemaname + N''')
    		exec sp_executesql N''
    			select @@servername, db_name(), Count(1) as [count]
    			from ' + QUOTENAME(@tablename) + '''';
     
    insert @temp (ServerName, DBName, [Row Count])
    exec master..sp_MSforeachdb @sql;
    select * from @temp;
  • SQL Server Management Studio tip: GO can take a parameter. If you write GO 100, your batch will be executed 100 times. Many people know this but I always hear a couple of “wow”s each time I demonstrate it (there’s a small demo after this list). And here are a couple more notes about that batch separator.
    • Don’t forget to turn off any ROWCOUNT setting first (SET ROWCOUNT 0).
    • The batch separator doesn’t have to be GO. It’s configurable! I used to set it to ENGAGE for a while, but the joke got old and I switched it back.
  • Blog tip: Maybe you like following blogs or people, but you’re falling out of love with your RSS reader. No worries. You can get any RSS feed at all sent straight to your inbox. There are lots of RSS-to-inbox websites out there. The one I recommend is FeedMyInbox.com. So, for example, to get my own blog delivered to your inbox, start here.
  • Data → Eye → Brain, A Data Visualization tip: Data visualizations (charts, diagrams, etc..) help us gain knowledge and understand data that maybe we couldn’t understand otherwise (See Anscombe’s Quartet). And for large and complex datasets, the best visualizations are usually custom designed and interactive. There’s a lot of power in customized visualizations: See what David McCandless and Miriah Meyers have to say.
  • Twitter DBA Tip #1: I put a call on twitter asking people to tweet the one database tip they thought more people should know. The first one comes from @BrentO: “Don’t leave your database in full recovery mode if you’re not doing t-log backups.” He’s right. Full recovery mode usually implies a backup strategy that includes transaction log backups. (And full or differential backups are not enough either).
  • Twitter DBA Tip #2: This one’s from @wnylibrarian: “If using SQLExpress & no SQLAgent, try using SQLCMD in a batchfile.” I’m glad Gary tweeted this one. I’ve come up against this problem in my own work. It turns out that when downgrading editions of SQL Server, often one of the first things that you miss is SQL Agent. If you follow Gary’s advice, it’s not a hurdle, it’s only an inconvenience.
  • Twitter DBA Tip #3: Next I heard from @DaveH0ward who wants to remind people that “You need to size your data /log files and set autogrowth amounts appropriately.” I think a lot of people don’t know that the default auto-growth settings for SQL Server databases are a bad idea. Tim Ford warns us of dire consequences for not following this tip.
  • Twitter DBA Tip #4: Finally I heard from @SQLSandwhiches (aka Adam Mikolaj). He shared a pet peeve: “I’ve seen so many servers that don’t have logs to auto cycle. Don’t you love opening a log file with 300,000 entries?” Adam wrote about his experience earlier this year (including a solution of course).
  • MixedTape Fingertip Tip: Say you find yourself back in the nineties making a mixed tape and you find that you have a minute left to fill on one side of the tape. They Might Be Giants to the rescue! Pick several fingertips from their Apollo 18 album and optimize every second of that mixed tape! My personal favorite is the 45 second Minimum Wage off of their other album Flood.
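
And here’s that small GO demo I promised above. Run it in Management Studio; GO is a client-side batch separator, so this only works in tools that recognize it:

CREATE TABLE #t (id INT IDENTITY);
GO
INSERT #t DEFAULT VALUES;
GO 100
SELECT COUNT(*) AS [rows inserted] FROM #t; -- 100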

So how many of these SQL Fingertips did you know? Have you got a small tip? Share it in the comments. Have you got a whole bunch of small tips? Write your own SQL Fingertip post. I’d be especially grateful to read a set of Business Intelligence fingertips.

October 26, 2011

Where Are Your Popular Joins?

So you know a lot about your databases, right? You’re familiar with their schemas and tables and the queries that run on them. Personally, I use sys.dm_exec_query_stats to understand what the most popular queries are.

But I recently started wondering about popular table joins.

I was wondering: “What tables in my database are most commonly joined together?” I already have a pretty good idea based on the data model. But I wanted to find out whether the popular queries are in sync with my understanding. Unfortunately there’s no system view called sys.dm_exec_join_stats. The whole reason I was curious is that I wanted to find a set of common table joins whose queries might be improved with an indexed view.

So I wrote something that gives me a bit of an idea. It’s a query that looks at cached query plans and counts nested loop joins (multiplied by execution count).

USE tempdb;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN TRANSACTION
 
;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select 
	cp.usecounts as numberOfJoins,
	seeknodes.query('.') as plansnippet
into #my_joins
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle)
	as qp
cross apply query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan//SeekKeys/Prefix[@ScanType="EQ"]') 
	as seeks(seeknodes)
where seeknodes.exist('./RangeColumns/ColumnReference[1]/@Database') = 1
	and seeknodes.exist('./RangeExpressions/ScalarOperator/Identifier/ColumnReference[1]/@Database') = 1;
 
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/showplan' as p1)
select sum(numberOfJoins) as [Number Of Joins],
	myValues.lookupTable + '(' + myValues.lookupColumn + ')' as lookupColumn,
	myValues.expressionTable + '(' + myValues.expressionColumn + ')' as expressionColumn
from #my_joins
cross apply plansnippet.nodes('./p1:Prefix/p1:RangeColumns/p1:ColumnReference[1]')
	as rangeColumns(rangeColumnNodes)
cross apply plansnippet.nodes('./p1:Prefix/p1:RangeExpressions/p1:ScalarOperator/p1:Identifier/p1:ColumnReference[1]')
	as rangeExpressions(rangeExpressionNodes)
cross apply (
	select
		rangeColumnNodes.value('@Database', 'sysname') as lookupDatabase, 
		rangeColumnNodes.value('@Schema', 'sysname') as lookupSchema,
		rangeColumnNodes.value('@Table', 'sysname') as lookupTable,
		rangeColumnNodes.value('@Column', 'sysname') as lookupColumn,
		rangeExpressionNodes.value('@Database', 'sysname') as expressionDatabase, 
		rangeExpressionNodes.value('@Schema', 'sysname') as expressionSchema,
		rangeExpressionNodes.value('@Table', 'sysname') as expressionTable,
		rangeExpressionNodes.value('@Column', 'sysname') as expressionColumn	
	) as myValues
where myValues.expressionTable != myValues.lookupTable
group by myValues.lookupTable, myValues.lookupColumn, myValues.expressionTable, myValues.expressionColumn
order by SUM(numberOfJoins) desc;
 
rollback;

Some caveats:

  • Parsing xml takes a lot of time and a lot of CPU (the subtree cost is huge and execution time is measured in seconds or minutes)
  • It’s only useful on a system that is used a lot (as opposed to a dev database).
  • It only reports statistics about queries that are found in cached plans. So the stats are only relevant since the last server restart
  • It only counts loop joins (not hash or merge joins)
  • If you want, you can adjust the query to include schema and database names in the results

I hope you find it useful. This query gives hints for further investigation into potential indexed views. It worked well for me and so I thought it was worth sharing. I ran this query against a server I know well and I was surprised at some of the results. Good luck.

May 4, 2011

When To Use Blocked Processes Reports

I introduced the SQL Server Blocked Process Report Viewer a couple weeks ago and I realize that I took something for granted. Not everyone has practice watching for Blocked Process Reports and not everyone understands how or when they can use my utility to help troubleshoot concurrency issues.

Steps For Finding Concurrency Problems

Remember: When it comes to concurrency problems, you don’t have to guess what’s wrong!!!

And I explain it flowchart style!

(Flowchart: Wait Statistics → Who Is Active → SQL Trace → SQL Server Blocked Process Report Viewer → Event Notifications → The Future: Tracking Blocking in Denali)
  • Checking for LCK_M_XX waits: Paul Randal has a great script that interprets the data in the DMV sys.dm_os_wait_stats. I like this script because it works when all else fails. It’s a great starting point for understanding where the system’s bottlenecks are.
  • Using sp_WhoIsActive: Adam Machanic wrote Who Is Active as a tricked-out version of sp_who and sp_who2. I recommend it because it is a great view into what’s active on your server right now. And that includes blocked processes and other concurrency issues. (i.e. For any acute problem, go there. For chronic concurrency problems, come back here.)
  • Using SQL Trace: You might know this as Profiler. Capture a trace with the “Blocked Process Report” event, which is located in the Errors and Warnings event list. But don’t forget! You first have to decide what it means for your system to have excessive blocking and configure the blocked process threshold accordingly (there’s a configuration sketch after this list). I’ve learned very recently that people’s ideas of excessive blocking vary a lot. In my own environment, I often look for blocking longer than 10 seconds. Other people use a threshold of 10 minutes!
  • Analyzing Traces With Blocked Process Report Viewer: This is the tool I wrote that I hope you find useful. Right now it tells you who the lead blocker is. And I hope to expand the features into analysis soon.
  • Configuring the Server for Event Notifications: I’m really not too familiar with this option and don’t use it much. As an alternative you can also use WMI queries and events mapped to a SQL Agent job (thanks, Vincent Salard, for that tip).
  • Using Extended Events Once Denali Arrives: Jonathan Kehayias knows extended events backwards and forwards. In his blog post here, he describes how, in the next version of SQL Server, the blocked process report will be traceable using extended events.
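
Here’s that configuration sketch for the blocked process threshold. It uses my own 10-second preference; pick a threshold that matches your idea of excessive blocking:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold', 10; -- report blocking longer than 10 seconds
RECONFIGURE;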

An Ounce of Prevention

In an extremely timely post, Kendra Little writes about understanding locks in It’s a Lock. Following many of the same steps, you can understand what your app is doing beforehand and avoid blocking problems from the start (e.g. understanding locks held during schema changes).

Next Week

  • I’ll be releasing SQL Server Blocked Process Report Viewer. (Right now we’re only in beta and I’m also open to suggestions about a new name for the tool)
  • Understanding blocking is the first step. Next week I’ll talk about what happens after analysis. I’ll write about the steps I’ve taken and had success with.

April 20, 2011

A New Way to Examine Blocked Process Reports

Solving concurrency problems is a large part of troubleshooting. Often solutions include tuning the blockers to minimize blocked time, or tweaking locks and isolation levels to make processes play nicely with each other. But to dig into the problem, you have to understand the blocking chain.

If you’re troubleshooting a concurrency problem that’s happening on your server right now then you can get information from the DMVs or even better, by using Adam Machanic’s Who Is Active stored procedure.

But what if the excessive blocking behavior is intermittent? Then the best strategy is to monitor the server and try to capture a SQL trace that includes the “blocked process report” event. I’ve had a lot of luck with that event; it can really tell you a story about excessive blocking. But I find that interpreting the trace can be tricky, especially when there’s a large blocking chain. Sorting through hundreds of events to find the lead blocker is not fun.

New and Free: sp_blocked_process_report_viewer

So I wrote a script! And I stuffed it in a stored procedure! Here’s the syntax (BOL-Style):
Syntax

sp_blocked_process_report_viewer [@Trace = ] 'TraceFileOrTable'
    [ , [ @Type = ] 'TraceType' ]

Arguments
[@Trace = ] ‘TraceFileOrTable’

    Is the name of the trace table or trace file that holds the blocked process reports

[@Type = ] ‘TraceType’

    Is the type of file referenced by TraceFileOrTable. Values can be TABLE, FILE, or XMLFILE. The default is FILE.
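
So a typical call looks like this (the file path is a placeholder for wherever you saved your trace):

EXEC sp_blocked_process_report_viewer
    @Trace = N'C:\Traces\MyBlockingTrace.trc',
    @Type = N'FILE';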

Download it Now!

Go to the https://github.com/mjswart/sqlblockedprocesses site and download it. Once you’re there, click on the big green download button and you’ll have the stored procedure!

Here’s a sample output. It shows clearly who the lead blocker is:

A screenshot showing output for this sproc

Nicely organized, at least better than usual

I’m Promoting This Script to a Project

Although you still have to know how to read a blocked process report, this utility makes the structure of the blocking chain clear. I find this script useful for my own purposes. In fact, I like it enough that I’m going to maintain it on github as: SQL Server Blocked Process Report Viewer

Let Me Know How It Goes

Run the script! Use it! Tell your friends. Tell me what you think of it (for once in my life, I’m seeking out criticism).

Going forward, I do have some plans for the script. There are a number of things I eventually want to do with it:

  • Add error handling
  • Really make the sproc perform well (it’s already decent).
  • Develop a test suite (sample traces that exercise the procedure)
  • There’s an opportunity to look up object names, index names and sql text based on object ids if the traces belong to the local server.
  • A SQL Server Management Studio plugin. A treeview control would really be useful here. (This might be easier after Denali comes out)

I plan to do the work, but if you’re really keen and you want to pitch in, you’re welcome to. If you see any errors, you can report them on the github site.

January 5, 2011

Table Valued Parameters, A Short, Complete Example

After surfing for a while, I had trouble finding an example of an application that shows how to use table-valued parameters (TVPs) from beginning to end using C#, ADO.NET, and SQL Server 2008. Official docs covering TVPs are found at Table-Valued Parameters in SQL Server 2008 (ADO.NET).

So this is my own TVP example. It consists of a SQL script, a C# program, and the commands that compile and run it.

The DB Setup (a SQL Script)

Run this on a SQL Server 2008 (or later) database that you can test on:

use tempdb;
 
CREATE TYPE BigIntList
    AS TABLE (i bigint)
GO
 
CREATE PROCEDURE ReturnEvenNumbers (@list BigIntList READONLY) AS
SELECT i
FROM @list
WHERE i % 2 = 0
GO
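
Before moving on to C#, you can sanity-check the type and procedure from T-SQL alone; a quick sketch:

DECLARE @nums BigIntList;
INSERT @nums (i) VALUES (1),(2),(3),(4),(5),(6),(7);
EXEC ReturnEvenNumbers @list = @nums; -- returns 2, 4, 6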

The C# program

Aptly called Program.cs, this is the definition of a program that calls the new procedure with a list of seven numbers and prints the list of numbers that comes back (i.e. the even numbers).
Edit the connection string here and then save this as Program.cs in some directory.

using System.Data.SqlClient;
using System.Data;
 
namespace TVParameterTest {
    class Program {
        static void Main( ) {
            // build table
            DataTable dt = new DataTable();
            dt.Columns.Add( "i", typeof( long ) );
            foreach( long l in new long[] {1,2,3,4,5,6,7} )
                dt.LoadDataRow( new object[] { l }, true );
 
            // build connection and command
            SqlConnection conn = new SqlConnection( @"Data Source=.\sql2k8;Initial Catalog=tempdb;Integrated Security=True" );
            SqlCommand cmd = new SqlCommand( "ReturnEvenNumbers", conn );
            cmd.Connection.Open();
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add( "list", SqlDbType.Structured).Value = dt;
 
            // execute and output
            SqlDataReader reader = cmd.ExecuteReader();
            while( reader.Read() )
                System.Console.WriteLine( reader[0].ToString());
            cmd.Connection.Close();
 
        }
    }
}

Running it!

First make sure you’ve got Visual Studio on your machine, then you should be able to open a command prompt (or powershell!) and see this:

E:\Temp>csc program.cs
Microsoft (R) Visual C# 2008 Compiler version 3.5.30729.1
for Microsoft (R) .NET Framework version 3.5
Copyright (C) Microsoft Corporation. All rights reserved.

E:\Temp>Program.exe
2
4
6
