Michael J. Swart

June 25, 2008

Splitting and Joining strings in SQL Server

Filed under: Technical Articles — Michael J. Swart @ 7:00 pm

Wouldn’t it be nice to do something like this:

--split
SELECT t.id, split.value
FROM someTable t
CROSS APPLY dbo.Split(t.csv, ',') as split

or this:

--join
SELECT ManagerID, JOIN(EmployeeID + ' ')
FROM HumanResources.Employee
GROUP BY ManagerID

Well the first example can be implemented as a table-valued function and it’s possible if you’re careful. Here’s my implementation of Split:

CREATE FUNCTION [dbo].[Split]
(
	@SearchString NVARCHAR (max),
	@Separator NVARCHAR (5)
)
 
RETURNS @strtable TABLE (strval NVARCHAR (max))
AS
 
BEGIN
	DECLARE @SeparatorLength INT
	SET @SeparatorLength = DATALENGTH(@Separator) / 2
 
	IF (@SeparatorLength = 0)
		RETURN
 
	WHILE 1=1
	BEGIN
		IF (CHARINDEX(@Separator,@SearchString) = 0)
		BEGIN
			INSERT INTO @strtable
			VALUES (@SearchString);
			BREAK;
		END
 
		INSERT INTO @strtable
		VALUES ( LEFT(@SearchString, CHARINDEX(@Separator,@SearchString) - 1))
 
		SET @SearchString = RIGHT(@SearchString, DATALENGTH(@SearchString)/2 - CHARINDEX(@Separator,@SearchString) - @SeparatorLength + 1)
	END
	RETURN
END

For the second example, the join, I don’t think there there’s a simple way, But there are a couple options. There’s an example of creating the CONCATENATE aggregate with CLR code here. But it’s pretty clunky to enable clr and integrate an assembly. But here’s another alternative if you’re willing to resort to a kludge is to use select a column called data() and use the for xml path syntax:

SELECT
	M.ManagerID,
	(
		SELECT CAST(E.EmployeeID AS VARCHAR(10))
			+ ',' AS [data()]
		FROM HumanResources.Employee E
		WHERE E.ManagerID = M.ManagerID
		FOR XML PATH('')
	) AS EmployeeID
FROM
	(
		SELECT DISTINCT ManagerID
		FROM HumanResources.Employee
	) AS M

Your mileage may vary. I give no guarantees about the performance. It breaks a rule of thumb of mine where I don’t like to include subqueries in the SELECT clause of SELECT Statements.

Update! Jul 4, 2008: Shown up by a pro
In SQL Server Magazine’s “Performance Tuning and Optimization” forum,Andrew J Kelly has a much nicer split function than I do. For one thing it’s set based:
(Caveat: because of the recursion limit of 100, this function doesn’t work on lists with more than 100 elements)

CREATE FUNCTION [dbo].[fn_split_inline_cte]
    (@list NVARCHAR (MAX),
    @delimiter NCHAR(1) = ',')
RETURNS TABLE
AS
 
RETURN
WITH cte_list([BeginChar], [EndChar]) AS (
    SELECT [BeginChar] = CONVERT(BIGINT, 1), [EndChar] = CHARINDEX(@delimiter, @list + @delimiter)
    UNION ALL
    SELECT [BeginChar] = [EndChar] + 1, [EndChar] = CHARINDEX(@delimiter, @list + @delimiter, [EndChar] + 1)
    FROM cte_list
    WHERE [EndChar] > 0
)
SELECT LTRIM(RTRIM(SUBSTRING(@list, [BeginChar],
    CASE WHEN [EndChar] > 0 THEN [EndChar] - [BeginChar] ELSE 0 END))) AS [ParsedValue]
FROM cte_list
WHERE [EndChar] > 0 ;

June 19, 2008

T-SQL Intellisense for SQL 2005, There’s no going back.

Filed under: Miscelleaneous SQL — Tags: , — Michael J. Swart @ 5:28 am

I have had SQL Server 2008 Installed (Feb2008 CTP) for a few months now and the intellisense that comes with Management Studio is really really useful. So much so that when I find myself using Management Studio on some other machine, I find that I miss it immediately. For me there’s no going back. It’s one of the best usability features that is included in 2008. I used it as much as I could with any SQL Server connection I made.

Unfortunately, Intellisense is being dropped for connections to SQL Server 2005 in RTM and RC0. I was warned here. Also see here.

Apparently the reasoning is that while the #1 goal of Intellisense is to allow people to script more efficiently, On 2005, any incorrect suggestions will cause developers to spend more time figuring out what went wrong. I totally disagree. Intellisense has been so useful that I can’t see how any potential wrong-version intellisense-confusion could possibly outweigh the benefits. I don’t believe them when they say that intellisense on 2005 would negatively impact development.

They also complain that supporting multiple versions of SQL Server would multiply time and cost to develop. As an end user, I don’t care. Redgate has intellisense. I don’t see why MS can’t. MS should also consider this a great opportunity to improve their SQL Server management tools.

Well yesterday I installed RC0 and sure enough 2005 Intellisense support has been dropped. I missed it immediately. In fact, I have some colleagues who plan to stay on the Feb CTP until their license expires.

If only there were something we could do.
There is! And it’s (slightly) better than a petition. Vote at Microsoft’s Connect site on this issue:
http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=341872

June 11, 2008

Identifying High CPU SQL Processes

Filed under: Technical Articles — Tags: , , , , , , — Michael J. Swart @ 12:09 pm

UPDATE February 13, 2012: As of this update, this post is about four years old. And a lot of things have changed since then. For one, the video is no longer available (as many have discovered). But if you’re trying to identify a query or queries that are consuming the most CPU, I strongly recommend using the sp_whoisactive stored procedure. It’s the method I use now rather than using process explorer What follows is the original 2008 post:

Recently when trying to identify sql processes consuming the most CPU, I came across this most excellent paper written by Jeff Stevenson. It’s called Identifying High CPU SQL Processes.

It describes how to identify which sql processes that are consuming the most cpu cycles. The good part is that he describes how to do it using only tools that are available via SQL Server or the Operating System (i.e. Profiler, Performance Monitor, Management Studio).

It was very useful but the most tedious part of the process was finding the SQL Server process id. The process required a scavenger hunt where it was necessary to find the instance, then thread instance id, then kernel process id, then SQL process id. This required two performance monitor sessions with very awkward settings.

I believe I have an improvement that allows us to skip a number of steps. It requires the use of a free Microsoft tool, but the drawback is that the tool is not available either with the operating system or SQL Server. It’s a free download and the installation is easy, but if, as a dba, you get a call to troubleshoot a problem on a remote server, and the remote server doesn’t have this tool, then you won’t be able to use this tool.

The tool I’m talking about is Process Explorer (made by the guys once known as SysInternals). It’s a free download here. I highly recommend it. At least install it on your development box.

I’ve attached a video (no audio) that gives you an idea of what’s involved when identifying the kpid:

The video shows how to use Process Explorer to find the SQL instance and the thread of the offending sql process. In this particular case, I sort the processes by cpu and I see the largest CPU consumer is sqlservr.exe (If it’s not then this post isn’t for you). I select the SQL Server process and click on the properties icon to launch the properties window. On the image tab I learn that the database instance that is consuming cpu is called “SQL2005”. On the threads tab I learn that the thread id is 3960. So 3960 is what SQL Server refers to as the kernel process id (kpid).

You now have the SQL Server instance and the kpid of the process. After that, it’s time to find out what this process is doing. You can do this by following Jeff Stevenson’s method that I linked to at the top of this post. It’s a matter of using a set of sql queries similar to the following:

select * from master..sysprocesses where kpid = 3960
-- turns out to have spid = 55

dbcc inputbuffer(55)
-- shows what 55 is doing:

-- and if absolutely necessary:
kill 55 -- be careful with this :-)

There’s one last thing I want to mention. You may find that all the connections are taken. It’s an very likely scenario if SQL Server is misbehaving. In this case you can log in using the Dedicated Administrator’s Connection. Definitely practice this before any crisis so that you don’t have to look up any syntax at the time. Essentially, you want to log onto ADMIN:SERVER\INSTANCE instead of SERVER\INSTANCE.

Update !!!
In an extremely timely post, John Paul Cook describes how to get around the drawback of not having sysinternals installed on the remote machine. If the machine can access the internet, then you can simply run the following:

\\live.sysinternals.com\tools\procexp.exe

June 3, 2008

Keep your constraints trusted!

Filed under: Technical Articles — Michael J. Swart @ 6:21 am

It’s always a good idea to keep your constraints trusted. You can check to see which constraints in your database are not trusted with these commands:

select fk.name from sys.foreign_keys fk where fk.is_not_trusted = 1
select cc.name from sys.check_constraints cc where cc.is_not_trusted = 1

Constraints can become untrusted after a bulk load operation is performed without the CHECK CONSTRAINTS option. Foreign keys can be untrusted if they were defined with the NOT FOR REPLICATION option.

The obvious reason for keeping constraints trusted is that it means SQL Server is enforcing the data integrity defined by these constraints.

But there’s also a more subtle performance gain that I want to demonstrate below. If SQL Server can trust constraints, then sometimes it can decide it doesn’t need to do a join. Such queries are rare, but I’ve seen them pop up.

The following example works against the MS’s Adventureworks sample database. The example below uses two tables, Sales.Customers and Sales.SalesOrderHeader. There is a foreign key making sure that every SalesOrderHeader record references one Customer record.

use AdventureWorks
GO

-- the inner join in the following query has no effect here and sql server 
-- eliminates the join from the query plan. It's because the foreign 
-- key relationship that SQL Server can trust and simplify the query plan.
-- It only has to count orders.
select COUNT(1)
from Sales.Customer c
join Sales.SalesOrderHeader soh
on soh.CustomerID = c.CustomerID
GO

-- Drop the foreign key and sql server can no longer simplify.
-- It can't guarantee that every order has a customer.
-- Again it has to join with customers.
ALTER TABLE [Sales].[SalesOrderHeader] DROP CONSTRAINT [FK_SalesOrderHeader_Customer_CustomerID];
select COUNT(1)
from Sales.Customer c
join Sales.SalesOrderHeader soh
on soh.CustomerID = c.CustomerID;
GO

-- Create the key and try again.
ALTER TABLE [Sales].[SalesOrderHeader]  
ADD CONSTRAINT [FK_SalesOrderHeader_Customer_CustomerID] 
    FOREIGN KEY([CustomerID])
    REFERENCES [Sales].[Customer] ([CustomerID])
    NOT FOR REPLICATION;
select COUNT(1)
from Sales.Customer c
join Sales.SalesOrderHeader soh
on soh.CustomerID = c.CustomerID;
-- hmmm the foreign key is back in force, but we're still joining with customers.
-- It's because the foreign key is not trusted (related to the not for replication clause)
select fk.is_not_trusted from sys.foreign_keys fk where fk.name = 'FK_SalesOrderHeader_Customer_CustomerID'
GO

-- let's try again.
ALTER TABLE [Sales].[SalesOrderHeader] DROP CONSTRAINT [FK_SalesOrderHeader_Customer_CustomerID];
ALTER TABLE [Sales].[SalesOrderHeader]  
ADD CONSTRAINT [FK_SalesOrderHeader_Customer_CustomerID] 
    FOREIGN KEY([CustomerID])
    REFERENCES [Sales].[Customer] ([CustomerID]);
select COUNT(1)
from Sales.Customer c
join Sales.SalesOrderHeader soh
on soh.CustomerID = c.CustomerID;
GO
-- much better

Powered by WordPress