Last week, I explained different ways to generate concurrent activity. I have my favorites, but you may have your own. Whichever method you pick, it will help you call the procedure s_DoSomething many times at once.
Now comes the task of defining the procedure s_DoSomething. It can be whatever you like depending on what functionality you want to test or exercise. I want to demonstrate some patterns that I follow when I define that procedure. Those patterns all start with …
the Basic Definition
To test the concurrency of a single procedure, just call it:
use tempdb
GO
CREATE PROCEDURE dbo.s_DoSomething AS
EXEC Adventureworks2012.dbo.uspGetManagerEmployees 14;
Of course, I could have changed the framework to call my procedure directly, but out of habit I don’t. I always leave s_DoSomething in tempdb hard-coded in the framework.
With Arbitrary Parameter Values
Often the procedures I want to test are defined with parameters. If variety is important, but the parameter values are not, then that’s when the random tricks come in:
ALTER PROCEDURE dbo.s_DoSomething AS
DECLARE @BusinessEntityId int;
SELECT TOP 1 @BusinessEntityId = BusinessEntityID
FROM AdventureWorks2012.HumanResources.Employee
ORDER BY newid();
EXEC AdventureWorks2012.dbo.uspGetEmployeeManagers @BusinessEntityId;
Calling More Than One Procedure
It’s as simple as calling one after the other. But sometimes I want the frequency of the calls “weighted”.
For example, I want DELETE, INSERT and UPDATE statements called 10% of the time each. The remaining 70% of the time I want to call a SELECT statement. Then I have something like:
ALTER PROCEDURE dbo.s_DoSomething AS
declare @r int = RAND() * 10;
IF (@r = 0)
-- delete 10% of the time
DELETE AdventureWorks2012.Person.BusinessEntity
WHERE BusinessEntityID = CAST(RAND()*1000 as INT);
IF (@r = 1)
-- insert 10% of the time
INSERT AdventureWorks2012.Person.BusinessEntity (rowguid)
VALUES (newid());
IF (@r = 2)
-- update 10% of the time
UPDATE AdventureWorks2012.Person.BusinessEntity
SET rowguid = newid()
WHERE BusinessEntityID = CAST(RAND()*1000 as INT);
IF (@r > 2)
-- select the rest of the time
SELECT BusinessEntityId, rowguid, ModifiedDate
FROM AdventureWorks2012.Person.BusinessEntity
WHERE BusinessEntityID = CAST(RAND()*1000 as INT);
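The weighting works because RAND() returns a value in [0, 1), so RAND() * 10 assigned to an int truncates to one of 0 through 9, each equally likely: the three IF branches fire 10% of the time each and the final IF (@r > 2) covers the remaining 70%. Here's a quick simulation of that branching logic (a Python stand-in, purely illustrative — not part of the original procedure) that confirms the split:

```python
import random

def weighted_branch(r10):
    # Mirror the IF ladder in s_DoSomething, where @r = int(RAND() * 10).
    if r10 == 0:
        return "delete"   # 10% of the time
    if r10 == 1:
        return "insert"   # 10% of the time
    if r10 == 2:
        return "update"   # 10% of the time
    return "select"       # the remaining 70%

def simulate(n, seed=0):
    # Draw n values the way CAST-ing RAND() * 10 to int does: uniform over 0..9.
    rng = random.Random(seed)
    counts = {"delete": 0, "insert": 0, "update": 0, "select": 0}
    for _ in range(n):
        counts[weighted_branch(int(rng.random() * 10))] += 1
    return counts

print(simulate(100_000))  # roughly 10%/10%/10%/70%
```

Adjusting the weights is just a matter of changing which @r values map to which statement.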
Counting Errors Concurrently
I want to track (server side) how often s_DoSomething fails. But I don’t want tracking to be a concurrency bottleneck itself. Here’s a cool trick for that:
First define these procedures:
create procedure dbo.s_IFailed as
go
create procedure dbo.s_IPassed as
go
alter procedure dbo.s_DoSomething as
begin try
declare @i int = rand() * 10;
select @i = 1 / @i -- might divide by 0!
exec dbo.s_IPassed;
end try
begin catch
exec dbo.s_IFailed;
end catch
This lets me use DMVs to monitor the success rate because I can check the execution count of my dummy procedures. For example,
exec sp_recompile 'dbo.s_IFailed'; -- reset counts from other tests.
exec sp_recompile 'dbo.s_IPassed';
go
/* run concurrent test here, or...*/
set nocount on;
exec dbo.s_DoSomething;
go 10000
select object_name(object_id), execution_count
from sys.dm_exec_procedure_stats
where object_name(object_id) in ('s_IFailed','s_IPassed')
/*
-- count
s_IPassed 9031
s_IFailed 969
*/
This relies on the DMV sys.dm_exec_procedure_stats, which was introduced in SQL Server 2008. It’s like a cheap, do-it-yourself performance counter.
Next
Next week I want to show this technique in action. I’ll be troubleshooting a common concurrency problem: the deadlock.
I’ve discovered that DBAs use different methods to accomplish what amounts to the same thing: Generating concurrent activity on a server. I want to explore a number of methods I’ve seen. For each method, I want to call one particular procedure, many times at once, and often… Oh, and for free.
Why are we doing this?
For Testing: The whole point is to generate concurrent activity on the server. Testing with this kind of activity is one of the only ways to avoid resource contention issues.
For Demos: Concurrency issues are so common and so varied, that it’s not surprising how often we see demos that depend on this kind of generated database activity.
For Troubleshooting: This is my favorite reason. A concurrency issue can’t be fixed well unless it can be reproduced reliably on some test environment. That’s why the methods below have a spot on my troubleshooting toolbelt.
The whole idea is to get many workers active on SQL Server at once.
For each method below, look out for the place where I specify these “parameters”:
Number of virtual workers (50 in my example)
Number of iterations (for instance 10,000)
Connection string
Procedure to execute (s_DoSomething in my case)
One other thing to watch for is the overhead that the tool puts on the machine. Ideally, the method is suitable to run from the same test environment that the SQL Server sits on, so I want my method to be fairly lightweight. This means it’s best to handle iterations on SQL Server itself, which cuts down on the overhead associated with opening connections. So in most cases, instead of
exec dbo.s_DoSomething
exec dbo.s_DoSomething
I have
declare @i int = 0; while (@i < 10000) begin exec dbo.s_DoSomething; set @i+= 1; end
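Every method below has the same shape: N workers, each looping M times over a single procedure call, started together and awaited together. As a purely illustrative sketch (a Python stand-in, with a stub in place of the actual s_DoSomething call over a connection), the pattern looks like this:

```python
import threading

def run_worker(do_something, iterations):
    # One virtual worker: loop the unit of work many times, mirroring
    # "declare @i int = 0; while (@i < 10000) begin exec dbo.s_DoSomething; ... end"
    for _ in range(iterations):
        do_something()

def generate_activity(do_something, workers=50, iterations=10_000):
    # Start all workers at once, then wait for every one to finish.
    threads = [
        threading.Thread(target=run_worker, args=(do_something, iterations))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Stub standing in for "exec dbo.s_DoSomething" (list.append is thread-safe in CPython):
calls = []
generate_activity(lambda: calls.append(1), workers=5, iterations=100)
print(len(calls))  # 500
```

The real worker would open its own connection and execute the procedure; the point is that the iteration happens inside each worker, not per connection.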
Notepad + DOS Method
I adapted this method from a clever trick I saw once. It was Paul Randal giving a demo on tempdb contention. You can find a video of that demo by visiting this newsletter.
It’s a simple idea. You have two batch files, the first is called Run.bat:
echo off
sqlcmd -S MYSERVER\MYINSTANCE -E -Q "set nocount on; declare @i int = 0; while (@i < 10000) begin exec tempdb.dbo.s_DoSomething; set @i+= 1; end" > NUL
exit
PowerShell
foreach($worker in 1..50) `
{ `
Start-Job `
-ScriptBlock `
{ `
Import-Module sqlps; `
Invoke-Sqlcmd `
-Query "set nocount on; declare @i int = 0; while (@i < 10000) begin exec tempdb.dbo.s_DoSomething; set @i+= 1; end" `
-ServerInstance "MYSERVER\MYINSTANCE" `
| Out-Null `
} `
}
Except that I don’t think I got it quite right. Whatever I changed makes this method unsuitable.
The script schedules a job 50 times, but it takes about a minute just to schedule them all. Once they’re scheduled, the jobs take time to start, and not enough of them run in parallel to generate the needed concurrent activity, so I give this method a “skip”. If you’re really comfortable with PowerShell, maybe you can get this to work faster. If you can, let me know.
C#
This is my preferred method. It started out as the program I used to test upsert concurrency at https://michaeljswart.com/go/upsert, but a friend at work showed me that .NET 4.5 has some nifty new asynchronous methods that make it look nicer, perform faster, and use fewer resources.
using System;
using System.Linq;
using System.Data.SqlClient;
using System.Threading.Tasks;
namespace AConsoleApplication {
class Program {
static void Main( string[] args ) {
var cs = new SqlConnectionStringBuilder();
cs.DataSource = @"MYSERVER\MYINSTANCE";
cs.InitialCatalog = "tempdb";
cs.IntegratedSecurity = true;
cs.AsynchronousProcessing = true;
string connectionString = cs.ToString();
Task[] tasks = Enumerable.Range( 0, 50 )
.Select( i => RunCommand( connectionString ) )
.ToArray();
Task.WaitAll( tasks );
}
private static async Task RunCommand(
string connectionString
) {
string sql = @"s_DoSomething";
SqlConnection conn = new SqlConnection( connectionString );
conn.Open();
SqlCommand cmd = new SqlCommand( sql, conn );
cmd.CommandType = System.Data.CommandType.StoredProcedure;
for( int i = 0; i < 10000; i++ ) {
try {
await cmd.ExecuteNonQueryAsync();
} catch( Exception ex ) {
Console.WriteLine( ex.Message );
}
}
}
}
}
HammerDB
HammerDB (originally HammerOra) is a free utility that allows users to run benchmarks against various environments. Although it was originally built for running benchmarks on Oracle, the utility now works on Windows and for SQL Server (hence the name change). I was first introduced to the utility via Kendra Little (again):
Follow these links to learn how to use the tool for its typical function, running benchmarks. Then, once you know how to do that, it’s a quick step to repurpose the tool for your own concurrent activity. For example, replace HammerDB’s generated script with this one:
#!/usr/local/bin/tclsh8.6
package require tclodbc 2.5.1
database connect odbc "DRIVER=SQL Server Native Client 11.0;SERVER=MYSERVER\\MYINSTANCE;PORT=1433;TRUSTED_CONNECTION=YES";
odbc "set nocount on; declare @i int = 0; while (@i < 10000) begin exec tempdb.dbo.s_DoSomething; set @i+= 1; end"
odbc disconnect
It’s steady and lightweight and works really well.
SQLQueryStress
SQLQueryStress is a tool written by Adam Machanic that you can download and install for free.
It’s fairly idiot-proof and avoids a lot of the complexity of the other methods. For my own purposes, I want the utility to be lightweight. So I …
remember to turn off “Collect I/O Statistics”
remember to turn off “Collect Time Statistics”
set “Number of iterations” to one.
iterate in the query window, i.e. modify the query to call s_DoSomething 10,000 times.
My Ranking
Concurrent Activity Method | Stars | Notes
C# console app | ★★★★½ | It performs fastest and uses the least resources. Also, because I’m biased, it gets a small bump by not suffering from NIH issues.
HammerDB | ★★★ | Very fast (2nd fastest), but the interface is clumsy and modifications need tclsh experience <yuck>. It’s best to use this for its intended purpose: as a benchmark tool.
Notepad and DOS | ★★★ | Quick and dirty and really fast. However, it’s still pretty clumsy. Interrupting a test part way through is difficult.
PowerShell | ★½ | Yuck. I couldn’t get two runs that looked the same, and it was near impossible to separate PowerShell CPU and memory pressure from SQL Server pressure.
SQLQueryStress | ★★★★ | It does the job well. It was a little difficult to interrupt a test, but it takes care to stay lightweight.
Next Week
I’ll show some typical ways I define s_DoSomething.
Takeaway: Keep your database queries simple. Simple for you, simple for others and simple for SQL Server.
This isn’t a controversial topic; it’s widely accepted that simplicity is a virtue. That’s the message. Here I hope to give some methods and motivation.
Write Simple SQL For Yourself
Revisit old code from time to time and write code that you won’t mind revisiting.
All SQL developers are translators. We translate English descriptions of what we want into SQL. Simpler queries mean less mental energy is required when “translating” these requirements into SQL. It also goes the other way: if I write some SQL and then revisit it a year later, I have to translate backwards in order to understand the query’s “intent”.
What? You never revisit old code? Whether it’s code or blog articles, when I look back at what I wrote two short years ago, I’m often not happy with what I read. I sometimes cringe a bit at what I’ve written and will often update old blog posts that need it. SQL is no exception. But reviewing old writing is a useful exercise. Joan Didion, a favorite author of mine, points out: “We are well advised to keep on nodding terms with the people we used to be.”
Write Simple SQL For Others
Try organizing complicated queries using CTEs. It helps others understand them.
Simpler SQL doesn’t need to be shorter SQL. Jeremiah Peschka pointed me at a blog post by Selena Deckelmann, How I Write Queries Using PLSQL: CTEs.
Selena describes (better than I can) how useful CTEs can be when developing SQL. The CTEs provide a way to construct “a set of understandable ‘paragraphs’ of SQL” which can be used to “explain the logic of the query with others”.
Now that is how you do self-documenting code with SQL. When written with CTEs, your SQL will be much clearer than when you use subqueries. But don’t take my word for it. Check out a brilliant example in Brad Schulz’s “Playing Poker With SQL”. In the blog post he develops a single query which reports the results of a 10-hand poker game. Here’s an abbreviated version of his final query:
with DeckOfCards as
(. . .)
,ShuffleAndDeal as
(. . .)
,HandEvaluation1 as
(. . .)
. . .
,HandEvaluation7 as
(. . .)
select PlayerID
,Hand=Card1+' '+Card2+' '+Card3+' '+Card4+' '+Card5
,HandDescript
from HandEvaluation7
pivot (max(CardName) for CardSeqName in (Card1,Card2,Card3,Card4,Card5)) P
order by PlayerRanking
/*
PlayerID Hand HandDescript
-------- --------------- ---------------
3 2♠ 7♠ 8♠ 9♠ 10♠ Flush
1 5♥ 5♣ 5♠ 6♣ K♣ Three of a Kind
5 2♥ 2♦ 6♥ K♠ K♦ Two Pair
4 6♠ 10♦ Q♠ Q♥ K♥ Two of a Kind
6 4♦ 7♣ 7♦ 8♥ J♦ Two of a Kind
2 2♣ 3♣ 3♠ 9♥ J♣ Two of a Kind
9 5♦ 9♦ J♥ Q♦ A♦ Nothing
7 3♥ 4♣ 6♦ 10♥ A♠ Nothing
8 3♦ 4♥ 8♣ 9♣ A♣ Nothing
10 4♠ 8♦ 10♣ J♠ Q♣ Nothing
*/
In his post, Brad develops his query incrementally using CTEs. It accomplishes something very complicated, but it seems simple. I use it as a model for how to organize complex queries (when I must).
Write Simple SQL For SQL Server
Shorter SQL performs better. Consider breaking larger queries up into smaller chunks.
Why?
Fewer joins means the optimizer has fewer query plans to evaluate. And that means superior (or even optimal) plans.
Larger query trees can mean less effective cardinality estimates. With less effective estimates, inferior plans can be chosen resulting in poor performance behaviors like excessive CPU, excessive IO or tempdb spills.
Read Paul White’s tips. They’re very well explained. After his explanation, he mentions a strategy to deal with complex queries. He writes: “My usual approach is to break the query into manageable parts, storing reasonably-sized intermediate results in #temporary tables.” I want to show an example demonstrating how something like that could work.
I use a query I made up against AdventureWorks2012 which, for blogging purposes, we’ll call complicated:
USE Adventureworks2012
SELECT
p.BusinessEntityID
,p.FirstName
,p.LastName
,e.JobTitle
,pp.PhoneNumber
,pnt.Name AS PhoneNumberType
,a.AddressLine1
,a.AddressLine2
,a.City
,sp.Name AS StateProvinceName
,a.PostalCode
,cr.Name AS CountryRegionName
,cust.Purchases
,sale.Sales as SalesCount
FROM Person.Person p
LEFT JOIN HumanResources.Employee e
ON p.BusinessEntityID = e.BusinessEntityID
LEFT JOIN Person.BusinessEntityAddress bea
JOIN Person.[Address] a
ON a.AddressID = bea.AddressID
JOIN Person.StateProvince sp
ON sp.StateProvinceID = a.StateProvinceID
JOIN Person.CountryRegion cr
ON cr.CountryRegionCode = sp.CountryRegionCode
ON bea.BusinessEntityID = p.BusinessEntityID
LEFT JOIN Person.PersonPhone pp
JOIN Person.PhoneNumberType pnt
ON pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID
ON pp.BusinessEntityID = p.BusinessEntityID
LEFT JOIN
(
SELECT COUNT(1), c.PersonID
FROM Sales.SalesOrderHeader soh
JOIN Sales.Customer c
ON c.CustomerID = soh.CustomerID
GROUP BY c.PersonID
) as cust(Purchases, PersonID)
ON p.BusinessEntityID = cust.PersonID
LEFT JOIN
(
SELECT COUNT(1), SalesPersonID
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID
) as sale(Sales, PersonId)
ON p.BusinessEntityID = sale.PersonId
WHERE p.FirstName = 'Michael'
Most people’s intuition is that a single query is preferable. But just like Paul White, I have found that performance can sometimes be improved when the work is split into many queries. Here’s an example of what that might look like:
use AdventureWorks2012
CREATE TABLE #Results (
BusinessEntityID int,
FirstName nvarchar(50),
LastName nvarchar(50),
JobTitle nvarchar(50),
PhoneNumber nvarchar(25),
PhoneNumberType nvarchar(50),
AddressLine1 nvarchar(60),
AddressLine2 nvarchar(60),
City nvarchar(30),
StateProvinceName nvarchar(50),
PostalCode nvarchar(15),
CountryRegionName nvarchar(50),
Purchases int,
SalesCount int
);
INSERT #Results (BusinessEntityID, FirstName, LastName)
SELECT BusinessEntityID, FirstName, LastName
FROM Person.Person
WHERE FirstName = 'Michael';
UPDATE #Results
SET JobTitle = e.JobTitle
FROM #Results r
JOIN HumanResources.Employee e
on r.BusinessEntityID = e.BusinessEntityID;
UPDATE #Results
SET AddressLine1 = a.AddressLine1,
AddressLine2 = a.AddressLine2,
City = a.City,
StateProvinceName = sp.Name,
PostalCode = a.PostalCode,
CountryRegionName = cr.Name
FROM #Results r
JOIN Person.BusinessEntityAddress bea
ON bea.BusinessEntityID = r.BusinessEntityID
JOIN Person.[Address] a
ON a.AddressID = bea.AddressID
JOIN Person.StateProvince sp
ON sp.StateProvinceID = a.StateProvinceID
JOIN Person.CountryRegion cr
ON cr.CountryRegionCode = sp.CountryRegionCode;
UPDATE #Results
SET PhoneNumber = pp.PhoneNumber,
PhoneNumberType = pnt.Name
FROM #Results r
JOIN Person.PersonPhone pp
ON pp.BusinessEntityID = r.BusinessEntityID
JOIN Person.PhoneNumberType pnt
ON pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID;
WITH cust (Purchases, PersonID) AS
(
SELECT COUNT(1), c.PersonID
FROM Sales.SalesOrderHeader soh
JOIN Sales.Customer c
ON c.CustomerID = soh.CustomerID
GROUP BY c.PersonID
)
UPDATE #Results
SET Purchases=cust.Purchases
FROM #Results r
JOIN cust
on cust.PersonID = r.BusinessEntityID;
WITH sale (SalesCount, PersonId) AS
(
SELECT COUNT(1), soh.SalesPersonID
FROM Sales.SalesOrderHeader soh
GROUP BY soh.SalesPersonID
)
UPDATE #Results
SET SalesCount=sale.SalesCount
FROM #Results r
JOIN sale
ON sale.PersonId = r.BusinessEntityID;
SELECT * FROM #Results;
When is this technique appropriate?
I like to use this performance technique before I consider query hints, (but after other simpler improvements like indexing). Even so, this technique is not always appropriate. I’ve seen it work best on complicated queries (How do you know when they’re complicated?). And I’ve seen this work best against large datasets (processing millions of rows for example). Complicated queries have a higher risk of generating poor query plans. But breaking these huge queries into smaller parts addresses this problem.
In my example, I’ve split the original query into seven. That was just for illustration. Maybe better is splitting your monster query into only three queries. Always test.
Empirical evidence tells me that simpler SQL performs better. I’ve split up complicated queries and often they become much easier to maintain, but almost as often I’ve found I never needed to!
Really?
I’ve added a few comments below that qualify some of the things I’ve said here.