Takeaway: Keep your database queries simple. Simple for you, simple for others and simple for SQL Server.
This isn’t a controversial topic. It’s widely accepted that simplicity is a virtue. That’s the message, I hope to give some methods and motivation.
Write Simple SQL For Yourself
Revisit old code from time to time and write code that you won’t mind revisiting.
All SQL Developers are translators. We translate English descriptions of what we want into SQL. Simpler queries mean less mental energy is required when “translating” these requirements into SQL. It also goes the other way. If I were write some SQL and then revisit it a year later, then I have to translate backwards in order to understand the query’s “intent”.
What? you never revisit old code? Whether it’s code or blog articles, when I look back at what I wrote two short years ago, I’m often not happy with what I read. I sometimes cringe a bit at what I’ve written and will often update old blog posts that need it. SQL is no exception. But reviewing old writing is a useful exercise. Joan Didion, a favorite author of mine, points out “We are well advised to keep on nodding terms with the people we used to be.”
Write Simple SQL For Others
Try organizing complicated queries using CTEs. It helps others understand them.
Simpler SQL doesn’t need to be shorter SQL. Jeremiah Peschka pointed me at a blog post by Selena Deckelmann How I Write Queries Using PLSQL: CTEs
Selena describes (better than I can) how useful CTEs can be when developing SQL. The CTEs provide a way to construct “a set of understandable ‘paragraphs’ of SQL” which can be used to “explain the logic of the query with others”.
Now that is how you do self-documenting code with SQL. When written with CTEs, your SQL will be much clearer than when you use subqueries. But don’t take my word for it. Check out a brilliant example in Brad Schulz’s “Playing Poker With SQL“. In the blog post he develops a single query which reports the results of a 10 hand poker game. Here’s an abbreviated version of his final query:
with DeckOfCards as (. . .) ,ShuffleAndDeal as (. . .) ,HandEvaluation1 as (. . .) . . . ,HandEvaluation7 as (. . .) select PlayerID ,Hand=Card1+' '+Card2+' '+Card3+' '+Card4+' '+Card5 ,HandDescript from HandEvaluation7 pivot (max(CardName) for CardSeqName in (Card1,Card2,Card3,Card4,Card5)) P order by PlayerRanking /* PlayerID Hand HandDescript -------- --------------- --------------- 3 2♠ 7♠ 8♠ 9♠ 10♠ Flush 1 5♥ 5♣ 5♠ 6♣ K♣ Three of a Kind 5 2♥ 2♦ 6♥ K♠ K♦ Two Pair 4 6♠ 10♦ Q♠ Q♥ K♥ Two of a Kind 6 4♦ 7♣ 7♦ 8♥ J♦ Two of a Kind 2 2♣ 3♣ 3♠ 9♥ J♣ Two of a Kind 9 5♦ 9♦ J♥ Q♦ A♦ Nothing 7 3♥ 4♣ 6♦ 10♥ A♠ Nothing 8 3♦ 4♥ 8♣ 9♣ A♣ Nothing 10 4♠ 8♦ 10♣ J♠ Q♣ Nothing */
In his post, Brad develops his query incrementally using CTEs. It accomplishes something very complicated, but it seems simple. I use it as a model for how to organize complex queries (when I must).
Write Simple SQL For SQL Server
Shorter SQL performs better. Consider breaking larger queries up into smaller chunks.
- Fewer joins means the optimizer has fewer query plans to evaluate. And that means superior (or even optimal) plans.
- Larger query trees can mean less effective cardinality estimates. With less effective estimates, inferior plans can be chosen resulting in poor performance behaviors like excessive CPU, excessive IO or tempdb spills.
A better and more in depth explanation by Paul White here in an answer to a SqlPerformance.com question.
Read his tips. They’re very well explained. After his explanation, he mentions a strategy to deal with complex queries. He writes “My usual approach is to break the query into manageable parts, storing reasonably-sized intermediate results in #temporary tables.” I want to show an example demonstrating how something like that could work.
I use a query I made up against Adventureworks2012 which, for blogging purposes, we call complicated:
USE Adventureworks2012 SELECT p.BusinessEntityID ,p.FirstName ,p.LastName ,e.JobTitle ,pp.PhoneNumber ,pnt.Name AS PhoneNumberType ,a.AddressLine1 ,a.AddressLine2 ,a.City ,sp.Name AS StateProvinceName ,a.PostalCode ,cr.Name AS CountryRegionName ,cust.Purchases ,sale.Sales as SalesCount FROM Person.Person p LEFT JOIN HumanResources.Employee e ON p.BusinessEntityID = e.BusinessEntityID LEFT JOIN Person.BusinessEntityAddress bea JOIN Person.[Address] a ON a.AddressID = bea.AddressID JOIN Person.StateProvince sp ON sp.StateProvinceID = a.StateProvinceID JOIN Person.CountryRegion cr ON cr.CountryRegionCode = sp.CountryRegionCode ON bea.BusinessEntityID = p.BusinessEntityID LEFT JOIN Person.PersonPhone pp JOIN Person.PhoneNumberType pnt ON pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID ON pp.BusinessEntityID = p.BusinessEntityID LEFT JOIN ( SELECT COUNT(1), c.PersonID FROM Sales.SalesOrderHeader soh JOIN Sales.Customer c ON c.CustomerID = soh.CustomerID GROUP BY c.PersonID ) as cust(Purchases, PersonID) ON p.BusinessEntityID = cust.PersonID LEFT JOIN ( SELECT COUNT(1), SalesPersonID FROM Sales.SalesOrderHeader GROUP BY SalesPersonID ) as sale(Sales, PersonId) ON p.BusinessEntityID = sale.PersonId WHERE p.FirstName = 'Michael'
Most people’s intuition is that a single query is preferable. But just like Paul White, I have found that performance can sometimes be improved when the work is split into many queries. Here’s an example of what that might look like:
use AdventureWorks2012 CREATE TABLE #Results ( BusinessEntityID int, FirstName nvarchar(50), LastName nvarchar(50), JobTitle nvarchar(50), PhoneNumber nvarchar(25), PhoneNumberType nvarchar(50), AddressLine1 nvarchar(60), AddressLine2 nvarchar(60), City nvarchar(30), StateProvinceName nvarchar(50), PostalCode nvarchar(15), CountryRegionName nvarchar(50), Purchases int, SalesCount int ); INSERT #Results (BusinessEntityID, FirstName, LastName) SELECT BusinessEntityID, FirstName, LastName FROM Person.Person WHERE FirstName = 'Michael'; UPDATE #Results SET JobTitle = e.JobTitle FROM #Results r JOIN HumanResources.Employee e on r.BusinessEntityID = e.BusinessEntityID; UPDATE #Results SET AddressLine1 = a.AddressLine1, AddressLine2 = a.AddressLine2, City = a.City, StateProvinceName = sp.Name, PostalCode = a.PostalCode, CountryRegionName = cr.Name FROM #Results r JOIN Person.BusinessEntityAddress bea ON bea.BusinessEntityID = r.BusinessEntityID JOIN Person.[Address] a ON a.AddressID = bea.AddressID JOIN Person.StateProvince sp ON sp.StateProvinceID = a.StateProvinceID JOIN Person.CountryRegion cr ON cr.CountryRegionCode = sp.CountryRegionCode; UPDATE #Results SET PhoneNumber = pp.PhoneNumber, PhoneNumberType = pnt.Name FROM #Results r JOIN Person.PersonPhone pp ON pp.BusinessEntityID = r.BusinessEntityID JOIN Person.PhoneNumberType pnt ON pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID; WITH cust (Purchases, PersonID) AS ( SELECT COUNT(1), c.PersonID FROM Sales.SalesOrderHeader soh JOIN Sales.Customer c ON c.CustomerID = soh.CustomerID GROUP BY c.PersonID ) UPDATE #Results SET Purchases=cust.Purchases FROM #Results r JOIN cust on cust.PersonID = r.BusinessEntityID; WITH sale (SalesCount, PersonId) AS ( SELECT COUNT(1), soh.SalesPersonID FROM Sales.SalesOrderHeader soh GROUP BY soh.SalesPersonID ) UPDATE #Results SET SalesCount=sale.SalesCount FROM #Results r JOIN sale ON sale.PersonId = r.BusinessEntityID; SELECT * FROM #Results;
When is this technique appropriate?
I like to use this performance technique before I consider query hints, (but after other simpler improvements like indexing). Even so, this technique is not always appropriate. I’ve seen it work best on complicated queries (How do you know when they’re complicated?). And I’ve seen this work best against large datasets (processing millions of rows for example). Complicated queries have a higher risk of generating poor query plans. But breaking these huge queries into smaller parts addresses this problem.
In my example, I’ve split the original query into seven. That was just for illustration. Maybe better is splitting your monster query into only three queries. Always test.
Empirical evidence tells me that simpler SQL performs better. I’ve split up complicated queries and often they become much easier to maintain but almost as often I’ve never needed to!
I’ve added a few comments below that qualify some of the things I’ve said here.