Michael J. Swart

June 8, 2011

Forget About PIVOT, You Don’t Need It

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , — Michael J. Swart @ 12:00 pm

Takeaway: Until recently, I was a huge T-SQL PIVOT fan. But the more I think about it, the more I realize that it’s probably better to pivot data anywhere else than to pivot data in SQL Server. I talk about easier and better alternatives to PIVOT. Remember PIVOT is a T-SQL command that was introduced in SQL Server 2005 that lets you return results as pivot tables or more commonly as cross tabulations.

SQL Server 2000 and Earlier

I remember the first time I came across the need for pivot tables. I knew what I needed the data to look like, but I didn’t know what they were called. It didn’t matter, PIVOT wasn’t yet a T-SQL keyword and I couldn’t make use of it any way.

I was asked to write a query that returned data about the number of widgets and their categories. That seems like no problem right? Except I was asked to return the data in a two dimensional grid (with widgets as rows and categories as columns). It was a perfect use case for PIVOT! Except that it wasn’t yet available and so I was forced to write something complicated in T-SQL that used subqueries, CASE statements and other crazy stuff.

The resulting query was at least 100 lines. It looked complicated but it did the trick. I was actually congratulated on my SQL skillz (although in hindsight I shouldn’t have been).

But it got me to thinking. Why was that so hard? I used to think that any reasonable English question can be translated into simple SQL. But here was a big fat hairy counter-example. I eventually came up with an answer: It’s because the requested data has information about different sets of data in the same row.

SQL Server 2005 and Later

Fast forward a few years. I was learning about all the new stuff that SQL Server 2005 could do. And I came across the new PIVOT keyword. I knew immediately what it could do for me. I knew immediately how it was used. And I used it when the results called for it. I never remembered the syntax because I knew I could always look it up.

Eventually I gained a reputation as a SQL know-it-all (whether I deserved it or not) and I started fielding database questions. If any colleagues were struggling to return a cross-table, it was easy to recognize what they were doing and easier to point them to the online docs for PIVOT.

Or Just Let The App Deal With It

But I realized recently that it’s really not necessary. At the database level, I probably don’t have to pivot this. I can ask the question: Do I really need to deliver the data that way – pre-pivoted? Heck No! I can delegate that stuff. It’s actually a lot easier to pivot this data almost anywhere else besides SQL Server.

So in essence I’m giving myself (and you!) permission to forget something: T-SQL’s PIVOT syntax.

Pivot Inside Excel

Say the data is a one-time-only query and you want to include a cross table in a spreadsheet or email. Well Excel’s pivot feature turns out to be dead simple. This pivot functionality is also found in Open Office’s Calc and any other spreadsheet application built this century. Just a couple extra tips:

  • Using the “format as table” feature can save yourself a couple clicks.
  • Find the “Create Pivot Table” feature under the Insert tab.
  • If you’re still having any trouble, I’m sure there’s tons more help here.

Pivot Inside a .Net App

Okay, so say you’re writing some C# and you have a DataTable that you wish were more pivot-y. Ask and ye shall receive:

DataTable Pivot( DataTable dt, DataColumn pivotColumn, DataColumn pivotValue ) {
	// find primary key columns 
	//(i.e. everything but pivot column and pivot value)
	DataTable temp = dt.Copy();
	temp.Columns.Remove( pivotColumn.ColumnName );
	temp.Columns.Remove( pivotValue.ColumnName );
	string[] pkColumnNames = temp.Columns.Cast<DataColumn>()
		.Select( c => c.ColumnName )
		.ToArray();
 
	// prep results table
	DataTable result = temp.DefaultView.ToTable(true, pkColumnNames).Copy();
	result.PrimaryKey = result.Columns.Cast<DataColumn>().ToArray();
	dt.AsEnumerable()
		.Select(r => r[pivotColumn.ColumnName].ToString())
		.Distinct().ToList()
		.ForEach (c => result.Columns.Add(c, pivotColumn.DataType));
 
	// load it
	foreach( DataRow row in dt.Rows ) {
		// find row to update
		DataRow aggRow = result.Rows.Find(
			pkColumnNames
				.Select( c => row[c] )
				.ToArray() );
		// the aggregate used here is LATEST 
		// adjust the next line if you want (SUM, MAX, etc...)
		aggRow[row[pivotColumn.ColumnName].ToString()] = row[pivotValue.ColumnName];
	}
 
	return result;
}

If you know the shape of your data ahead of time, you could have coded this more easily by hard coding column names. But what I’ve given here is a general procedure (It works on most test cases. Error handling and extra testing are left as an exercise for the reader :-) ).

Pivot Inside Other B.I. Clients

Most other clients I can think of have even easier pivot features:

  • Reporting Service’s matrix control is a pivot table.
  • Integration Service’s pivot transformation is easy and handy.
  • Analysis Service’s cubes are nothing but a multi-dimensional pivot tables.
  • Other Business Intelligence software is all over this. If your B.I. software can’t do cross-tables or pivot tables, I would ask for your money back.

You don’t need T-SQL’s PIVOT table at all. Forget that stuff…

… Except When You Do Need T-SQL’s PIVOT

Okay, maybe I’m a bit over-zealous there. Here are some reasons why you might want to use T-SQL’s PIVOT tables. They’re not great reasons but I present them any way. You’ll need to know this stuff:

  • when studying for certifications, interviews or other tests,
  • when the application or data destination you’re working with is not suited to pivoting (although nothing comes to mind). Your choice is then PIVOT using T-SQL or not at all.
  • when the data’s destination is the SSMS results pane. That’s fair.
  • when you don’t have control over the app, or the app is too difficult to modify and it’s expecting a cross-table for data.

So for the most part, you can forget about PIVOT syntax. Next week maybe I’ll talk about remembering the syntax for the MERGE statement. In my opinion, SQL’s MERGE statement is a statement worth remembering, but that’s another story.

24 Comments »

  1. I find myself NOT using the PIVOT command more and more as time goes on… instead I’m just doing a GROUP BY and putting together the aggregates myself… because it gave me more flexibility.

    The biggest problem with PIVOT that people have trouble figuring out is that the data on the left side of the PIVOT operator must have a pivot column (whose values will become columns) and the aggregate column… and whatever columns are left will be GROUPed… so you have to make sure that you only have those basic columns (and no extraneous columns) or else you’ll get undesired results.

    –Brad

    Comment by Brad Schulz — June 8, 2011 @ 12:23 pm

  2. That’s a good way of looking at it Brad. I mean there are hints everywhere (like forcing aggregations for the pivot data) that would indicate that PIVOT is just a GROUP BY in different clothing.

    But now that I don’t have to remember PIVOT syntax. What am I goiong to do with all these extra brain cells. Maybe I can try to master MDX… err… maybe not.

    Comment by Michael J. Swart — June 8, 2011 @ 12:56 pm

  3. > But now that I don’t have to remember PIVOT syntax. What am I going to do with all these extra brain cells.

    Have you ever considered piracy? You’d make an excellent Dread Pirate Roberts.

    Comment by David Swart — June 8, 2011 @ 3:10 pm

  4. [Michael considers piracy and expresses a piqued interest]

    Comment by Michael J. Swart — June 9, 2011 @ 10:29 am

  5. Michael, great post. I have to admit I’m a first time reader AND not technical by any stretch (which may be clearer in my question). I was wondering – do you think that the proliferation of (admittedly BI) tools like Qlikview, Tableau, Powerpivot, etc. to end users, both technical and non-technical, reduces the DBA’s need to take advantage of, or more importantly understand, these types of complex functionality in the database? Is this a good thing or a bad thing? Because it seems like it makes the “role” of accidental DBA a more likely and acceptable outcome for someone who finds themselves close to the database.
    Thanks again for the post and I look forward to reading more!

    Comment by Andy Grant — June 9, 2011 @ 1:54 pm

  6. Hi Andy!
    From 2 blokes marketing eh? So you must know Brent Ozar? Now that’s a man who’s got strong opinions on a database server doing much more than serving data fast. If a database server is doing more than serving data, it’s got a bit of an identity crisis. And I think I share that opinion (to a point).

    As a DB Developer (rather than a DBA) I write and tune a lot of stuff on the DB. Even so, I’ll often ask myself “Do I have to do this on the DB?” before I tackle “How do I do this on the DB?”. And so if applications like Qlikview and the rest can help give the database some breathing room, I’m all for it. So to answer your question, I think it’s a good thing.

    But even if a DBA never writes a line of SQL, there will always be a need for them to do security, disaster recovery and high availability stuff (you can google those if you’re curious).

    In short, I don’t think that clever applications reduce the required skillset for a DBA (or DB Developer) much at all. Does that answer your question?

    Comment by Michael J. Swart — June 9, 2011 @ 2:30 pm

  7. HA! Strong opinions, funny – yeah, I usually ask those kinds of tough questions. It’s so much easier to scale when you push as much load as possible out onto the application tier rather than having SQL Server do it.

    If your goal is to answer a one-off business question quickly, sure, use pivot in T-SQL. If you expect to take that data and then automate the answer so that lots of people can ask it quickly, that won’t scale as well inside the database engine.

    Comment by Brent Ozar — June 9, 2011 @ 4:01 pm

  8. Good point Brent: “Know your audience”

    Comment by Michael J. Swart — June 9, 2011 @ 4:04 pm

  9. This totally does answer my question (by “this” I mean both Michael’s and Brent’s responses :) ). Thanks for the feedback!

    Comment by Andy Grant — June 9, 2011 @ 5:53 pm

  10. [...] Forget About PIVOT, You Don’t Need It – I can never remember the syntax for PIVOT offhand, so true and finally Michael J. Swart (Blog|Twitter) gives us permission not to. Be sure to also check out the great discussions in the comments. [...]

    Pingback by SFTW - SQL Server Links, News and Community Stuff This Week — June 10, 2011 @ 7:43 am

  11. One company I worked for they got a BI expert. Everyone laughed and told me that I’d soon be on the street-corner selling popcorn. The guy installed SSAS and put all the hooks into the existing OLTP databases and was soon happily vacuuming up all the data.
    Soon he was doing all sorts of fancy reports and my credibility was at an all-time low, especially when he was reporting different profit figures.
    Obviously I put in a few late nights to see why the figures didn’t tally.
    They didn’t tally, of course because any serious data extraction has to understand that data, and he’d made all sorts of mistakes and hadn’t double-checked and reconciled the various intermediate stages. He caused complete chaos and it took months to get correct figures. In the meantime all sorts of business decisions had been made on the assumption that his figures were correct (They included a lot of fraudulent trading for a start that had to be written off) The cost to the company was pretty significant.
    Sure, BI tools are great, but they are only as great as the data that supports them, and getting the data correct is the difficult bit. And, yes, the BI expert was the one who ended up selling popcorn at the street corner: well, in truth he was soon bobbing up again, pursuing the next hot IT craze. He’s running a cloud hosting company now.

    Comment by Phil Factor — June 11, 2011 @ 1:55 pm

  12. Well Kudos to you Phil for putting in those late nights to find out why the figures didn’t tally. I’m not too surprised that you were willing. I know a lot of people who care about data (security, integrity and yes accuracy). What sucks though is that you had to. When Person A has results different than Person B, the onus to explain the discrepancy goes to the one with less clout (unless you’ve built a good reputation as a data-discrepancy-explainer).

    About the tools, I’m with you 100% Phil, the BI tools *are* great, but they’re just that: tools. And they have to go hand in hand with good people/processes. If I were to build a house with the best power tools available. No one would blame the poor quality of the shack I built on the tools :-)

    Comment by Michael J. Swart — June 12, 2011 @ 10:20 pm

  13. Just my 2cents.. I am not a DBA per se, but I am our primary database developer. We have a daily feed of pricing data that comes from live markets for specific time frames – one record for a 30day price, another record for a 60day price, and another for a 90day price, etc. We need to generate reports that display various Products with all the prices on one line, so Products on Y axis, Prices on X axis. The PIVOT works great for this. I expected the performance might be slow, but we were pleasantly surprised to find that it is not. Since our output goes to Reporting Services, using .NET as a middle tier would just add extra work. So, from my perspective, I wouldn’t throw PIVOT out for all scenarios. It’s always dangerous to make a blanket statemtent that you can just forget it. I, for one, am very happy to have this functionality in SQL. As for not remembering the syntax, well frankly I look up syntax for a large number of both SQL and .NET methods, so that’s just par for the course for me. Why memorize when you can look it up!

    Comment by VGrimes — June 13, 2011 @ 10:21 am

  14. @VGrimes

    I think you touch on a lot of things that I agree with. My main point of this whole article here was that it’s easier to do PIVOT in other places. And in my opininion, it’s sooo much easier to do the pivoting in Reporting Services than in SQL Server, either with RS’s matrix control or by using group headers to display data (Especially using Report Builder 2.0 and later).

    But any way, a careful reading of my article actually advocates the point you’re making. “You might want to use PIVOT … when the app is too difficult to modify”. The app in this case is SSRS and if it’s difficult because T-SQL’s PIVOT is more familiar, then by all means, use PIVOT.

    But thanks for stopping by, I hope you’ll continue to watch the site!

    p.s. I don’t think anyone would advocate using .NET as a middle tier when the end application is SSRS.

    p.p.s. I don’t like blanket statements either, see the last paragraph of the article ;-)

    Comment by Michael J. Swart — June 13, 2011 @ 10:50 am

  15. Thanks Michael, I did read the last paragraph so I know you made room for the caveat “Except When You Do Need T-SQL’s PIVOT”. ;) What I didn’t know was that SSRS has a way to handle this as well (you can probably tell I’m new to SSRS!). However, one last consideration that extends to a wider discussion that our dev team has recently been having: consolidating our business logic to be non-repeatable and where is the best place to do that. The answer is different for each business environment, no doubt. We have been talking about creating WCF modules to encapsulate certain calculations we do over & over, but for our team, the familiarity with SQL means that there is a strong contingent of folks who would rather see SQL functions do this instead. And as long as the performance is good, and so far our tests have shown that it is, it may be preferable for us to encapsulate this logic in SQL functions rather than in a .NET or WCF business layer.

    I know from experience that that is rare to encapsulate BL in SQL functions, but I don’t think we’re the only ones out there to consider it. In any case, if that’s what we decide to do, then the PIVOT will stay in SQL so that we guarantee the same results in SSRS or in a .NET page or in a public WCF object.

    Comment by VGrimes — June 13, 2011 @ 11:44 am

  16. Hey @VGrimes

    Wow, it sounds like the development practices in your business environment are well thought out. And it seems like any standards or best-practices that are adopted there are done carefully.

    So in your case, when you weigh the benefits of SQL familiarity, decent performance, consistent business logic against this article’s proposed benefit of ease of use. Then T-SQL’s PIVOT is the way to go.

    But you can still forget the PIVOT syntax because as you said “Why memorize when you can look it up!” :-)

    Comment by Michael J. Swart — June 13, 2011 @ 2:37 pm

  17. What about using this method via Entity Framework? Since CopyToDataTable() never made it out of the box because of DataRow strong-typing missing from IQueryable results, is there an easy way to map a query result from EF to your method above?

    At present is seems the lesser headache will be to just back up and punt with ADO.NET vs. EF.

    Comment by Boyd Campbell — January 17, 2012 @ 3:36 pm

  18. Hi Boyd,

    I have to admit, I’m not an expert at Entity Framework stuff (although that’s probably going to change sooner than later). So I don’t know if there’s an easy way to map the results.

    My gut feeling is that it is easier to use ADO.NET rather than E.F. But that may be out of ignorance. If (when!) I find out I’ll update this post. … Unless you beat me to it :-)

    Cheers, Michael

    Comment by Michael J. Swart — January 30, 2012 @ 9:08 am

  19. Hi,

    Apparently the Query Optimiser Engine treats GROUP BY + Aggregates quite different from PIVOT.

    In my case, when executing both versions in the same Batch, the GROUP BY takes 1% and the PIVOT takes 99%. This seems to be down to a LEFT JOIN elimination in the underlying VIEW I am ‘grouping’/'pivotting’ over (only the GROUP BY eliminates the JOIN while the PIVOT does not).

    Comment by Hans Lindgren — May 16, 2013 @ 4:44 am

  20. Hi Michael…

    Hans’ comment this morning is absolutely true… it’s something that I only discovered recently at a client site.

    Let me elaborate by example…

    Let’s say you have a table containing phone numbers. Each customer may have several entries in this phone number table… a home number, a work number, a mobile number, a fax number, etc:

    create table phones
    (
      customerid int
     ,phonetype varchar(10)
     ,phonenumber varchar(20)
    )
    insert phones values (1,'Home','12345')
    insert phones values (1,'Work','23456')
    insert phones values (1,'Mobile','88888')
    insert phones values (2,'Home','98765')

    Let’s say you create a view (called vCustomerFullData) that JOINs the Customer table with this Phone table, PIVOTing the phone numbers into columns like HomePhone, WorkPhone, MobilePhone, FaxPhone:

    alter view vCustomerFullData 
    as
    select c.customerid,c.customername
          ,HomePhone=p.home
          ,WorkPhone=p.work
          ,MobilePhone=p.mobile
          ,FaxPhone=p.fax
    from customers c
    left join (select *
               from phones
               pivot (max(phonenumber) for phonetype in (Home,Work,Mobile,Fax)) x
              ) p on c.customerid=p.customerid

    So the view now has all the columns of the Customer table AND these great new phone number columns.

    People love this view and it becomes so popular that everyone refers to this view in all of their queries instead of referring to the original Customer table itself:

    SELECT customerid,customername,workphone FROM vCustomerFullData WHERE ...

    So inevitably people will reference the view and not refer to any of the phone numbers at all:

    SELECT customername FROM vCustomerFullData WHERE ...

    Even though the view has a LEFT JOIN in it, the optimizer will STILL insist on accessing the PHONES table, even though we aren’t referring to any phone columns at all.

    And that’s PIVOT’s fault.

    If instead we wrote the view the old-fashioned way…

    alter view vCustomerFullData
    as
    select c.customerid,c.customername
          ,HomePhone,WorkPhone,MobilePhone,FaxPhone
    from customers c
    left join (select customerid
                     ,HomePhone=max(case when phonetype='Home' then phonenumber end)
                     ,WorkPhone=max(case when phonetype='Work' then phonenumber end)
                     ,MobilePhone=max(case when phonetype='Mobile' then phonenumber end)
                     ,FaxPhone=max(case when phonetype='Fax' then phonenumber end)
               from phones
               group by customerid
              ) p on c.customerid=p.customerid

    Then if you do a query on the view that doesn’t include phones…

    SELECT customername FROM vCustomerFullData WHERE ...

    Then the optimizer will ELIMINATE the Phones table from the query plan, as it should.

    Interesting, huh?

    –Brad

    Comment by Brad Schulz — May 16, 2013 @ 5:21 pm

  21. Thanks Hans, Brad.

    It is interesting.

    I posted this example to SQLPerformance.com.
    Paul White gives (I think) a definitive explanation of some of the details of what’s going on in the query optimizer.

    Comment by Michael J. Swart — May 29, 2013 @ 8:51 am

  22. Thanks for digging further Michael (and thanks for formatting the code I posted… looks much better!)

    I appreciate Paul’s analysis… I figured there was something about the properties of the PIVOT operator that (for whatever reason) prevented the removal of the redundant JOIN. I always found the PIVOT to be a little clumsy anyway… converting the view to a CASE/MAX/GROUP approach has made vast improvements in the client queries that use the view.

    –Brad

    Comment by Brad Schulz — May 29, 2013 @ 10:10 am

  23. [...] might ask yourself, why pivot? The simple answer is because I love them. I have heard the arguments,1 "don’t do this type of data transformation on a server do it in the application [...]

    Pingback by did somebody say pivot? | only bluefeet — August 8, 2013 @ 8:56 pm

  24. The Kettic PivotGrid Control support of full OLAP, Pivoting, tabular and compact layouts, filtering and sorting, group, and more functionalities.

    Comment by Leonard — August 29, 2013 @ 5:59 am

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress