The 80-20 rule (aka the Pareto principle) says that 80% of effects come from 20% of the causes. When you apply this to Quality Control in a manufacturing environment, you can then also say that 80% of defects come from 20% of the causes.
The Pareto Chart
This gives rise to the pareto chart. See this google image search to get a sense of some other examples.
Essentially you plot a histogram of the top x defects in descending frequency. On a second axis, you plot the cumulative total and show that value in percentage of all causes. The picture on the right would be a perfect example if this post were called the 65-35 rule.
How does this help?
If your goal is to reduce defects, then the pareto chart can help you keep focused on the important things. My brother says this is a way of formalizing common sense. It helps make sure that the most serious defects get attention.
What does this have to do with SQL Server?
Two things really.
QC for SQL Server
The first is that you can apply the same principle of Quality Control to the performance of database servers. Instead of measuring defects, you can measure total IO or Total CPU.
SQL Server comes shipped with a few canned performance reports that give you great information. (Right-click your server in Object Explorer, Select Reports, then Standard Reports, then pick one of the Performance * reports.
I think Performance – Top Queries by Total IO and Performance – Top Queries by Total CPU are the most useful. These are histograms, and are one step away from being a Pareto chart.
I’ve made my own charts with titles like Top CPU, TOP I/O, and TOP Blocked Tables .
This second thing is that this task seems like a job for Reporting Services! Well it seems that way but it’s not.
I was given a task to create such a report once and it was hard to work with SSRS to display the report exactly as I wanted. In the end, I’m not sure the report itself was used very often.
(Update (10/17/2008): SRSS 2008 improved a lot… For an solution in 2008 see SSRS is fun again)
Recently, when faced with a task of creating another such chart, I decided to try using a third party chart. I downloaded a trial version of Dundas chart for Reporting Services. They’ve got a Pareto chart that looks like it fits the bill. But there were two things I didn’t like. The cumulative percentage line only shows the cumulative percentage of the shown data. Which makes it tricky if I want to show the TOP 20 bad queries but show percentages based on 1000 queries. Also the scale of the histogram matches the scale of the percentages which I don’t like.
Excel to the rescue!
Well in the end I found that Excel charts do what others couldn’t. Excel is the jack of all trades and master of none. By being flexible, Excel lets me get a nice pareto chart like the one shown here. Here’s how I do it.
- Paste the sorted data into a column in Excel. Say, for example, you’ve pasted into column B.
- In the next column (C), create a column with values that contain the cumulative values. So the formula in C4 would be
=B4 + C3
Fill this formula down for the whole column.
- In the next column (D), create a column with values that contain the cumulative percentages. These will be the values for the line in the pareto chart. So the formula in D4 would be
Fill this formula down for the whole column
- Select the top x values in column B and D. This is the data for the pareto chart. Then select the chart wizard.
- In the chart wizard, the kind of chart to pick is in the Custom Types tab. Pick Line – Column on 2 Axes
- Press Finish and season to taste.
Why is this better?
- I think it’s better because it’s slightly less automatic and slightly more manual. Giving users more options, especially savvy users, is wise in some cases. This is one of those cases.
- You’ll never have to field a question like: Can I export this into Excel?
- If you’re the consumer of the data, then you know what the data means already.
- Updating the data is a matter of copy-paste. A little more work than a refresh button, but worth it.