Prioritize This List of Issues

July 5, 2016

Prioritize This List of Issues

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 9:14 am

“One thing at a time / And that done well / Is a very good thing / As any can tell”

But life isn’t always that easy is it? I spend a lot of my workday juggling priorities. And I want to compare what I think to others. So I wrote a survey which explores the importance people place on different SQL Server issues. It’s easy to say avoid redundant indexes. But does it follow that it’s more important to clean up redundant indexes before rewriting cursors?

The List

Prioritize this list from greatest concern to least. So if an item appears above another item, then you would typically tackle that issue first.

Corrupted database
A SQL injection vulnerability
Stale statistics
Fragmented indexes
Developers use a mix of parameterized SQL and stored procedures
Sprocs prefixed with “sp_”
Cursors
GUID identifiers
NOLOCK hints
No automated backups

I want to hear what you think. Submit your prioritized list in the comments, or by using this survey: https://www.surveymonkey.com/r/MV9F9YT

I’ll be posting my own answers on Thursday, July 5, 2016.

Update: I’ve shared the results. Prioritize This List Of Issues (Results)

-- Comments (19)

19 Comments »

•No automated backups
•A SQL injection vulnerability
•Corrupted database
•Coffee cup empty
•NOLOCK hints
•Cursors (assuming that set-based operation is possible)
•GUID identifiers
•Stale statistics
•Sprocs prefixed with “sp_”
•Developers use a mix of parameterized SQL and stored procedures
•Fragmented indexes

Comment by Robert L Davis — July 5, 2016 @ 9:33 am
Priority order would change based upon the relative usage of the databases in question. The only thing that would be first in all situations would be the backups, other than that, difficult to make the call.

Comment by Nic Cain — July 5, 2016 @ 9:41 am
Corrupted database
A SQL injection vulnerability
No automated backups
Stale statistics
Fragmented indexes
Cursors
NOLOCK hints
Developers use a mix of parameterized SQL and stored procedures
Sprocs prefixed with “sp_”
GUID identifiers

Comment by kevin3nf — July 5, 2016 @ 9:42 am
1) Updated Resume
2) Corrupted Database
3) No Automated Backups
4) A SQL injection vuln
5) …

Comment by Daniel — July 5, 2016 @ 10:48 am
No automated backups – because before I break anything else, I need to cover my butt
Corrupted database
A SQL injection vulnerability

As people complained about performance, I’d touch:
Stale statistics – by setting up automated stats updates with Ola
Fragmented indexes – Ola’s jobs will take care of this too

If my developers brought me performance concerns, then I’d let them prioritize between these:
Developers use a mix of parameterized SQL and stored procedures
Cursors

Things I wouldn’t bother touching because the ROI is rarely there for existing apps:
Sprocs prefixed with “sp_”
GUID identifiers
NOLOCK hints (I’d only touch if we were implementing RCSI or snapshot isolation, or users told me they wanted more accuracy in their reports)

Comment by Brent Ozar — July 5, 2016 @ 12:51 pm
At first glance I would place these into 2 buckets, the first having corruption, injection, and backup; the other having everything else. I’d say the first bucket are things a DBA would want to address first, and the second bucket are primarily performance-related items which should come later. If pressed for an immediate answer, that’d be as far as I’d go, I wouldn’t waste time trying to differentiate between corruption, injection, and backups, they all should be fixed asap. Top priority is protect the data.

But given a little more time and thought, I’d want to know more about the environment, and whose priorities we are talking about. I could invent situations where several of these would become top priority. You could also plot these on a urgent/important matrix too, not just a one-dimensional priority list. Maybe even add “ease/speed of remediation” attributes too, as those can influence priority. Sometimes the urgent and important items come with hard and slow remediations so they get shoved down the priority list anyway.

What kind of environment would justify prioritizing performance over something like corruption or injection? If an x% increase in TPS had a direct correlation with y% increase in revenue, I could imagine someone pushing for performance over anything else, which is why a DBA should push for the first bucket first. (Again, whose priorities?)

Then I start considering what underlying relationships might exist between these items. Does the injection exist because developers are mixing parameterized sql and SPs? Is it possible the corruption is a result of the injection? If so, maybe shutting down the injection vector becomes top priority to prevent things from getting worse than they already are. There could be several other compliance-related reasons why shutting down the injection could get pushed higher too.

But it’s hard to see how a corrupted db is anything other than the top priority, especially when you say corrupted – past tense. If it’s already corrupted I’d wager you’re already getting yelled at about it. Even without automated backups, you should try to take a manual backup of the db you’re about to try taking corrective action on, in case you make it worse. If the backup works on the corrupted db, you can attempt to restore elsewhere and test any fixes without impacting the original. If a manual backup of the corrupted db fails, and assuming you don’t have any other clean backups, manual or automated, you’ll likely be busy trying to copy out whatever salvageable data you can into another db (one that ideally is on a different storage system, since storage could be the reason for the corruption). Maybe you get lucky and the corruption isn’t bad, or someone created a manual backup if you do need to restore in order to fix the corruption. Maybe you are even more lucky that even though a dbcc checkdb fails that nobody is complaining, though in that case I’d say it feels like a ticking time bomb.

This leads you to the lack of automated backups. The key word there is automated, which leaves open the chance that maybe someone manually runs some backups somewhere sometimes. It’s also easy to assume “no automated backups” means just for the db which already has corruption, but it could mean no automated backups for the whole company, or some other scope, in which case maybe that takes a temporary back seat to the corruption. It’s one thing to quickly run Ola’s scripts on a single instance to get automated backups running for it, assuming you have enough disk space somewhere other than local disk, but another thing if you have to think about setting it up for 100s or 1000s of instances for the whole company if that’s what “no automated backups” means.

If you’ve addressed the corruption, injection, and backup issues, I’d say the performance issues get prioritized by any kind of numbers you can assign to the relative performance increase you’d get by fixing each, weighted by how widespread the problem is throughout the codebase. However, there’s a chance several of these are related. GUID ids are often behind fragmented indexes, maybe stale stats too if you’re lucky. Stale stats combined with parameterized sql could lead to parameter sniffing issues, which may then be related to the perceived need for NOLOCK hints, though cursors with NOLOCK hints also might be a bad sign too.

But consider which things you can address from the db side alone vs. which need application changes. Many have been the times I’ve heard people say I need to make the db faster while requiring not a single change in the app code.

In summary, my list:
1. corruption (assumes it’s already on fire, but you can take a manual backup of the db)
2. injection (unless compliance overrides you to make it #1)
3. automated backups (assuming it’s more than just for the instance with the corrupted db)
—
4. stale stats
5. cursors (didn’t say if they were w/in sp or app side)
6. fragmented indexes (though I’d want proof it changed performance)
7. mixing parameterized sql and sps (as long as it’s properly parameterized, it’s not bad)
8. nolock (consider RCSI which may be more in line with what they want anyway)
9. guids (good luck trying to rip those out once they exist)
10. “sp_” (unless those get called so often that you can see the impact of the cache miss in master before finding the proc in the user db, in which case bump that up the list b/c it should be an easy refactor.)

Comment by Chuck Rummel — July 5, 2016 @ 8:01 pm
I went with

1) Corruption
2) SQL Injection
3) Automated backups –of course, getting *some* backup in place would be higher on the list
4) Stale stats –I’d let an sp_updatestats rip through while I was tackling other things lower on the list
5) GUIDS
6) fragmentation (5&6 in that order because if the indexes are *on* guids, I’m fighting a losing battle)
7) cursors
8) mixing procs and parameterized sql
9) nolock –a personal pet peeve of mine, but pretty low on the DBA hierarchy of needs
10) sp_* procs

Comment by Ben Thul — July 5, 2016 @ 11:51 pm
I’m a bit late to the game but here is mine.

— Actual/Possible disasters
No automated backups — I don’t do anything without backups
Corrupted database — Not going to do anything until this is fixed

— Performance issues
Stale statistics — Easily fixed during any quiet period during the day
Fragmented indexes — Easily fixed during the next maintenance window
NOLOCK hints — Easily changed & possible incorrect results but could impact concurency some testing required
Cursors — High performance impact but also likely to be time consuming to fix

— Potential Problems
A SQL injection vulnerability — While this is important it’s a potential problem & will require some work to fix

— Low impact to fix ratio
GUID identifiers — This has a performance impact but may be there for a reason. Change slowly & with care
Developers use a mix of parameterized SQL and stored procedures — Requires re-coding, fix over time
Sprocs prefixed with “sp_” — No real performance impact, naming issue only. Fix during slow work periods

The last 6 NOLOCK, Cursors, SQL injection, GUIDs, parameterized SQL & sprocs, naming sprocs w sp_ I’d work on all at the same time. Fix one table/related code at a time fixing all of these at once.

Comment by Kenneth Fisher — July 6, 2016 @ 11:19 am
[…] this week I asked people to help me out prioritizing a list of issues. I was surprised by the number of people who participated. I think I missed an opportunity to […]

Pingback by Prioritize This List of Issues (Results) | Michael J. Swart — July 7, 2016 @ 8:01 am
This is where I landed:

No automated backups
Corrupted database
A SQL injection vulnerability
Fragmented indexes
Stale statistics
GUID identifiers
NOLOCK hints
Cursors
Developers use a mix of parameterized SQL and stored procedures
Sprocs prefixed with “sp_”

Depending on how well you know your database, and its data access patterns, the order of the items after “Stale Statistics” is subjective (IMO).

Comment by Tony — July 11, 2016 @ 10:30 am
•A SQL injection vulnerability
•Corrupted database
•No automated backups
•Developers use a mix of parameterized SQL and stored procedures
•NOLOCK hints
•Stale statistics
•Fragmented indexes
•Cursors
•GUID identifiers
•Sprocs prefixed with “sp_”

Comment by Graeme Martin — July 11, 2016 @ 11:31 am
No automated backups
Corrupted database
Stale statistics
A SQL injection vulnerability
Developers use a mix of parameterized SQL and stored procedures
NOLOCK hints
Cursors
Fragmented indexes
GUID identifiers
Sprocs prefixed with “sp_”

Comment by Marian — July 12, 2016 @ 5:20 am
“It depends.”
Isn’t that our standard answer?
Maybe no one uses the corrupt database. Drop it.
Maybe someone is standing over your shoulder saying fix the queries (hitting the fragmented indexes).
Maybe everything’s running just fine.
Of course, Brent is right! We’ve got to get backups running before we leave for the day.
If we install the Ola scripts or similar we’ll be a little better off come morning!

Comment by DoubleBarrellDarrell — July 12, 2016 @ 10:40 am
My director found this blog and sent it to the entire team for us to review and provide feedback. Fascinating discussion. These are my rankings that I sent back to him. I divided the items by order of precedence based on priority.

High Priority

1. SQL injection vulnerability
* Secure the database system(s) first and foremost.
2. Corrupted database
* This is basic system maintenance: keep the systems up – the systems include the databases.
3. No automated backups
* Argument could be made that this is #2 and corrupted databases are #3. But … Back in 1998/1999, the company I had just started working for had a contractor handling their Microsoft Office database. Right after I started, the database crashed. And of course, the backup restoration strategy had never been tested, and the only backup that did work was two weeks old. Corrupted database. Two weeks of data gone. Automated backups are great if they’re tested and they work. But when a database is corrupted and backups don’t work, fixing the corruption is the higher priority.

Medium Priority

4. Stale statistics
* Poor performance attributable to poor query plans due to out-of-date statistics can easily increase system resource usage (CPU, IO)
5. Fragmented indexes
* Even with great query plans and current statistics, badly fragmented indexes means longer scans over more data pages (SQL Server will use a scan over a seek when it estimates touching more than 5% of the data pages)
6. GUID identifiers
* My personal feelings [about the greatness of GUIDs — my director and I have had many arguments about GUIDs] aside, GUIDs improperly used would take precedence here. A clustered index on a GUID using newid() causes enormous index fragmentation and page splits; causing IO and CPU usage increases (and, yes, I’m willing to concede that if the vast majority of cases – a newid() GUID is probably not warranted)
7. Developers use a mix of parameterized SQL and stored procedures
* (Or they don’t use stored procedures at all … *growl*)
* Stored procedures create query plans that are cached and reused (usually efficiently); parameterized SQL can cause one-time query plans or cached query plans to be reused inefficiently on a different value for the parameter (sniffing) causing far more work to be done than is needed.
8. Cursors
* Let’s be fair: [although cursors are powerful when used correctly] in the vast majority of cases, … cursors are misused and negatively impact performance where a set-based solution would perform better. (That’s the text book answer but we all know it well.)

Low Priority

9. Sprocs prefixed with “sp_”
* I consider this a low priority item just from my experience. Yes, it impacts the system because of the inherent scan in the master database, but that’s a Microsoft issue and we are talking in terms of milliseconds – but an impact is an impact, especially on a critical system.
10. NOLOCK hints
* While I abhor NOLOCK for all its many splendorous traps, it is – in all reality – the least of these overall. Now, I could argue that because NOLOCK is misunderstood – its more effective synonym, READUNCOMMITTED, is better understood – it provides a negative impact to the end-user who expect queries to always return accurate data

Comment by Seth Stokes-Leifeste — July 14, 2016 @ 11:59 am
I see the list as operational and Developmental.

As a DBA with an operational focus I can do things about

Backup
Stats
Indexes

Corruption – hmmm, interesting. Why/how is it corrupt? Are we talking DB offline, or are we talking about data that isn’t what it should be or incomplete? If you were looking at that list and say hey the DB is corrupt and offline, then that would be your first priority. If it has bad data then there is a whole lot of different components and situations that may be contributing to that state. It’s no longer a set fix point and recovery methodology that a corrupt offline database would require.

All the other issues should require vendor & developer resources because a actual decision was made to code in that state. It is those decisions that need to be understood and worked on to bring about a resolution.

If you are a developer DBA the order is completely different.

Just make sure you got some sort of backup and can recover from it in line with your business expectations.

Comment by Gio — July 20, 2016 @ 8:27 pm
Scary how many posters have not put Backups first. We DBA’s are guardians/protectors of the Data! This should be the first thing we do.

Comment by Scott — July 25, 2016 @ 7:08 am
Hi Scott, The results (https://michaeljswart.com/2016/07/prioritize-this-list-of-issues-results/#comment-187730) survey shows that most people ranked corrupted data as a higher priority than no automated backups. I have to say that I agree with them for a couple of reasons:

1) The way I worded it was that there were no automated backups.
2) Backups are taken to mitigate against disasters (such as corruption).

As a DBA, I would prefer to inherit a db without backups over a db known to have corruption any day.
Your mileage may vary.

Comment by Michael J. Swart — July 26, 2016 @ 4:15 pm
Agile teams have a slightly different way of prioritizing issues. Agile teams deliver new work by working via a backlog , which is a list of all issues on the team s radar. The list is always in a prioritized order so that the team can simply pull work off of the top of the list. Iteration based scrum teams work by sprints. They pull a set issues from the top of the backlog to work on during an iteration. Flow based kanban teams continually pull the top item off of the backlog.

Comment by Kerry Rois — August 8, 2016 @ 8:48 pm
Hi Kerry,

That works in a pure development world, but the list of priorities I mentioned kind of straddles both development and operations.
If you’re a tiny shop where teams supporting production, then I’d say a large number of the items I listed would interrupt any sprint work.

In other words, when the database is on fire, it’s unacceptable to say “We’ll get to that next sprint”.

Comment by Michael J. Swart — August 9, 2016 @ 5:10 pm

RSS feed for comments on this post. TrackBack URL

Michael J. Swart

July 5, 2016