Troubleshooting Concurrency - A Demo | Michael J. Swart Michael J. Swart

February 13, 2014

Troubleshooting Concurrency – A Demo

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:00 am

Testing Concurrency

Part One: Generating Concurrent Activity
Part Two: Building Concurrency Tests
Part Three: Troubleshooting Concurrency – A Demo

So you’ve read about how I generate concurrent activity by calling s_DoSomething super often. And you’ve even seen examples of how I define s_DoSomething. Now, you can watch me put all that stuff in action in this screencast.

Then download demo scripts and other files by clicking TroubleshootingConcurrency.zip and then Ctrl+S to save.

Click play and follow along.

The Transcript:

Intro

Hey everyone,
So I’ve posted a couple articles about ways to troubleshoot concurrency problems.
Now I want to demo what that looks like in action.
So what I’m going to do, I’m going to take a common concurrency problem.
In this case, I’m going to tackle the dreaded deadlock and I’m going to investigate the problem and I want to show you the actions that I take so you can see what I’m doing.
And I’ll be using my concurrency generator to help me test a solution that I come up with.
Ready? Here we go

Identify Deadlock To Troubleshoot

So now what I’m gonna do is I’m going to be troubleshooting a deadlock or a set of deadlocks.
And just to set up the scenario a bit, I would normally attack this problem if I’m handed a set of deadlocks to troubleshoot.
In this case <let’s look in here> what I have is a set of deadlock graphs.
Each graph is a file which contains a whole bunch of information about an instance of a deadlock that SQL Server detected and dealt with.
Lets take a look at one.
I’m going to open up in SQL Server Management Studio and first thing you notice is that there are four ovals and four rectangles.
The ovals are processes and the rectangles are resources.
The ovals are fighting over access to the resources and they happen to be fighting in a circle so they get deadlocked.
No one can do anything.
Everyone’s waiting on everybody else.
And you can actually see that SQL Server detected and chose this process as the deadlock victim.
All right, now that’s four processes.
Because I happen to know that each one of these deadlock graphs indicates the same problem that I’m trying to troubleshoot, I’m going to pick a deadlock graph that’s actually smaller.
I’m going to pick graph 112 and open it.
Yeah, I can deal with this.
This one only has two processes fighting over two resources.
This process owns an exclusive lock on this key lock.
I like to think of key locks as a row, just mentally.
And this one owns an exclusive lock on this row up here.
But they’re both waiting for the other guy to release that so they can get a shared lock on it.
Now shared locks and exclusive locks are mutually exclusive.
They are incompatible and so that’s why we wait.
So let’s look at the specifics.
This resource is a key lock which means it’s a row in the table MyPeople and that table’s in the schema Person which is in the database Adventureworks.
In particular it’s got an exclusive lock on a row in an index called pk_Person_MyPeople.
Because of the naming convention, I happen to know this is a primary key.
And it looks like this resource is the same kind of row, or the same kind of key lock.
And the processes, what do we know about this?
Not much. It’s running a stored procedure inside database 9 which is Adventureworks and the ObjectId of the procedure is given there.
I don’t know much else about that yet.
I do know that this process was running the same procedure.
Great, so I think now it’s time to look a little bit closer at this deadlock.

Find Root Cause

So the way I dig deeper into these deadlock graphs is by opening up these files in an xml editor.
It turns out that these deadlock graphs are simply defined as xml.
So you can open this up in notepad or your favorite xml editor.
I’m going to use Visual Studio, because that’s what I like to use.
There it is. Right away, you notice that it’s got the same information.
There’s the processes, but it’s got a lot more details.
So before we knew that this procedure…
Now we know that not only is this procedure called Adventureworks.dbo.s_Person_MyPeople_Delete, but that it was running that query when it deadlocked.
We also know that the other procedure was running the same thing s_Person_MyPeople_Delete.
I want to find out more about this procedure because it was the thing that was deadlocking.
So I’m going to copy. That’s in the clipboard now. I’m going to use Management Studio to be able to do that.
So (sp_helptext and results-to-text, f5) there’s the definition, because we’re connected to Adventureworks.
It’s a very simple statement. DELETE Person.MyPeople WHERE BusinessEntityId = @BusinessEntityId.
Now it’s very unusual for a procedure this small to participate in a deadlock.
Especially when Person.MyPeople …
Let’s look at this (results to grid alt+f1)
Okay, there’s the indexes… especially… this table has a clustered unique index on BusinessEntityId.
So I know that something else is going on here.
Maybe there’s a trigger
Or maybe there’s an indexed view on Person.MyPeople that needs to be updated every time there’s a modification.
Or maybe there’s a foreign key that needs to be enforced.
Something has to explain why after a modification, it continues to want to read other different rows in the same table.
In order to find out more, I’m actually going to look at the execution plan.
So (let’s copy paste and let’s see) I’m going to highlight that and display estimated execution plan.
Alright there we go, this gives me a lot more information.
Here are the things I notice.
There’s a missing index warning of course.
It looks like my unindexed foreign key theory is looking better.
Cause look, there’s the delete.
It goes into the table spool, the rows that are about to be deleted.
It comes out there and it looks … for each row, it makes sure that it scans the table in order to find … let’s see … Mentorid.
It’s looking for Mentorid.
It wants to assert that there are no rows where Mentorid is pointing to BusinessEntityId.
So my guess is that looking at the missing index details… sure enough it wants an index on MentorId.
It’s pretty clear to me that this table needs to enforce… (results-to-grid, alt+f1)
It needs to enforce this foreign key fk_MyPeople_MyPeople where MentorId references BusinessEntityId.
So it’s a foreign key to itself and it needs to scan the whole table in order to enforce that foreign key.
That’s looking like my root cause and I believe that adding the missing index that’s suggested here will help solve my deadlock problem.
But how do we know for sure? That’s next.

Verify Solution With Concurrency Generator

Okay, where are we?
We looked a little at this procedure s_person_MyPeople_delete and in particular, this query, this delete statement that was giving us some deadlock problems.
And we have a pretty good idea of its root cause.
Basically, this statement needed to scan Person.MyPeople in order to enforce a foreign key, a self referencing foreign key.
And it’s that reason that it’s participating in deadlocks.
So it’s my theory that if we have an index on MentorId that’s suggested in these missing index details.
If we create that index, it will not only speed up this query, but it will help concurrency and avoid deadlocks and that’s the solution I’m recommending.
But before I can say that I nailed this problem, I need to reproduce it and see that my fix solves the problem.
And I can do that using my load generator that I’ve been talking about in my blog lately.
First I want to define an s_DoSomething in tempdb. Let’s create
(use tempdb)
And I’m just going to modify an example from my blog.
And I’m going to alter a procedure s_DoSomething
Now I only care about an integer parameter right?
Because the stored procedure I care about only has the one parameter.
Let’s change it. There that looks pretty good.
But I actually want to modify that value.
In order to do that let’s look at Person.MyPeople
I’m going to look at the range of values. (max(businessentityid) and max(mentorid))
Let’s see how that does.
That’s interesting, so the largest MentorId is 10388.
So i want my integer value to be at least that and the range is that … so that range.
Okay, so just to avoid off by one errors, I’m going to call that nine and I’m going to shrink that range a bit.
So that sounds good. Let’s create it.
Oh, it doesn’t exist yet. There we go. That sounds good.
Now that s_dosomething is defined, I want to call it a lot of times concurrently.
This is done with the utility that I have.
This is the concurrency launcher and you can look at the definition of it, which is in my blog.
And after it’s compiled, I’m going to launch it.
And we wait a little while, and oh look there’s a deadlock and as things start blocking up, I expect a few more.
More blocking, more deadlocks, more, more, yeah, there we go.
Okay, I would say that we reproduced the problem.
I’m going to cancel that.
Let’s launch it again, just to make sure.
Deadlock, deadlock, deadlock deadlock, ahh, very good. Isn’t that great?
I can reproduce this problem at will.
Let’s see if my theory is sound.
My theory is that if I create this index (call it ix_myindex) name it something better later. (Tada)
My theory is that once that’s created, I should no longer see any concurrency problems.
Let’s launch it again.
I’m hoping to see nothing,
I’m hoping to see nothing,
It’s looking pretty good.
Wow, It completed.
So It completed which tells me that it has executed a procedure 500000 times.
All concurrently 50 at a time without a single deadlock.
That is encouraging news.
So now I would feel comfortable recommending this index as a solution to the set of deadlocks that I was troubleshooting.

Lemme explain… No there’s no time, Lemme Sum Up

So in conclusion, that’s an example of how I use my utility.
I find it useful enough that I’ve added a shortcut to my windows taskbar.
That gives me one-click concurrent database activity.
It helps me look closer at issues where processes suffer from unlucky timing or other problems that are hard to reproduce because of fussy concurrency conditions.
I hope you found this useful.

Cheers.

-- Comments (7)

7 Comments »

[…] Troubleshooting Concurrency – A Demo – Michael J. Swart (Blog|Twitter) […]

Pingback by (SFTW) SQL Server Links 14/02/14 • John Sansom — February 14, 2014 @ 4:50 am
I was hoping you’d show the updated execution plan for the stored procedure after adding the new index, but otherwise this was excellent.
Thanks for sharing!

Comment by ross — February 14, 2014 @ 6:35 am
D’oh! You’re absolutely right! I should have. If this were a written article, I could have edited it to include it.
Thanks for watching Ross.

Comment by Michael J. Swart — February 14, 2014 @ 8:02 am
[…] J Swart’s Troubleshooting Concurrency – A Demo is the 3rd in an interesting series of posts about concurrency testing on SQL […]

Pingback by My links of the week – February 16, 2014 | R4 — February 16, 2014 @ 8:00 pm
Hi,

Very interesting. If one is encountering deadlocks involving > 1 sproc, is it possible to use your framework to validate that fixes actually do fix the concurrency issue? The example shown here involves a single sproc deadlocking against other threads executing the same sproc…

Comment by Lorrin — April 10, 2014 @ 11:38 am
You bet Lorrin, The same method can be used to validate a fix to a deadlock involving more than one sproc.
This example only used one stored procedure. If you want to handle two sprocs, you define s_DoSomething so that it can call more than one procedure. To do that, look at Building Concurrency Tests and look for the heading “Calling More Than One Procedure”.

I actually use this technique myself. Deadlocks with more than one participating procedure are more common than deadlocks which involve a single procedure.

Comment by Michael J. Swart — April 10, 2014 @ 2:19 pm
[…] Part Three: Troubleshooting Concurrency – A Demo […]

Pingback by Generating Concurrent Activity | Michael J. Swart — April 25, 2017 @ 11:07 am