Michael J. Swart

November 16, 2011

Be Quick About Reading The Data You Asked For

So last week, I mentioned that if an application takes it’s time processing data returned from a query it executed, it will not cause ADO.net to time out. This week I’m going to qualify that and on the way, I’ll explain a problem associated with “lazy” readers. I’ll talk about what lazy readers look like from SQL Server’s point of view. And I’ll talk about a potential pitfall with the C# keyword “yield“.

So say you’ve written an application that accesses the database using ADO.net’s SqlDataReader. As you process the rows returned from the server, you might take a while between one Read() call and the next. If that happens, it can lead to problems. In other words:

If…

  • Your application has issued a query using a DataReader,
  • And the amount of data returned is more than 8 kilobytes (the size of an SNI packet),
  • And your application takes its time processing rows (i.e. time between Read() calls),

Then…

  • Your database query is still active!
  • And your transaction is still open! (Remember that even if you don’t use an explicit BEGIN TRANSACTION, SQL Server’s AUTOCOMMIT mode means that each query is it’s own mini-transaction and that transaction isn’t over yet!)
  • Memory used by the database for your query is still in use
  • And resource locks that were taken to process your query are still held!

But you might get lucky…

  • For example, you might be using READ COMMITTED SNAPSHOT isolation level (which is “READ COMMITTED with more awesome” according to Kendra Little). Because even though your query is still using the same number of resources, there’s a smaller risk of seeing concurrency issues.
  • Or if you’re using the default isolation level READ COMMITTED (which you probably are) SQL Server might have given up some of the locks that have been acquired while processing your rows. In this case, the number of locks held at any time is fairly small. And you might not see any problems (at first).

Or you might not be so lucky…

  • More likely, your query will be doing something that’s not simple. It could be using a sort, or parallelism or a particularly involved join. In that case, the query will hold onto the locks it uses without releasing them.
  • Then you might have concurrency issues.
  • Concurrency problems lead to deadlocks and excessive blocking.
  • And that can lead to…

Timeouts!
All because of the processing  on the client side took too long.

But How Can You Tell?
How do you know if this is a problem in your case?

From The Server Side

Look at the wait stats that are collected with sys.dm_os_wait_stats system view. (Or even better, look at Glenn Berry’s DMV query that reports on the same thing). Now take a look at the top wait statistics. If SQL Server is waiting on a client to process data, you’ll see ASYNC_NETWORK_IO as one of the top wait statistics. One interesting note about SQL Server 2012. I noticed that you might see this wait stat instead: PREEMPTIVE_OS_WAITFORSINGLEOBJECT.

That tells you whether this has been a problem since the last server restart. But if you want to know if the database is waiting on some client right now, you can use sp_whoisactive by Adam Machanic. (You could also query sys.dm_exec_requests, or in a pinch sys.sysprocesses). Again, if you see a query waiting on things like ASYNC_NETWORK_IO or PREEMPTIVE_OS_WAITFORSINGLEOBJECT, then that query is waiting for a lazy client application to read its data.

From The Client Side

Make sure your DataReaders are quick. You can tell pretty easily most of the time. Check out these two examples:

IEnumerable<int> ReturnASum()
{
	int sum = 0;
	using (SqlConnection conn = new SqlConnection("blah blah"))
	using(SqlCommand cmd = new SqlCommand(sqlStatement, conn))
	{
		conn.Open();
		using (SqlDataReader dr = cmd.ExecuteReader())
		{
			while (dr.Read())
			{
				sum += dr.GetInt32(0);
			}
		}
	}
	return sum;
}

Pretty straightforward, but compare that to the next example. This next example might be a problem:

void SendDataIntoSpace()
{
	using (SqlConnection conn = new SqlConnection("blah blah"))
	using(SqlCommand cmd = new SqlCommand(sqlStatement, conn))
	{
		conn.Open();
		using (SqlDataReader dr = cmd.ExecuteReader())
		{
			while (dr.Read())
			{
				SendToMarsRover(dr.GetInt32(0));
			}
		}
	}
}

Sometimes it can be tricky to tell whether code will be processed quickly or not. If your data rows are yielded to the calling procedure, then the processing speed is no longer under your control. In this next example, we can’t tell by inspection how quickly the data will be processed. To do that, we have to look at all the places where this method is called.

IEnumerable<int> LetSomeoneElseReadDataAtTheirOwnPace()
{
	using (SqlConnection conn = new SqlConnection("blah blah"))
	using(SqlCommand cmd = new SqlCommand(sqlStatement, conn))
	{
		conn.Open();
		using (SqlDataReader dr = cmd.ExecuteReader())
		{
			while (dr.Read())
			{
				yield return dr.GetInt32(0);
			}
		}
	}
}

A good data access layer shouldn’t make the rest of the application care about how fast they process rows. If you’re familiar with yield already, then you have to weigh memory benefits of yield against concurrency problems and make an informed decision (good luck!)

Powered by WordPress