Michael J. Swart

March 16, 2011

What Does Microsoft Mean By Scans?

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , — Michael J. Swart @ 12:00 pm

Takeaway: There are a number of ways Microsoft lets you measure database scans. But Microsoft doesn’t use the word “scan” consistently. I sort through them all here.

I want to tell you about the investigation that led to this blog post. As part of a SQL Risk Assessment Program, we found an issue that identified a high number of full scans versus index searches. It seemed natural to start digging into where these scans were occurring.

The starting point was the report I received. It mentioned the Full Scans/sec performance counter. I needed to find out where these scans were occurring. But looking closer at the various places scans are reported in SQL Server, I quickly realized that I wasn’t comparing apples and oranges. And there are a lot of places Microsoft reports scans. Here are the ones that I care about:

  • Performance Counters: (Access Methods) Full Scans Explained as “Number of unrestricted full scans. These can either be base table or full index scans.”
  • Performance Counters: (Access Methods) Range Scans Explained as “Number of qualified range scans through indexes.”
  • Performance Counters: (Access Methods) Probe Scans Explained as “Number of probe scans per second that are used to find at most one single qualified row in an index or base table directly.”
  • Scan:Started Event As seen in profiler for example. Books online says this event “occurs when a table or index scan is started.”
  • Scan:Stopped Event The (supposed) counterpart of the Scan:Started event.
  • Various Execution Plan Operators Table Scans, Index Scans or Seeks
  • Scan Count as reported with SET STATISTICS IO. Documented as  “Number of index or table scans performed.”
  • sys.dm_db_index_usage_stats (user_scans) Simply documented as “Number of scans by user queries.”

Before trusting these numbers, I had to find out exactly what they measured. And surprisingly – or maybe not – Microsoft doesn’t use the word scan in the same way for each place it’s reported above.

... I do not think it means what you think it means

Demo Time

I’m going to show you a modified set of scripts that I used in my digging. You can skip the demo if you like, but it’s interesting. First the setup:

use tempdb
 
create table clusteredtable (
    id int not null primary key, -- this becomes the clustered index
    filler char(500) default 'hey',
    doubleId int not null
);
 
create nonclustered index ix_ct 
    on clusteredtable(doubleId)
    include (filler);
 
insert clusteredtable (id, doubleId)
select top (50)
    ROW_NUMBER() OVER(ORDER BY (SELECT 1)),
    2 * ROW_NUMBER() OVER(ORDER BY (SELECT 1))
from  sys.columns;
 
create table heaptable
(
	id int not null,
	filler char(500) default 'hey',
	doubleId int not null
);
 
insert heaptable (id, doubleId)
select TOP (50) ROW_NUMBER() OVER(ORDER BY (SELECT 1)),
    2 * ROW_NUMBER() OVER(ORDER BY (SELECT 1))
from sys.columns

Full Scans on Clustered Indexes

--clustered index scan
select filler from clusteredtable order by id;
 
-- still a clustered index scan
select top 10 * from clusteredtable order by id;
 
--two clustered index scans
select a.filler, b.filler
from clusteredtable a
inner merge join clusteredtable b
on a.id = b.doubleid
 
/* A keen eye will notice:
All count towards full scans in os counters. (not probe or range)
All count toward user_scans in sys.dm_db_index_usage_stats
All figure in scan count when STATISTICS IO is set
All show up as clustered index scans operators in query plans
All generate Scan:Started events in profiler (but not Scan:Ended)
Scan:Started events also raised on statistics objects used to compile plans.
*/

Scans on a Heap

--table scan
select filler from heaptable;
 
-- still a table scan
select top 10 * from heaptable ;
 
/* If you care to look at these two queries you'll see
Both count towards full scans in os counters. (not probe or range) 
Both count toward user_scans in sys.dm_db_index_usage_stats
Both figure in scan count when STATISTICS IO is set
The query plans show both as table scans.
Both generate Scan:Started events in profiler 
Both generate Scan:Ended (unlike CI scans) but *reads* is always zero here no
   matter what the logical or physical reads actually are.
*/

After full scans, we start looking at range scans. That is, scans that may return more than one row, but the index columns are filtered somehow. (Remember, this means there’s no such thing as a range scan on a heap!)
Range Scans on Clustered Indexes

--clustered index seek
select filler 
from clusteredtable 
where id > 2;
 
/* This query ...
Counts towards range scans and index searches in os counters. (not probe or full)
Counts toward user_seeks in sys.dm_db_index_usage_stats
Figures in scan count when STATISTICS IO is set
Is shown as a clustered index seek operation in its query plan
Generates Scan:Started events in profiler (but not Scan:Ended)
*/

Seeks (into unique indexes)

--seek on primary or unique key
select filler 
from clusteredtable 
where id = 22;
 
/* So check out the seek. It ...
Counts towards probe scans and index searches in os counters. (not range or full)
Counts toward user_seeks in sys.dm_db_index_usage_stats
Does not count toward scan count in set statistics io
Is shown as a clustered index seek operations in its query plan
There are no Scan:Started/Stopped events generated
*/

Things That Surprised Me Most

  • The performance counters are extremely trustworthy once you know their definitions. The three kinds are Probe, Range and Full. Check them out, they won’t lie to you.
  • Full scans don’t need to read all rows while range scans might. This is important, because during analysis, I considered multiplying the number of scans with the number of pages in the index to assess the I/O impact and ended up throwing that idea out.
  • Probe scans (singleton seeks) are best demonstrated using unique indexes, not necessarily clustered indexes.
  • Scan:Started/Stopped events are garbage… pick another measurement.

Quick Reference

Because the scans tracked in the performance counters are so trustworthy, and granular, I can use them to clarify the other places that Microsoft reports on scans:
Table showing how Microsoft uses the term scan

What I Did Next

So that’s good. What I ended up doing after my investigation is going straight to the dmv sys.dm_db_index_usage_stats to find out which objects were getting scanned too much. It works out great because

  • these statistics are more reliable than the Scan:Started and Scan:Stopped events,
  • the dmv is gentler on the target system than combing through cached query plans.
  • and the dmv’s column user_scans indicate full scans only (not range scans) which are exactly the performance counters I was digging into.

So that’s how I got from having potentially too many full scans, to understanding exactly on which objects those scans were occurring. What I did after that is another story.

9 Comments »

  1. Inconceivable!!!!

    Comment by David Nelles — March 17, 2011 @ 3:36 pm

  2. […] What Does Microsoft Mean By Scans? – It’s that Database Whisperer full of artistic flair Michael J. Swart (Blog|Twitter) with a fantastic post here. […]

    Pingback by Something for the Weekend – SQL Server Links 18/03/11 | John Sansom - SQL Server DBA in the UK — March 18, 2011 @ 7:45 am

  3. For some reason, this post didn’t come through in my Google Reader subscriptions. I only found out you wrote a post because John Samson had mentioned it. The last post I see in the Reader is your T-SQL Tuesday post. Do you know why it didn’t come through?

    As you mention, a Full SCAN is not necessarily a “true” full scan… That’s important… it only depends on what the operator above the scan in the query tree asked for. In the TOP 10 query, the TOP operator only asked the SCAN operator for 10 rows, so it wasn’t really a “full” scan. The SCAN operator will only spew forth a row when commanded to and will stop when commanded to. The same is also true, of course, with Range Scans.

    Thanks for the detailed info. And I really enjoy your artwork, BTW.

    –Brad

    Comment by Brad Schulz — March 18, 2011 @ 12:19 pm

  4. Hey Brad, thanks for stopping by.

    I do not know why my last post didn’t come through. I’m very curious about that. Email me the exact feed url you’re using and I’ll see if I can track that down.

    Re: “Full scans not necessarily reading all rows”. It’s very important! And it surprised me and I don’t know if I did enough to highlight that fact. And in my demo, the third query (Ctrl+F for MERGE) is another example which does not scan all rows either.

    And thanks for the artwork compliment. It’s probably the funnest part of each week’s post 🙂

    Comment by Michael J. Swart — March 18, 2011 @ 1:49 pm

  5. More on 20/20 hindsight formatting.

    If I had to do this post over again, I’d put the SCAN terminology table up near the front. It’s probably the most valuable thing I got out of this investigation.

    And I’d make the demo less prominent.

    But I’d keep Inigo exactly where he is. (I originally had a joke about the TSA and “probe scans”. I think I chose wisely)

    Comment by Michael J. Swart — March 18, 2011 @ 1:51 pm

  6. […] @MJSwart posts What Does Microsoft Mean By Scans? Posted on March 18, 2011 by sqlmashup http://michaeljswart.com/2011/03/what-does-microsoft-mean-by-scans/ […]

    Pingback by @MJSwart posts What Does Microsoft Mean By Scans? | sqlmashup — March 18, 2011 @ 4:19 pm

  7. Hi Michael,

    Just happened on your post. Behind the times, I guess I am. Anyway, this is most valuable. When I wrote Perfmon back in the day, I also wrote a book that explained how to use controlled workloads to illustrate the counter behaviour. You may have never read “Optimizing Windows NT”, but you definitely got the message. Just last night I recommended this approach for determining what the counters told you. Today I find your post. Are the planets aligned or something?

    I want to thank you for the information in your post; I have been using these counters for a while but admit I have been too slack to define them as you have. THANK YOU. Mostly I want to commend you on your methodology. Nice job.

    –Russ Blake

    Comment by Russ Blake — November 24, 2011 @ 11:04 pm

  8. Russ, Thank you. I mentioned your feedback earlier today on twitter, so I’ll repeat it here: “This is why I blog. It’s feedback like this”

    Comment by Michael J. Swart — November 25, 2011 @ 2:34 pm

  9. […] What Does Microsoft Mean By Scans? – It’s that Database Whisperer full of artistic flair Michael J. Swart (Blog|Twitter) with a fantastic post here. […]

    Pingback by Something for the Weekend – SQL Server Links 18/03/11 — March 5, 2012 @ 2:16 pm

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress