clustered index | Michael J. Swart Michael J. Swart

March 23, 2011

Last Word On Scans

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: clustered index, index scan, index seek, nonclustered index, rid lookup, table scan — Michael J. Swart @ 12:00 pm

So last week I talked about all the different ways that Microsoft uses the word scans when in SQL Server. I got together a quick reference guide that describes scans terminology in terms of the SQL Server performance counters (because those counters were granular and trustworthy). But I didn’t really go into as much detail as I could have when I dealt with query plans. Or as Rob Farley prefers execution plans

Sorting Out Execution Plan Scans

Showplan Operators That Read From Tables

So here are the execution plan operators that access data from tables. I give a short description of each operator and I explain how that affect the performance counters (Full/Range/Probe). I won’t explain what each one means, I’ll let others do that. But if you know a bit about internals, the icons are actually quite well designed.

Table Scan (Showplan Operator Documentation)
This operator corresponds with the Full Scans/sec performance counter. I actually don’t encounter table scans very often and that’s mostly because most of the tables I deal with have clustered indexes defined. You’ll only ever see table scans on heaps.

Clustered Index Scan (Showplan Operator Documentation)
This operator corresponds with the Full Scans/sec performance counter. This is your run-of-the-mill full scan on a clustered index. Notice the brackets in the showplan icon. Those brackets are supposed to indicate a clustered index. Contrast this with the nonclustered index scan:

Nonclustered Index Scan (Showplan Operator Documentation)
This operator corresponds with the Full Scans/sec performance counter. And this is a full scan of a nonclustered index. You’d probably see this if the nonclustered index contains all the columns to select. It’s easier to scan because it’s a narrower index than the clustered index.

Clustered Index Seek (Showplan Operator Documentation)
This operator corresponds with the Probe Scans/sec performance counter when the clustered index is unique (as with a primary key) and the “seek predicate” includes the key columns needed to return at most one record. But if that’s not the case, then this operator will count towards the performance counter Range Scans/sec.

Nonclustered Index Seek (Showplan Operator Documentation)
Exactly like its Clustered Index counterpart, this operator corresponds with the Probe Scans/sec performance counter when SQL Server can determine that when it looks up the requested row it is sure to get at most one row (i.e. the index is unique etc…) Otherwise it counts towards the performance counter Range Scans/sec.

Key Lookup (Showplan Operator Documentation)
This operator is also known as a bookmark lookup operator. It always counts towards the Probe Scans/sec performance counter. It’s interesting that even though it gets a single record at a time, this operator is often seen as a symptom of poor performance. That’s because the number of executions can get big. Many executions can kill the performance of the query. If you focus on the performance counters, you’ll notice that each of these executions will count towards the Probe Scans performance counter.

RID Lookup (Showplan Operator Documentation)
Just like the key lookup, this lookup counts towards the Probe Scans/sec performance counter. I’ve written about this operator before in an article called Get Rid of RID Lookups.

Update On Scan Quick Reference

In the meantime, thanks to Kendra Little’s script to search system tables, I discovered one more dmv that I had overlooked last week, sys.dm_db_index_operational_stats, and so I include it here:

-- Comments (15)

September 14, 2010

Guts Of An Clustered Index

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", #TSQL2sDay, clustered index — Michael J. Swart @ 8:31 am

So T-SQL Tuesday rolls around again. And this month it’s hosted by some handsome devil named Michael J Swart. I got to pick the topic, so you might expect great things. Well, we’ll see (no pressure) Here we go.

I want to show visually what’s happening on the disk when you create and grow a clustered index. I think this post will be most useful if you’re an application developer. You know what a table is and what a clustered index is, but beyond that, it’s not clear where SQL Server keeps your data physically. Hopefully you’ll soon find it clearer.

So I’m going to keep it simple. I’m going to show a simple table with a clustered index on it’s identity column (no partitions, no compression, one filegroup on one file). Remember a clustered index defines the ordering of the table and so essentially a clustered index is the table. That’s good, because it means that in the example below, I’ll only talk about one thing. You can call it either the table or the clustered index, it’s the same thing.

First Rule of Database Storage…

First thing you should know about database storage is that everything, I mean absolutely everything in the database is stored in pages. Each page is a chunk of storage on disk exactly 8 kilobytes in size, no more, no less. You’ll see that a page never contains records from different tables.

Create A Table

CREATE TABLE TEST (
  	id INT IDENTITY NOT NULL,
  	variableLength VARCHAR (MAX) NOT NULL DEFAULT (REPLICATE('#', 1000)),
  	CONSTRAINT pk_TEST PRIMARY KEY CLUSTERED (id)
);

Nothing happens on the disk here. There is no new space that is reserved in your database for this new table. Think of SQL Server as a lazy librarian. It’s not going to find shelves for books that don’t exist. But if you could look closely, you would see that there are changes to the system. SQL Server records in system tables that there’s a new table. Here’s one of the system tables:

New Row In A System Table

This particular system table describes the space that’s allocated to table TEST. It’s the system table you go to when you want to find where your table is located on disk. Right now no space is allocated until we create some data.

First Row

insert TEST default values;

Okay, something’s happening now. Two new pages are allocated for table TEST after just a single row.

Two New Pages

Page Addresses: Notice that pages always have a name (or address) that looks like 1:153. The first digit is the file number in that database (In our example, the digit will always be 1 because the database uses only one file). The second number here (153) is an incrementing counter that identifies the page number.

So a page with the address (1:153) is the 153^rd page (or stretch of 8 kilobytes) in the 1^st file of the current database.

The IAM page: The first you see here is an Index Allocation Map. If database tables were books on shelves, the IAM page would be like a floor plan of the library. Paul Randal wrote the definitive web resource on this stuff and so it’s something I’m not going to cover any deeper.

The Data Page: This is where your data is. There’s a header that SQL Server uses for some context. And there’s also a footer that acts helps track the order for the rows on a page. But still there’s roughly 8000 bytes left on data pages for your actual data. In my example, the rows I create are pretty big (about 1kb) and so there’s space for about six more rows in this page. What happens when we try to add seven?

First Index Node

set nocount on
insert TEST default values;
GO 7

Okay, There’s two data pages now. The first one, 1:153 got full and SQL Server allocated a second one, 1:156, for the eighth row:

New Index Page

The Index Page: A new kind of page is created. If you’re looking for a single row and you know the id, you don’t start at the first data page. It’s better to look at the root page. Using the information in the root page, you know the address of the data page your row is in.

In fact a page address and a integer Id column are pretty tiny fields and so there’s a lot of space in this index field. There’s space in this page to index over 600 pages of data. That works out to over 4000 rows (inside the 600 data pages). What happens when you create one too many?

More and More Rows

set nocount on
insert TEST (variableLength) default values;
GO 4347

New Index Level

There’s a new level!

Finding Row 42: So let’s say that you want to look up row with Id=42. First read the root page (1:791). This information tells you to look at page (1:155). You’re still not at a data page. So reading (1:155), you know to look at (1:160). And voila, after three page reads, there’s your row.

B-Trees: Tables that are quite large or that have really wide indexes can have more and more levels. This structure is called a B-Tree (B for balanced). The idea is that no matter where stuff is inserted or deleted, it’s not really difficult to shuffle page pointers around to make the tree balanced.

See For Yourself

This is all stuff that you can see for yourself. The following are examples of commands I’ve used. They’re not documented, but they’re extremely useful when trying to understand this stuff.

SELECT pl.*
FROM sys.system_internals_allocation_units iau
JOIN sys.partitions p
	ON iau.container_id = p.partition_id
CROSS APPLY sys.fn_PhysLocCracker(iau.root_page) as pl
WHERE p.object_id = OBJECT_ID('test')
	AND p.index_id = 1 -- the clustered index
	AND iau.type = 1 -- in row data
 
DBCC PAGE('sandbox', 1, 153, 3);
 
DBCC IND('sandbox', 'test',1);

So that’s it. There’s tons left to say about all this stuff (metric tons!) and I’m hoping that the other T-SQL Tuesday bloggers will cover some of it.

-- Comments (5)