September, 2010 | Michael J. Swart

September 24, 2010

How Would You Explain Your Job to Someone in 1950?

Filed under: Miscelleaneous SQL — Michael J. Swart @ 11:40 am

So I was talking with a friend of mine, Dariusz Grabka (blog | twitter). We were talking about what careers we would have if we had lived during different times in history. I realized that our careers today are so specialized that not only did they not exist 50 years ago, but they probably couldn’t be explained very easily to someone 50 years ago.

But it’s a fun exercise. Here’s my take on explaining my job to someone from 1950.

So I spent some time imagining myself in my grandfather’s shoes circa 1950 and what kind of context he would need to understand the job I have now. I explain the I.T. industry and then mention my part in it:

My Job

ME:
So there’s this computing machine called a computer that can be used to perform all sorts of calculations (There’s one the military uses outside of Baltimore called the ENIAC).

In 2010, these computers are so inexpensive and small that these machines are used everywhere. And they’re so powerful that they’re being used as communication tools for all sorts of information or data.

So there is a large demand to store all this data. I mean there’s thousands of LIBRARIES worth of information to store. And it can all be stored on something the size of a small brick!

Frequently, this information has to be organized so that it can be located and read. We organize the data using different techniques and we call this collection of information a database.

My job is to organize the data so that storing and retrieving the correct data is done efficiently.

Fellow From 1950: So, you’e like a librarian for computing machines?

Me: Exactly!

-- Comments (4)

September 20, 2010

T-SQL Tuesday #10 Round up, An index of participants

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: #TSQL2sDay, index, indexes, indices — Michael J. Swart @ 8:19 am

So, about a week ago, I had the honor of being your host for T-SQL Tuesday for the month of September. And an amazing 22 bloggers responded to that invitation. I want to thank each of the bloggers for the time they put into their contribution. We really had some high quality contributions this month.

And another thanks to Adam Machanic for letting me host this month.

Here are the participating blog posts:

1. Brad Schulz Little Known Index Facts
First to go live with his post was Brad Schulz, awesome as ever. He talks about Fill Factor percents and Unique Indexes. He shows his knack for writing in a way that after having read his post, you’ll never forget what he explained.
What Struck Me: Before he gets into the topic at hand, he throws his support behind team indexes (and takes a few shots at team indices) a must read by itself, I pointed my family at the post and heard out-loud laughs.

2. Rob Farley (@Rob_Farley) Table? No such thing…
Rob Farley explains how there is no such thing as a table. Essentially he explains how a clustered index is the table. (Good on ya Rob).
What Struck Me: Spoofing the Matrix:
“Don’t try to look up information in the table. That’s impossible. Only try to realize the truth.”
“What’s that?”
“There is no table.”

3. Pinal Dave (@pinaldave) Disabled Index and Index Levels and B-Tree
Pinal Dave tackles the question “What will be the status of the B-Tree structure when index is disabled?”
What Struck Me: He investigates and explains the answer thoroughly. He also includes a personal touch with his post when he talks about a mentor Vinod Kumar.

4. Noel McKinney (@NoelMcKinney) Constraints & Unused Indexes
Noel (team indexes) explains how an unused index (an index with few reads & writes) can still be useful.
What Struck Me: I like Noel’s writing style. His articles make me feel like he’s sitting right beside me explaining something at my computer. Maybe it’s his first-person narrative or the fact that he throws in his own experiences. Whichever it is it works.

5. Nicholas Cain (@SirSQL) Applying Indexes To Replicated Tables
Any time I read from someone who has tackled replication in a successful way, I’m immediately impressed. Nicholas explains how to extend a replication solution so that the subscriber has indexes that the publisher doesn’t.
What Struck Me: A must read for any one who uses replication (or who is considering replication)

6. Richard Douglas (@SQLRich) Unused Indexes (Indices)
For the first time, Richard Douglas (Team Indices) contributes a post for T-Sql Tuesday. He explains how indexes are not all rainbows and unicorns all the time. He shines light on problem indices that don’t do anyone any good: Unused Indices.
What Struck Me: With a working vocabulary that includes words like “amended” and “gracious”, Richard gives “credence” to the thought of switching to Team Indices.

7. Erin Stellato (@erinstellato) Desperately Seeking Seeks
Erin Stellato explains the steps she takes in her search for a seek (while simultaneously reminding us of 80’s movies).
What Struck Me: What I like about Erin’s take is that she includes in her explanation an Oracle solution. Keeping other platforms in mind keeps everyone honest.

8. Michael J Swart (@MJSwart) Guts Of A Clustered Index
A post by yours truly with fancy Visio pictures. I try to explain what physically happens when a clustered index grows.
What Struck Me: A frisbee.

9. Bob Pusateri (@SQLBob) Potty Chairs And Duplicate Indexes
Provides an embellished script that has been making the blog rounds to report on redundant indexes. Good job Bob!
What Struck Me: A gold painted potty chair! I can’t describe it properly, you just have to see for yourself.

10. Jeremiah Peschka (@peschkaj) Indexes
Jeremiah takes a (deliberately) rambling look at indexes, b-trees and … erm… horology.
What Struck Me: His post is the number one post here ranked in order of likeness to a daily show monologue. (And I love animated gif lolcats). It’s much appreciated.

Start humming "Girl from Ipanema" to yourself

11. Jen McCown (@MidnightDBA) Brief Intro To Indexes and INCLUDE
Jen pulls her post from the archives to give an introduction to indexes.
What Struck Me: She explains how INCLUDED columns in an index are only included at the leaf levels! This is something I didn’t know until now. It reminds me of something mentioned by Brad Schulz’s post where Unique Indexes can also do without columns in the index nodes.

12. Gail Shaw (@SQLintheWILD) One wide index or multiple narrow indexes?
Gail writes an authoritative post on indexing strategy. Hers is one of the few contribution that addresses the question “Which index is best?”
What Struck Me: I like her conclusion about the so-called “strategy” of having single-column indexes on each column of the table. And her post is a must read for anyone who isn’t clear on this point.

13. Tom LaRock (@SQLRockstar) Big Mac Index
The SQL Rockstar explains indexes using a Big Mac analogy it’s an analogy I guarantee no-one has considered before (making this post legendary).
What Struck Me: A Microsoft Connect Item on the Big Mac Index. Wow.

14. Diane McNurlan (@SQLDevGal) Top 10 Worst Indexing Practices
Another member of team indexes, Diane covered important index topics in a clear way that it can almost be used as a checklist (for creating indexes or querying them).
What Struck Me: Diane is a very competent writer and although I’ve got a bunch of other posts to read for this round up I found myself going through more of her articles. And subscribing to her blog’s feed. Us database developers have to stick together.

15. AJ Mendo (@SQLAJ) Indexes 101
In AJ’s own words, he talks “about what indexes are, why we can benefit from their use and some of the costs associated with using indexes.”
What Struck Me: Along with the standard clustered vs. non-clustered, AJ gives a brief intro to some of the lesser known kinds of indexes: Fulltext, Spatial, XML etc…

16. Jason Strate (@StrateSQL) An Index On Indexing
Just like the cheesy 80’s sitcom clip show, Jason gives a list of his series on Index analysis. It’s a good sign though, only the best 80’s sitcoms ever got clip shows (Joanie Loves Chachi notwithstanding).
What Struck Me: In fact in my own “cheesy clip show” post last year I ran out of clips and ended up linking to Jason’s site.

17. Dave Levy (@Dave_Levy) How is Fill Factor Impacting My Indexes?
Dave queries some DMVs to help you analyze how effective your fill factor settings are.
What Struck Me: Okay, I have to call this one out. If you’re a dba, and you read only one index post this month make it mine, but if you have time to read two read Dave Levy’s. His fill factor script belongs in every DBA’s toolkit alongside queries that analyze things like missing index or fragmentation levels.

18. Jeremy Carter Cut Your Index Bloat
Jeremy explains the process he follows to identify and remove unused indexes.
What Struck Me: Jeremy is probably the most underexposed blogger on this list. There are a ton of MVP participants this month include honest-to-God book authors! Jeremy holds his own quite easily alongside them.

19. Andy Lohn (@SQLQuill) Partitioned Indexes and Data Types
Andy tackles index partitioning in this post. Specifically, he explains two methods of moving from non-partitioned indexes to partitioned indexes.
What Struck Me: I like that this is not a hypothetical situation. Andy and his colleagues are facing this problem as we speak (or as I write). And they are testing to determine which method is faster. The results are not in yet making his post a bit of a cliff-hanger.

20. Robert L Davis (@SQLSoldier) To Be or Not To Be (a B-tree)
Robert (aka SQL Soldier) explains how XML indexes are structured.
What Struck Me: Oh the things you can do using the dedicated administrator connection!!

21. Aaron Nelson (@SQLvariant) Picture The Indexes
A great analogy for indexes is the phone book. Aaron Nelson runs with it.
What Struck Me: The post and comments stretch that metaphor about as far as it can go.

22. Jason Brimhall (@sqlrnnr) Indexes And Blobs
Ending this month’s T-SQL Tuesday, Jason (team indexes) refurbishes an old script of his to report on BLOB index columns.
What Struck Me: He mentions something that I’m surprised never got covered by anyone else this month: Blob’s make lousy indexes.

Encore!

(Update 9/20/2010) One late addition (sorry I missed it, there was no comment at the invite post).

23. Steve Jones (@WayOutWest) Remembering To Index
In Steve’s own words, he reminds us that “Indexing is important, but you can overdo it.”
What Struck Me: He wrote about a third-party db he came across which include the most over-indexed table I’ve ever seen in my life.

-- Comments (22)

September 14, 2010

Guts Of An Clustered Index

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", #TSQL2sDay, clustered index — Michael J. Swart @ 8:31 am

So T-SQL Tuesday rolls around again. And this month it’s hosted by some handsome devil named Michael J Swart. I got to pick the topic, so you might expect great things. Well, we’ll see (no pressure) Here we go.

I want to show visually what’s happening on the disk when you create and grow a clustered index. I think this post will be most useful if you’re an application developer. You know what a table is and what a clustered index is, but beyond that, it’s not clear where SQL Server keeps your data physically. Hopefully you’ll soon find it clearer.

So I’m going to keep it simple. I’m going to show a simple table with a clustered index on it’s identity column (no partitions, no compression, one filegroup on one file). Remember a clustered index defines the ordering of the table and so essentially a clustered index is the table. That’s good, because it means that in the example below, I’ll only talk about one thing. You can call it either the table or the clustered index, it’s the same thing.

First Rule of Database Storage…

First thing you should know about database storage is that everything, I mean absolutely everything in the database is stored in pages. Each page is a chunk of storage on disk exactly 8 kilobytes in size, no more, no less. You’ll see that a page never contains records from different tables.

Create A Table

CREATE TABLE TEST (
  	id INT IDENTITY NOT NULL,
  	variableLength VARCHAR (MAX) NOT NULL DEFAULT (REPLICATE('#', 1000)),
  	CONSTRAINT pk_TEST PRIMARY KEY CLUSTERED (id)
);

Nothing happens on the disk here. There is no new space that is reserved in your database for this new table. Think of SQL Server as a lazy librarian. It’s not going to find shelves for books that don’t exist. But if you could look closely, you would see that there are changes to the system. SQL Server records in system tables that there’s a new table. Here’s one of the system tables:

New Row In A System Table

This particular system table describes the space that’s allocated to table TEST. It’s the system table you go to when you want to find where your table is located on disk. Right now no space is allocated until we create some data.

First Row

insert TEST default values;

Okay, something’s happening now. Two new pages are allocated for table TEST after just a single row.

Two New Pages

Page Addresses: Notice that pages always have a name (or address) that looks like 1:153. The first digit is the file number in that database (In our example, the digit will always be 1 because the database uses only one file). The second number here (153) is an incrementing counter that identifies the page number.

So a page with the address (1:153) is the 153^rd page (or stretch of 8 kilobytes) in the 1^st file of the current database.

The IAM page: The first you see here is an Index Allocation Map. If database tables were books on shelves, the IAM page would be like a floor plan of the library. Paul Randal wrote the definitive web resource on this stuff and so it’s something I’m not going to cover any deeper.

The Data Page: This is where your data is. There’s a header that SQL Server uses for some context. And there’s also a footer that acts helps track the order for the rows on a page. But still there’s roughly 8000 bytes left on data pages for your actual data. In my example, the rows I create are pretty big (about 1kb) and so there’s space for about six more rows in this page. What happens when we try to add seven?

First Index Node

set nocount on
insert TEST default values;
GO 7

Okay, There’s two data pages now. The first one, 1:153 got full and SQL Server allocated a second one, 1:156, for the eighth row:

New Index Page

The Index Page: A new kind of page is created. If you’re looking for a single row and you know the id, you don’t start at the first data page. It’s better to look at the root page. Using the information in the root page, you know the address of the data page your row is in.

In fact a page address and a integer Id column are pretty tiny fields and so there’s a lot of space in this index field. There’s space in this page to index over 600 pages of data. That works out to over 4000 rows (inside the 600 data pages). What happens when you create one too many?

More and More Rows

set nocount on
insert TEST (variableLength) default values;
GO 4347

New Index Level

There’s a new level!

Finding Row 42: So let’s say that you want to look up row with Id=42. First read the root page (1:791). This information tells you to look at page (1:155). You’re still not at a data page. So reading (1:155), you know to look at (1:160). And voila, after three page reads, there’s your row.

B-Trees: Tables that are quite large or that have really wide indexes can have more and more levels. This structure is called a B-Tree (B for balanced). The idea is that no matter where stuff is inserted or deleted, it’s not really difficult to shuffle page pointers around to make the tree balanced.

See For Yourself

This is all stuff that you can see for yourself. The following are examples of commands I’ve used. They’re not documented, but they’re extremely useful when trying to understand this stuff.

SELECT pl.*
FROM sys.system_internals_allocation_units iau
JOIN sys.partitions p
	ON iau.container_id = p.partition_id
CROSS APPLY sys.fn_PhysLocCracker(iau.root_page) as pl
WHERE p.object_id = OBJECT_ID('test')
	AND p.index_id = 1 -- the clustered index
	AND iau.type = 1 -- in row data
 
DBCC PAGE('sandbox', 1, 153, 3);
 
DBCC IND('sandbox', 'test',1);

So that’s it. There’s tons left to say about all this stuff (metric tons!) and I’m hoping that the other T-SQL Tuesday bloggers will cover some of it.

-- Comments (5)

September 7, 2010

Invitation to Participate in T-SQL Tuesday #10 – Indexes

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", #TSQL2sDay, index, indexes, indices — Michael J. Swart @ 10:00 am

What are your thoughts on Database Indexes?

(Update 09/20/2010: Find the roundup here) September already! Summer holidays are over and that means no more lazing around. The second Tuesday is fast approaching and that means that T-SQL Tuesday is almost here. And this month I will be your humble host and the topic is Indexes.

A book's index is not that different than a database index.

Indexes

Indexes are strange things. You never need to explicitly create one to create a fully-functional database, but if you want a database to perform well, they’re indispensable.

And there are so many aspects to write about! Like internals, covering, clustered, xml, fulltext, b-trees, hints, maintenance of, included columns, filtered, redundant, missing and tons more.

In fact my SQL Server 2008 Administrator’s Pocket Consultant (The first handy textbook I could grab) has an index entry on “indexes” that has 22 sub-entries.

About T-SQL Tuesday

For those not familiar with T-SQL Tuesday, it’s an idea hatched by Adam Machanic (Blog|Twitter). A monthly event, T-SQL Tuesday is a chance for any blogger to write and post an article on a single topic determined by the host (i.e. yours truly). Leave a comment here on that day and a day or two later, I’ll include your post in the round up.

Follow These Rules

The post must go live on your blog between 00:00 GMT Tuesday, September 14, 2010 and 00:00 GMT Wednesday, September 15, 2010.
In other words, set your sql server date, time and timezone properly and run this script:
IF GETUTCDATE() BETWEEN '20100914' AND '20100915' SELECT 'You Can Post' ELSE SELECT 'Not Time To Post'
IF GETUTCDATE() BETWEEN '20100914' AND '20100915' SELECT 'You Can Post' ELSE SELECT 'Not Time To Post'
Your post has to link back to the hosting blog, and the link must be anchored from the logo (found above) which must also appear at the top of the post
Leave a comment here (below) or I won’t be able to find your post. I expect trackbacks work properly, but if they don’t check back here just in case and leave a comment if necessary.

We also encourage you to …

… include a reference to T-SQL Tuesday in the title of your post. (The more we bloggers advertise T-SQL Tuesday, the more we bloggers get T-SQL tuesday readers)
… tweet using the hash tag #TSQL2sDay to follow links and other relevant conversations.
… consider hosting T-SQL Tuesday yourself. If you’re interested let Adam Machanic Know. If you’ve participated in two T-SQL Tuesdays previously and you don’t let your blog go stale (blog once a month for the last six months) then he’ll put you in the rotation.
And out of curiosity… Are you team indexes or are you team indices? Personally I’m team indexes but I work with a guy who seems to be so team indices that I’ve caught him saying indice for the singular!

That’s it! Good luck! Can’t wait to see what you guys have in store.

Your humble host,
Michael J. Swart

-- Comments (48)