#TSQL2sDay | Michael J. Swart Michael J. Swart

March 3, 2014

T-SQL Tuesday #052: Argue Against A Popular Opinion

Filed under: SQLServerPedia Syndication — Tags: #TSQL2sDay — Michael J. Swart @ 10:07 pm

Update See the round up post here.
I was asked to host T-SQL Tuesday this month, T-SQL Tuesday #52. This marks my second time hosting and I’m happy to do it again.

Your writing assignment for March 2014 is to

pick a popular opinion and argue against it.

… or at least qualify it. Given any issue, people drift to two kinds of crowds. There’s the “it depends” crowd and there’s the “never ever” crowd. We tend to fall in with one crowd or the other. This month, I want you to find an “never ever” issue and argue for it (or conversely, find a “always” issue and argue against it).

I wonder how this month will go. It takes guts to go against common wisdom.

Never ever do this.

You don’t necessarily have to argue against a universal opinion, but it should at least be popular. I think that your choice of opinions is practically limitless:

Bob Duffy had a list of 10 interview questions that annoy SQL professionals. With some great topics there including GUIDs, Cursors, and heaps.
Google results for “SQL.Server should.never”
Fair’s fair. Here are the results for “SQL.Server should.always”
Does anyone want to have a shot at redeeming Microsoft Access?
Foreign Keys, SchmoreignKeys.
Check it out. SQL Server supports varbinary(max)! Ideal for json documents and xml documents (or both!)
Shrinking databases and/or log files (because of the fragmentation! God save us all from fragmentation!)

Here’s a little secret. This month’s topic is not for you. It’s for the readers. It’s a chance for you to give them a more nuanced understanding of a topic that they may not have given a lot of thought up until now. I’m a little curious myself.

The rules are the same as always:

Follow These Rules

The post must go live on your blog between 00:00 GMT Tuesday, March 11, 2014 and 00:00 GMT Wednesday, March 12, 2014.
In other words, set your sql server date, time and timezone properly and run this script:
IF GETUTCDATE() BETWEEN '20140311' AND '20140312' SELECT 'You Can Post' ELSE SELECT 'Not Time To Post'
IF GETUTCDATE() BETWEEN '20140311' AND '20140312' SELECT 'You Can Post' ELSE SELECT 'Not Time To Post'
Your post has to link back to this post, and the link must be anchored from the logo (found above) which must also appear at the top of the post
Leave a comment here (below) or I won’t be able to find your post.

That’s it! Good luck! Can’t wait to see what you have in store.

Your humble host,
Michael J. Swart

-- Comments (17)

April 8, 2013

T-SQL Tuesday #41 – Presenting and loving it?

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication — Tags: #TSQL2sDay — Michael J. Swart @ 8:00 pm

Bob Pusateri hosts this months T-SQL Tuesday. The topic is Presenting and Loving it! You’ll notice that the topic of this blog post takes the exclamation mark and turns it into a question mark. The thing is that I’m not much of a presenter. I’ve presented twice in my life to groups who were not coworkers. Three times if you count my best-man speech for my brother.

In Bob’s TSQL Tuesday invitation, he asks us “How did you come to love presenting?” and “When was the first time you presented and really loved it?” I’ll let you know when and if that ever happens.

But guess what? It turns out that I’m giving a talk to the Toronto SQL Server User Group tonight which will let me grow my presentation CV from two SQL talks to three. I bet no other blogger can brag that they’ve given 33% of their SQL Server Presentations on the same day as Bob’s T-SQL Tuesday about presentations.

Advice

So I’ve got a very short list of advice to give you today. I can only describe some of the things that I’ve done to prepare:

I follow Paul Randal’s advice at Configuring SSMS for presenting. I follow it to the letter. It’s easy to set up and makes SSMS readable.
I install Zoomit. Just in case. I don’t think I have any content that requires it, but you never know.
I installed SSMS Toolspack by Mladen Prajdić. The SQL Snippets feature (which are still way better than SSMS’s native snippets feature). Are great for keeping demo scripts at your fingertips. I have snippets for “d1”, “d2”, “d3” and “d4”. Which correspond to the demos I plan to show.

In Toronto?

So are you in Toronto today? Got plans? Come on out to the user group tonight. I expect it will be pretty fun. I’m really comfortable with the topic and so I think it will be a blast. So when was the first time I presented and really loved it? Ask me again tomorrow.

-- Comments (8)

July 17, 2012

T-SQL Tuesday #32 – A Day in the Life

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: #TSQL2sDay, database developer — Michael J. Swart @ 8:00 am

So it’s T-SQL Tuesday time again! This month it’s hosted by Erin Stellato. The topic this month is interesting. It’s got a bit of a twist. Participating bloggers are asked to keep a diary of sorts for what they did last Wednesday July, 11th 2011. I came into work last Wednesday feeling a slightly more interested in the day ahead. The day seemed a little more significant than other ordinary days. Would it be good, bad or average? It reminded me of a movie (Groundhog Day) about another day with a lot of scrutiny:

Anyway, it’s always interesting to compare someone’s job title with their actual activities and I can’t wait to read Erin’s round up. Here we go:

Michael J. Swart: Senior Database Developer

Morning

Some normal morning Activities. I took a new way into work this morning. I came in at 8:00 and got a large regular coffee from Tim Horton’s. The first thing I did after logging into my computer is open up OneNote for taking notes today. And now I start answering email:

Our company announced support for our product on SQL Server 2012 recently. I answered a few questions about what that means for developers exactly.
Looked at db statistics from the previous day. Decided to follow up on an un-patched database that was hogging the resources on its server. There was a bit of stress here. I have responsibility for something that I don’t have enough control over and I had to sort that out.
I launched some tests earlier this week. These tests exercise a utility I wrote that deletes data. Because the tests last a few days, I checked up on their progress to see whether they’re still running (They are).
Stand up time! (Scrum) Where I learn what others on my team are doing. 9:30

Coffee #2

Had a conversation with colleague about implementing a locking mechanism (similar to mutexes) at the database level.
Tuning a two line query (which used a view in a view in a view) which is more like an 80 line query. Did I tell you I dislike views? They mask complexity and stuff like this happens. Just now I wrote a quick and dirty program to help me expand view definitions inline. I couldn’t quickly find a free utility online. It’s a cool simple program that I would share if I hadn’t written it during work hours.
Woohoo! My first interruption of the day. There’s a particular third party app we use that doesn’t have a client timeout setting. Long running queries cause impatient users to hit refresh and now there’s two long running queries, then three, then four… With the help of sp_whoiscactive, we went from panic to understanding everything relevant. And the brought the server from unresponsive to responsive. The whole process lasted less than two minutes (Thanks again to Adam Machanic and his sp_whoisactive sproc!). Now I have to spend some extra time following up on this. I’ll probably take the next 15 minutes working with the issue tracking software we use to log this incident and make it possible for follow up activities.
Interruption 2! A mercifully short question: No you can’t restore a 2008 R2 backup onto a 2008 server.

Lunch time

Went to a used book to pick up Drawing on the Right Side of the Brain. My blog readers voted for Developing on the Right Side of the Brain to be my next post (published last week here) and I don’t have a well-thought-out idea for what the post should be about. The book should evoke some ideas. We’ll see how it goes.

Afternoon

More time looking at the poor performing queries I mentioned earlier. Finally got to the bottom of it but with no easy fixes, it was time to discuss other options. I had a meeting about that. Some hard choices here. Not fun. Let’s move on.

3:00 Coffee #3

Started on a new project just now. I have to create a utility that takes a database and makes everything anonymous (no personally identifiable information). Easier said than done I think!
Woohoo! Another interruption. And another interesting discussion about shrinking databases!

5:00 Home Time

Pasting this OneNote doc into wordpress to be edited later. Overall this was an average day I think.

Some More Thoughts

All this goes to say that there’s a lot of variety in this day in my job. And overall, I think that this day was pretty typical for myself. There was some overlap into DBA territory (production issues) and there was some overlap into Business Intelligence territory too. About 60% of my day was spent on activities I had planned in advance which and 40% then on surprises.

I’m really keen to see what the rest of the SQL Server bloggers did this day! Thanks Erin for hosting and the great topic.

-- Comments (4)

November 7, 2011

T-SQL Tuesday #024: Procedures “By The Numbers”

Filed under: SQLServerPedia Syndication,Tongue In Cheek — Tags: "sql server", #TSQL2sDay, brad schulz, stored procedures — Michael J. Swart @ 10:12 pm

I’m super excited about this month’s T-SQL Tuesday, for a couple reasons. First of all, this month’s T-SQL Tuesday is hosted by Brad Schulz. Brad is a guy that I got to interview a couple years ago. Second, is that the topic is Sprox ‘n’ Funx (Stored Procedures and Functions). Which I like because the topic is nice and technical.

I started out thinking that I could treat the question “Are stored procedures a good thing?” But that issue is a contentious one and it would take a book to treat that issue properly (So maybe later)

Today, I’m giving some attention to some interesting numbers related to stored procedures (the format is taken from Harper’s Index a popular feature in Harper’s magazine)

Stored Procedures By The Numbers

250: The largest size in megabytes allowed for the source text of a stored procedure.

2,100: The largest number of parameters allowed for a stored procedure.

32: The maximum number of nested levels allowed for stored procedures.

1,598: The number of mentions in sys.messages that contain the word procedure.

858: The number of mentions for function.

389,000: The number of Google results for should use stored procedures.

29,500: The number of Google results reported for should not used stored procedures.

1 in 866: The odds that a random word taken from MichaelJSwart.com is procedure.

1 in 879: The odds that a random word taken from the 2011 State of the Union address is budget.

18: The number of participating blogs in Brad Schulz’ T-SQL Tuesday on the subject of stored procedures and functions.

1875: Earliest year I could find a published mention for the word sproc*

* ~ Lives of the Irish Saints (1875) mentions a St. Sproc daughter of Colum. I am not making that up!

-- Comments (5)

January 11, 2011

My 2011 Resolution

Filed under: Tongue In Cheek — Tags: #TSQL2sDay, blogging, resolutions — Michael J. Swart @ 12:00 pm

So it’s T-SQL Tuesday time again! Thanks to Jenn McCown for hosting this month. Today we’re talking about resolutions. Quite apt for January I think.

Resolution time!

But seriously, What are my resolutions this year? My personal resolutions are personal (but fairly dull) and my work goals are exciting (but also private). So what’s left? Here it is, short and sweet:

I plan to blog more.

Hmm… that could have fit inside a tweet 7 times. I don’t think that’s quite gonna cut it for a blog post. Okay, Here’s exactly what I plan. I have a goal to get four times as many readers as 2010. It’s ambitious, but here’s some things that could help.

Write a post once a week with a post going live at Wednesday at noon. I haven’t been the most prolific and hopefully 2011 will be better.
Keep to technical content as much as possible (SQL Server Denali should be exciting).
Include illustrations when I can.

I like drawing. And I think I’ve improved enough that I can settle on a style. (one panel 300×500, reminiscent of Ripley’s Believe It or Not). At best, I get a smash hit every month or two. At worst it’s a nice graphic that pulls people into the content.

-- Comments (9)

October 11, 2010

Updating Statistics Helps, But Not For The Reasons You Think

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", #TSQL2sDay, update statistics — Michael J. Swart @ 8:12 pm

This month it’s Sankar Reddy’s turn to host T-SQL Tuesday. I wish him luck! I know I had a blast last month. This month we’re busting SQL Server misconceptions wherever we find them.

So today I want to explain something about the practice of updating statistics as a technique to improve or maintain performance. It works and it works well, but often not for the reasons you think and I’m going to explain a simpler equivalent technique.

The Story

So you’ve been given the job of finding out why a particular stored procedure is performing poorly. It’s a complicated query plan and one of the first things you notice is that there’s hundreds of thousands of rows coming out of a particular table. After looking closer you figure out that the estimated number of rows is only supposed to be around thirty!

Remember that estimated row counts are based on what the database knows about the data and that info is stored in objects called statistics. As the table grows, these statistics can get out of date. So you try:

UPDATE STATISTICS dbo.MyTable WITH FULLSCAN

which solves the problem. (Good job, take a bow!)

Why It Works

Statistics are used to get an idea of how data is distributed. Based on the data distribution, SQL Server can make better choices on how it chooses query plans. If tables change and stats are out of date, the chosen query plans can be less than optimal. And bad plans have a bad habit of performing orders of magnitude worse than optimal plans.

Why It Really Works

I’ve found that out-of-date statistics can be responsible for bad query plans. But more frequently, I see that the parameter values that are used to compile the plan have just as big an effect.

SQL Server doesn’t just use statistics to estimate the number of rows it’s going to retrieve from a table. It also uses variable values and constants found in the query. And in the case of variables, it always uses the first variable values that it sees when the query gets compiled.

So for example, If I’m looking up a person in the phonebook by last name. The number of people I find is going to depend on whether that last name is common like “Martin” or unique like “Jingleheimer-Schmidt”. Based on the search argument, different plans can get picked which affects database performance.

So even if statistics are already up to date, the act of calling UPDATE STATISTICS can still have an effect. When you update statistics on a table, SQL Server throws out all the query plans that depend on that table (but see comments). And that query gets recompiled with the next parameter values that come along. And these may be different than the original values.

It’s the equivalent of a giving SQL Server a good thumping, or jolt.

So if you’re getting ahead of me, you can see that updating stats won’t always work, but I’ve seen others use it as part of a trial-and-error approach to query tuning. So that when it works sometimes, it becomes part of your mental toolbelt of tricks to try.

Alternative Technique 1

So you’ve used STATS_DATE and you know that statistics are up to date and you want to give SQL Server the same “thump” to build a new query plan without updating stats. You can use:

sp_recompile @objname='MyTable'

Or even more targeted:

DBCC FREEPROCCACHE (plan_handle)

But I would only recommend this for acute performance problems, it by no means addresses the root cause.

Alternative Technique 2

If you decide that SQL Server needs help in picking its plan consistently you can use something more permanent. Use query hints like

OPTIMIZE FOR UNKNOWN,
OPTIMIZE FOR @LastName = “Martin”
USE PLAN …

But this means you’re telling SQL Server you know better than it does. So make sure you understand every nook and cranny of your tables’ data distribution and query plans.

Conclusion

The misconception is that STATISTICS has a greater effect on query performance than it actually does. The query plans that SQL Server chooses depend on both statistics and compiled parameter values.

Try to get away from a trial-and-error method of query tuning. Learn how to look at the following before making a plan:

Find the query plan handle and query plan for a poor performing query.
Estimated row counts vs. Actual row counts
What parameter values were used to compile a query plan.
When statistics were last udpated for a particular index.
The data distribution (histogram) of statistics.

-- Comments (4)

September 20, 2010

T-SQL Tuesday #10 Round up, An index of participants

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: #TSQL2sDay, index, indexes, indices — Michael J. Swart @ 8:19 am

So, about a week ago, I had the honor of being your host for T-SQL Tuesday for the month of September. And an amazing 22 bloggers responded to that invitation. I want to thank each of the bloggers for the time they put into their contribution. We really had some high quality contributions this month.

And another thanks to Adam Machanic for letting me host this month.

Here are the participating blog posts:

1. Brad Schulz Little Known Index Facts
First to go live with his post was Brad Schulz, awesome as ever. He talks about Fill Factor percents and Unique Indexes. He shows his knack for writing in a way that after having read his post, you’ll never forget what he explained.
What Struck Me: Before he gets into the topic at hand, he throws his support behind team indexes (and takes a few shots at team indices) a must read by itself, I pointed my family at the post and heard out-loud laughs.

2. Rob Farley (@Rob_Farley) Table? No such thing…
Rob Farley explains how there is no such thing as a table. Essentially he explains how a clustered index is the table. (Good on ya Rob).
What Struck Me: Spoofing the Matrix:
“Don’t try to look up information in the table. That’s impossible. Only try to realize the truth.”
“What’s that?”
“There is no table.”

3. Pinal Dave (@pinaldave) Disabled Index and Index Levels and B-Tree
Pinal Dave tackles the question “What will be the status of the B-Tree structure when index is disabled?”
What Struck Me: He investigates and explains the answer thoroughly. He also includes a personal touch with his post when he talks about a mentor Vinod Kumar.

4. Noel McKinney (@NoelMcKinney) Constraints & Unused Indexes
Noel (team indexes) explains how an unused index (an index with few reads & writes) can still be useful.
What Struck Me: I like Noel’s writing style. His articles make me feel like he’s sitting right beside me explaining something at my computer. Maybe it’s his first-person narrative or the fact that he throws in his own experiences. Whichever it is it works.

5. Nicholas Cain (@SirSQL) Applying Indexes To Replicated Tables
Any time I read from someone who has tackled replication in a successful way, I’m immediately impressed. Nicholas explains how to extend a replication solution so that the subscriber has indexes that the publisher doesn’t.
What Struck Me: A must read for any one who uses replication (or who is considering replication)

6. Richard Douglas (@SQLRich) Unused Indexes (Indices)
For the first time, Richard Douglas (Team Indices) contributes a post for T-Sql Tuesday. He explains how indexes are not all rainbows and unicorns all the time. He shines light on problem indices that don’t do anyone any good: Unused Indices.
What Struck Me: With a working vocabulary that includes words like “amended” and “gracious”, Richard gives “credence” to the thought of switching to Team Indices.

7. Erin Stellato (@erinstellato) Desperately Seeking Seeks
Erin Stellato explains the steps she takes in her search for a seek (while simultaneously reminding us of 80’s movies).
What Struck Me: What I like about Erin’s take is that she includes in her explanation an Oracle solution. Keeping other platforms in mind keeps everyone honest.

8. Michael J Swart (@MJSwart) Guts Of A Clustered Index
A post by yours truly with fancy Visio pictures. I try to explain what physically happens when a clustered index grows.
What Struck Me: A frisbee.

9. Bob Pusateri (@SQLBob) Potty Chairs And Duplicate Indexes
Provides an embellished script that has been making the blog rounds to report on redundant indexes. Good job Bob!
What Struck Me: A gold painted potty chair! I can’t describe it properly, you just have to see for yourself.

10. Jeremiah Peschka (@peschkaj) Indexes
Jeremiah takes a (deliberately) rambling look at indexes, b-trees and … erm… horology.
What Struck Me: His post is the number one post here ranked in order of likeness to a daily show monologue. (And I love animated gif lolcats). It’s much appreciated.

Start humming "Girl from Ipanema" to yourself

11. Jen McCown (@MidnightDBA) Brief Intro To Indexes and INCLUDE
Jen pulls her post from the archives to give an introduction to indexes.
What Struck Me: She explains how INCLUDED columns in an index are only included at the leaf levels! This is something I didn’t know until now. It reminds me of something mentioned by Brad Schulz’s post where Unique Indexes can also do without columns in the index nodes.

12. Gail Shaw (@SQLintheWILD) One wide index or multiple narrow indexes?
Gail writes an authoritative post on indexing strategy. Hers is one of the few contribution that addresses the question “Which index is best?”
What Struck Me: I like her conclusion about the so-called “strategy” of having single-column indexes on each column of the table. And her post is a must read for anyone who isn’t clear on this point.

13. Tom LaRock (@SQLRockstar) Big Mac Index
The SQL Rockstar explains indexes using a Big Mac analogy it’s an analogy I guarantee no-one has considered before (making this post legendary).
What Struck Me: A Microsoft Connect Item on the Big Mac Index. Wow.

14. Diane McNurlan (@SQLDevGal) Top 10 Worst Indexing Practices
Another member of team indexes, Diane covered important index topics in a clear way that it can almost be used as a checklist (for creating indexes or querying them).
What Struck Me: Diane is a very competent writer and although I’ve got a bunch of other posts to read for this round up I found myself going through more of her articles. And subscribing to her blog’s feed. Us database developers have to stick together.

15. AJ Mendo (@SQLAJ) Indexes 101
In AJ’s own words, he talks “about what indexes are, why we can benefit from their use and some of the costs associated with using indexes.”
What Struck Me: Along with the standard clustered vs. non-clustered, AJ gives a brief intro to some of the lesser known kinds of indexes: Fulltext, Spatial, XML etc…

16. Jason Strate (@StrateSQL) An Index On Indexing
Just like the cheesy 80’s sitcom clip show, Jason gives a list of his series on Index analysis. It’s a good sign though, only the best 80’s sitcoms ever got clip shows (Joanie Loves Chachi notwithstanding).
What Struck Me: In fact in my own “cheesy clip show” post last year I ran out of clips and ended up linking to Jason’s site.

17. Dave Levy (@Dave_Levy) How is Fill Factor Impacting My Indexes?
Dave queries some DMVs to help you analyze how effective your fill factor settings are.
What Struck Me: Okay, I have to call this one out. If you’re a dba, and you read only one index post this month make it mine, but if you have time to read two read Dave Levy’s. His fill factor script belongs in every DBA’s toolkit alongside queries that analyze things like missing index or fragmentation levels.

18. Jeremy Carter Cut Your Index Bloat
Jeremy explains the process he follows to identify and remove unused indexes.
What Struck Me: Jeremy is probably the most underexposed blogger on this list. There are a ton of MVP participants this month include honest-to-God book authors! Jeremy holds his own quite easily alongside them.

19. Andy Lohn (@SQLQuill) Partitioned Indexes and Data Types
Andy tackles index partitioning in this post. Specifically, he explains two methods of moving from non-partitioned indexes to partitioned indexes.
What Struck Me: I like that this is not a hypothetical situation. Andy and his colleagues are facing this problem as we speak (or as I write). And they are testing to determine which method is faster. The results are not in yet making his post a bit of a cliff-hanger.

20. Robert L Davis (@SQLSoldier) To Be or Not To Be (a B-tree)
Robert (aka SQL Soldier) explains how XML indexes are structured.
What Struck Me: Oh the things you can do using the dedicated administrator connection!!

21. Aaron Nelson (@SQLvariant) Picture The Indexes
A great analogy for indexes is the phone book. Aaron Nelson runs with it.
What Struck Me: The post and comments stretch that metaphor about as far as it can go.

22. Jason Brimhall (@sqlrnnr) Indexes And Blobs
Ending this month’s T-SQL Tuesday, Jason (team indexes) refurbishes an old script of his to report on BLOB index columns.
What Struck Me: He mentions something that I’m surprised never got covered by anyone else this month: Blob’s make lousy indexes.

Encore!

(Update 9/20/2010) One late addition (sorry I missed it, there was no comment at the invite post).

23. Steve Jones (@WayOutWest) Remembering To Index
In Steve’s own words, he reminds us that “Indexing is important, but you can overdo it.”
What Struck Me: He wrote about a third-party db he came across which include the most over-indexed table I’ve ever seen in my life.

-- Comments (22)

September 14, 2010

Guts Of An Clustered Index

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", #TSQL2sDay, clustered index — Michael J. Swart @ 8:31 am

So T-SQL Tuesday rolls around again. And this month it’s hosted by some handsome devil named Michael J Swart. I got to pick the topic, so you might expect great things. Well, we’ll see (no pressure) Here we go.

I want to show visually what’s happening on the disk when you create and grow a clustered index. I think this post will be most useful if you’re an application developer. You know what a table is and what a clustered index is, but beyond that, it’s not clear where SQL Server keeps your data physically. Hopefully you’ll soon find it clearer.

So I’m going to keep it simple. I’m going to show a simple table with a clustered index on it’s identity column (no partitions, no compression, one filegroup on one file). Remember a clustered index defines the ordering of the table and so essentially a clustered index is the table. That’s good, because it means that in the example below, I’ll only talk about one thing. You can call it either the table or the clustered index, it’s the same thing.

First Rule of Database Storage…

First thing you should know about database storage is that everything, I mean absolutely everything in the database is stored in pages. Each page is a chunk of storage on disk exactly 8 kilobytes in size, no more, no less. You’ll see that a page never contains records from different tables.

Create A Table

CREATE TABLE TEST (
  	id INT IDENTITY NOT NULL,
  	variableLength VARCHAR (MAX) NOT NULL DEFAULT (REPLICATE('#', 1000)),
  	CONSTRAINT pk_TEST PRIMARY KEY CLUSTERED (id)
);

Nothing happens on the disk here. There is no new space that is reserved in your database for this new table. Think of SQL Server as a lazy librarian. It’s not going to find shelves for books that don’t exist. But if you could look closely, you would see that there are changes to the system. SQL Server records in system tables that there’s a new table. Here’s one of the system tables:

New Row In A System Table

This particular system table describes the space that’s allocated to table TEST. It’s the system table you go to when you want to find where your table is located on disk. Right now no space is allocated until we create some data.

First Row

insert TEST default values;

Okay, something’s happening now. Two new pages are allocated for table TEST after just a single row.

Two New Pages

Page Addresses: Notice that pages always have a name (or address) that looks like 1:153. The first digit is the file number in that database (In our example, the digit will always be 1 because the database uses only one file). The second number here (153) is an incrementing counter that identifies the page number.

So a page with the address (1:153) is the 153^rd page (or stretch of 8 kilobytes) in the 1^st file of the current database.

The IAM page: The first you see here is an Index Allocation Map. If database tables were books on shelves, the IAM page would be like a floor plan of the library. Paul Randal wrote the definitive web resource on this stuff and so it’s something I’m not going to cover any deeper.

The Data Page: This is where your data is. There’s a header that SQL Server uses for some context. And there’s also a footer that acts helps track the order for the rows on a page. But still there’s roughly 8000 bytes left on data pages for your actual data. In my example, the rows I create are pretty big (about 1kb) and so there’s space for about six more rows in this page. What happens when we try to add seven?

First Index Node

set nocount on
insert TEST default values;
GO 7

Okay, There’s two data pages now. The first one, 1:153 got full and SQL Server allocated a second one, 1:156, for the eighth row:

New Index Page

The Index Page: A new kind of page is created. If you’re looking for a single row and you know the id, you don’t start at the first data page. It’s better to look at the root page. Using the information in the root page, you know the address of the data page your row is in.

In fact a page address and a integer Id column are pretty tiny fields and so there’s a lot of space in this index field. There’s space in this page to index over 600 pages of data. That works out to over 4000 rows (inside the 600 data pages). What happens when you create one too many?

More and More Rows

set nocount on
insert TEST (variableLength) default values;
GO 4347

New Index Level

There’s a new level!

Finding Row 42: So let’s say that you want to look up row with Id=42. First read the root page (1:791). This information tells you to look at page (1:155). You’re still not at a data page. So reading (1:155), you know to look at (1:160). And voila, after three page reads, there’s your row.

B-Trees: Tables that are quite large or that have really wide indexes can have more and more levels. This structure is called a B-Tree (B for balanced). The idea is that no matter where stuff is inserted or deleted, it’s not really difficult to shuffle page pointers around to make the tree balanced.

See For Yourself

This is all stuff that you can see for yourself. The following are examples of commands I’ve used. They’re not documented, but they’re extremely useful when trying to understand this stuff.

SELECT pl.*
FROM sys.system_internals_allocation_units iau
JOIN sys.partitions p
	ON iau.container_id = p.partition_id
CROSS APPLY sys.fn_PhysLocCracker(iau.root_page) as pl
WHERE p.object_id = OBJECT_ID('test')
	AND p.index_id = 1 -- the clustered index
	AND iau.type = 1 -- in row data
 
DBCC PAGE('sandbox', 1, 153, 3);
 
DBCC IND('sandbox', 'test',1);

So that’s it. There’s tons left to say about all this stuff (metric tons!) and I’m hoping that the other T-SQL Tuesday bloggers will cover some of it.

-- Comments (5)

September 7, 2010

Invitation to Participate in T-SQL Tuesday #10 – Indexes

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", #TSQL2sDay, index, indexes, indices — Michael J. Swart @ 10:00 am

What are your thoughts on Database Indexes?

(Update 09/20/2010: Find the roundup here) September already! Summer holidays are over and that means no more lazing around. The second Tuesday is fast approaching and that means that T-SQL Tuesday is almost here. And this month I will be your humble host and the topic is Indexes.

A book's index is not that different than a database index.

Indexes

Indexes are strange things. You never need to explicitly create one to create a fully-functional database, but if you want a database to perform well, they’re indispensable.

And there are so many aspects to write about! Like internals, covering, clustered, xml, fulltext, b-trees, hints, maintenance of, included columns, filtered, redundant, missing and tons more.

In fact my SQL Server 2008 Administrator’s Pocket Consultant (The first handy textbook I could grab) has an index entry on “indexes” that has 22 sub-entries.

About T-SQL Tuesday

For those not familiar with T-SQL Tuesday, it’s an idea hatched by Adam Machanic (Blog|Twitter). A monthly event, T-SQL Tuesday is a chance for any blogger to write and post an article on a single topic determined by the host (i.e. yours truly). Leave a comment here on that day and a day or two later, I’ll include your post in the round up.

Follow These Rules

The post must go live on your blog between 00:00 GMT Tuesday, September 14, 2010 and 00:00 GMT Wednesday, September 15, 2010.
In other words, set your sql server date, time and timezone properly and run this script:
IF GETUTCDATE() BETWEEN '20100914' AND '20100915' SELECT 'You Can Post' ELSE SELECT 'Not Time To Post'
IF GETUTCDATE() BETWEEN '20100914' AND '20100915' SELECT 'You Can Post' ELSE SELECT 'Not Time To Post'
Your post has to link back to the hosting blog, and the link must be anchored from the logo (found above) which must also appear at the top of the post
Leave a comment here (below) or I won’t be able to find your post. I expect trackbacks work properly, but if they don’t check back here just in case and leave a comment if necessary.

We also encourage you to …

… include a reference to T-SQL Tuesday in the title of your post. (The more we bloggers advertise T-SQL Tuesday, the more we bloggers get T-SQL tuesday readers)
… tweet using the hash tag #TSQL2sDay to follow links and other relevant conversations.
… consider hosting T-SQL Tuesday yourself. If you’re interested let Adam Machanic Know. If you’ve participated in two T-SQL Tuesdays previously and you don’t let your blog go stale (blog once a month for the last six months) then he’ll put you in the rotation.
And out of curiosity… Are you team indexes or are you team indices? Personally I’m team indexes but I work with a guy who seems to be so team indices that I’ve caught him saying indice for the singular!

That’s it! Good luck! Can’t wait to see what you guys have in store.

Your humble host,
Michael J. Swart

-- Comments (48)

July 12, 2010

Learning Objectives vs. SQL Server

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: "sql server", #TSQL2sDay, learning, learning objectives — Michael J. Swart @ 9:00 pm

Takeaway: Choose learning activities that increase understanding and not just knowledge.

So It’s T-SQL Tuesday time again. This month it’s hosted by Robert Davis and the topic is Learning. It’s a topic of particular interest to me. I work in a field that writes software for education and now I’m writing about education on software. Here are the ways I learn (BTW, The methods are ordered from shallowest to deepest in terms of understanding)

Knowledge: (Ability to Remember)

Attending Conference Sessions
Watching webinars
Reading (BOL/Books/Blogs/Magazines)

Comprehension: (Ability to Understand)

All of the above… plus:

Getting help from StackOverflow and #sqlhelp
Conference Sessions (Q&A time). I mentioned conference sessions above, but here, I’m calling out the Q&A time.
User Group Discussions

Application: (Ability to Use)

All of the above… plus:

Troubleshooting: The best knowledge and understanding comes from troubleshooting, especially in crisis situations. This hard-fought knowledge doesn’t get forgotten.
Exploring a new feature installed on a machine.
Talking to consultants or mentors: Don’t underestimate this. I often felt like the SQL Server clinic (drop-in consulting provided by Microsoft’s CSS team) at the pass summit were underused.
Hands-on Labs.
Receiving code reviews.

Analysis: (Ability to Evaluate/Create)

With enough experience, you reach Nirvana*. You’ll know you’ve arrived if you can:

Answer questions from StackOverflow and #sqlhelp
Conduct code reviews
Give Talks/Sessions/Workshops on a topic

* Nirvana comes from Seattle. Coincidence?

But why should you care?

It’s useful to know which methods are most effective. Some lessons we could take from above is that it’s more effective to learn by doing than to attend training. Which kind of makes sense. Being thrown in the deep-end is often more effective swimming instruction than to read about swimming.

And as SQL Server professionals we’ve already got a leg up on academia! Our motivations for learning are often more immediate and pressing than “gotta study for the final”.

It’s my hope that by recognizing what kind of learning objectives you could achieve with each approach, you can make better choices about learning like:

Starting in support is an awesome way to shorten the learning curve.
The Q&A part of sessions stick in my head more than any other (especially if I asked the question).
Have management studio open (or BIDS or whatever) when reading blog articles. Believe me when reading Brad Shulz‘s blog, his examples sink in a lot better this way.
When way out of your depth. Consider hiring a consultant: It’s fixed cost, you get your problem solved, and you can learn something too.

-- Comments (1)

Older Posts »