Michael J. Swart

February 23, 2011

How Full is Fill Factor 100?

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , , — Michael J. Swart @ 12:00 pm

Takeaway:

Question: For a clustered index on an identity column, is it okay to set the fill factor to 100?
Answer: Most likely, it depends on a lot of things.

Fill Factor

So today I’m talking about the FILL FACTOR setting that can be applied to indexes. Remember, Fill Factor is a percentage you can specify so that when indexes are built (or rebuilt) SQL Server leaves some free space in each data page to accommodate any new data that may come along.
A man at a restaurant saves room for dessert.
If more data is added to a page which doesn’t have enough room to accommodate it, then a page split occurs – a new page is created elsewhere and roughly half the rows get written to the new page. This leaves two pages roughly half full. So the goal of setting a Fill Factor properly is to prevent these page splits (because too many page splits impacts performance).

Fill Factor of 100 on Clustered Indexes on Identity Columns

So there’s a couple places I’ve found that recommend a fill factor of 100 on indexes that begin with identity columns.

  • Dave Levy describes a process that includes the tip: “If the index is on an ever increasing value, like an identity column, then the fill factor is automatically 100.”
  • Pinal Dave gives the same advice: “If the index key column is an IDENTITY column, the key for new rows is always increasing and the index rows are logically added to the end of the index. In such a situation, you should set your FILLFACTOR to 100.”

This makes sense, if you always add rows to the end of an index, then you don’t need to save room in the middle do you? Ahh… But what about UPDATE statements? UPDATE statements can add data into the middle of an index (especially a clustered index which includes all fields). You might think that even updating a record so that it’s one byte larger than it used to be will cause a page split.

Fill Factor of 100 Still Has A Bit Of Wiggle Room

It turns out that there’s still a little bit of wiggle room. It’s very rare for pages to have zero bytes free. Even if the index was built (rebuilt) with Fill Factor 100. The reason is because data pages contain an whole number of records. If there’s space on a page, but not quite enough space for a whole record, then it’s considered full. This tiny space could in theory be used for updates that fit.

What Else to Consider

So one extra byte is rarely going to cause a page split. I would be comfortable recommending a Fill Factor of 100 for any index on an identity column. But before you trust me, there are some other things to consider:

  • The bit of wiggle room I mentioned above
  • Know your application! Most OLTP systems do far more Inserts than Updates. (Most OLAP systems do zero updates)
  • How many fields in the index are variable length? And how many of those get updated? Remember only variable length fields can change the size of a record. No variable length fields means an automatic Fill Factor 100.
  • SQL Server only pays attention to Fill Factor on Index Rebuilds (or on creation). It doesn’t maintain the fill factor space any other time. So ask yourself how often updates are applied to rows that are older than your last index rebuild. If it’s rare, then Fill Factor 100.
  • How’s your re-indexing strategy? If you REORGANIZE your indexes instead of REBUILD, the fill factor won’t make a difference at all (If so, better to stop reading this article and work on a comprehensive index maintenance strategy.)
  • Page splits don’t impact performance of seeks (just scans).
  • Page splits aren’t the end of the world. In terms of database health. Fragmentation is like a bad cold.

There’s probably even more things I’m missing. But you know what’s better than guessing? Measuring! Go use Dave Levy’s Fill Factor script to know exactly how Fill Factor is impacting your indexes.

February 16, 2011

Searching Inside Strings: CPU is Eight Times Worse For Unicode Strings

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , , — Michael J. Swart @ 12:00 pm

Every time I see behavior from SQL Server that I don’t understand, it’s very disconcerting. But it often turns out to be a learning experience. And if I’m lucky to get to the bottom of it, the knowledge becomes one more tool in my mental utility belt. I had that experience a couple weeks ago and I want to write about it.

Opening a pocket on a utility belt.

The thing I learned most recently is that searching inside Unicode strings (NVARCHAR strings) is a lot more cpu-intensive than searching inside single-byte strings (VARCHAR strings). By searching I mean looking for substrings. This has nothing to do with index lookups and full-text indexing is a different topic.

So here’s how I learned it: In a database I work on, we recently changed many of our database strings into unicode strings (VARCHAR to NVARCHAR) in order to support multiple languages. And we discovered that the CPU time taken by of a couple procedures shot through the roof! It was an 800% increase in CPU and this was without any significant I/O increase.

This is probably karma coming back to bite me after I said that I/O is the only performance metric you need.

The Setup

Luckily, I was able to reproduce and isolate the behavior. I show the example here. First we build a table of a million rows:

use tempdb
 
create table test
(
    testid bigint primary key,
    v varchar(36),
    nv nvarchar(36)
)
 
go
-- Itzik Ben Gan's trick (via Brad Schulz) to get around SQL Server's lack of a numbers table
with
   L0 as (select 1 as C union all select 1)       --2 rows
  ,L1 as (select 1 as C from L0 as A, L0 as B)    --4 rows
  ,L2 as (select 1 as C from L1 as A, L1 as B)    --16 rows
  ,L3 as (select 1 as C from L2 as A, L2 as B)    --256 rows
  ,L4 as (select 1 as C from L3 as A, L3 as B)    --65536 rows
  ,L5 as (select 1 as C from L4 as A, L4 as B)    --4,294,967,296 rows
  ,Nums as (select row_number() over (order by (select 0)) as N from L5)
insert test
select N,
    CAST (newid() as varchar(36)),
    CAST (newid() as nvarchar(36))
from Nums
where N<=1000000;
go

The Queries

Now look at these two queries, The query that searches the unicode string performs eight times worse than its single-byte counterpart even though they use identical I.O.:

set statistics time on
set statistics io on
-- search utf8 string
select COUNT(1) from test where v like '%abcd%' option (maxdop 1)
-- CPU time = 797 ms,  elapsed time = 791 ms.
-- Table 'test'. Scan count 1, logical reads 16472,...
 
-- search unicode string
select COUNT(1) from test where nv like N'%abcd%' option (maxdop 1)
-- CPU time = 6828 ms,  elapsed time = 6862 ms.
-- Table 'test'. Scan count 1, logical reads 16472,...
 
/*
(Aside: 36 characters isn't much of a test, but it turns out CPU usage
scales linearly on tests using larger input strings: Searching 72 character
strings take twice the CPU, 108 character strings take three times the CPU etc...)
*/

But Why?

The extra CPU cannot be explained away by the wider characters. My gut feeling says that strings twice as long should not take eight times the CPU to search. My first thought was that there was an implicit conversion somewhere but that wasn’t the case.

After some stackoverflow.com help it turns out that this has something to do with the different collations. Many different strings can be compared as equal even though they have different binary representations. VARCHAR strings with different binary representations can compare as equal (e.g. ‘MICHAEL’ is equal to ‘michael’). And Unicode string comparisons have even more complicated rules than these.

So if SQL Server collations have something to do with it, then we can hope that by using a binary collation, we’ll save the extra CPU. And in fact, we see something like that:

-- search unicode string with binary collation
select COUNT(1)
from test
where nv COLLATE Latin1_General_Bin
    like N'%ABCD%'
option (maxdop 1)
-- Table 'test'. Scan count 1, logical reads 16472, ...
-- CPU time = 781 ms,  elapsed time = 777 ms.

However if we use a case sensitive, accent senstive collation, we can hope for better CPU, but we’d be disappointed:

select COUNT(1)
from test
where nv COLLATE SQL_Latin1_General_CP1_CS_AS
    like N'%ABCD%'
option (maxdop 1)
-- CPU time = 6688 ms,  elapsed time = 6702 ms.
-- Table 'test'. Scan count 1, logical reads 16472, ...
/* similar results for all other non-binary collations */

So what do we know so far?

  • Searching inside NVARCHAR strings is slower than searching inside VARCHAR.
  • Specifying different collations (i.e. case sensitivity, accent sensitivity) for NVARCHAR strings doesn’t improve performance
  • Binary collations are the exception. Searching inside strings using binary collations are much faster

Not Just SQL Server

It turns out that this is most likely not SQL Server’s fault. SQL Server relies on the operating system for its string methods. In particular, it probably relies on any one of these methods found in Kernel32.dll:

  • lstrcmpi or lstrcmp
  • FindNLSStringEx, FindNLSString and FindStringOrdinal
  • CompareStringEx, CompareString and CompareStringOrdinal

The docs for the ~Ordinal functions indicate that these functions are meant for binary (non-linguistic) string comparisons. I’d bet a lot of money that this explains the behavior we see in SQL Server. It accounts for why comparisons using binary collations are faster while comparisons using other collations are not.

February 9, 2011

Pinal Dave: Blogger, MVP and now Interviewee

Filed under: Miscelleaneous SQL — Tags: , — Michael J. Swart @ 12:00 pm

Pinal Dave is a popular blogger and if you follow any number of blogs, you’ve most likely heard of him. Pinal Dave’s blog Journey to SQL Authority (SQLAuthority.com), has the highest traffic of any SQL Server Blogger and the third most traffic for any site dedicated to SQL Server (after SQLTeam.com and SqlServerCentral.com). Recently, Pinal was kind enough to agree to answer some questions for an interview. It’s been a fun process and I’m happy to present the results of the interview here.

Although he doesn’t know this, I’ve actually met Pinal Dave. It was at the SQL Server Summit 2009 and I was running between one session or another and I sat down to check email. Pinal was in a seat nearby talking with some colleagues of his and I recognized him. I chose not to introduce myself because he seemed busy at the time. I kind of regret that now. Some day I’ll get the chance to introduce myself in person. But meanwhile, we’ve exchanged messages through our blogs and email messages. He was good enough to submit an article on indexes when I hosted TSQL Tuesday in September last year.

I hope you enjoy this interview. I think (and hope) you’ll find it fascinating.

The Interview

You and your family

Q.Hi Pinal, Thanks for taking the time to answer these questions. First of all, how do you pronounce your name? (As you would pronounce it when introducing yourself to an English speaking Canadian.)

A. This is great question. People from India usually pronounce my name correctly. However, every time I travel internationally, I have issues. The way to pronounce my name is ‘Pinul Daway’. However, it is spelled ‘Pinal Dave’. Every time I travel to Canada, the USA or other English-speaking countries, people often call me ‘Dave’ – thinking that is my first name. However, Pinal is my first name and Dave is my last name.

Very interesting question indeed. I always wanted to clarify that on my blog but never got chance.

Q. Tell me where you grew up and where you went to school.

A. I can give you the names of the places and schools but it will not make sense. My father had a government job and he had to transfer from location to location. This meant I changed schools quite often. I cannot name one city as the place I grew up. I changed schools 12 times before I entered grade 12, I think that says it all.

Q. Tell me about your family. I believe you’re a husband and a father?

A. I have a beautiful daughter, Shaivi, and a lovely supportive wife, Nupur. Everybody has an interesting story about how they met their wife or life partner. I have interesting story about how we did not get introduced for long time.

Our homes were very close to each other. We have many common relatives and friends. But for almost 25+ years neither of us ever met each other. However, I had met her whole rest of the family (father, mother, sister, grandparents, etc) hundreds of times. Well, the same thing goes for her. She had met my family but not me. We were finally introduced to each other by our parents and it was love at first sight.

… and then there is Shaivi. She is now everything for us. This morning for the first time she said the word that I always wanted to hear – “Daddy.” I am overwhelmed with emotions.

Q. What is your favorite memory?

My top two favorite memories are already mentioned in the earlier questions.

  • The day I met my wife.
  • The day my daughter said ‘Daddy.’

I guess nothing can top those two moments.

Pinal’s blog: Journey to SQL Authority

Q. Your blog SQLAuthority.com is very popular. Why is that?

A. If I am polite and say I do not know or give you some kind of philosophy, it will not be the truth. I personally believe there are two reasons.

English – English is not my first language, which limits me to writing everything in the 500 English words I know. I think there are plenty of readers all over the world who are in the same boat as me.

Simplicity – I try to make complex subjects simple enough for even beginners to understand. We are all beginners at one point. I believe that the smartest guy is also a beginner in a new subject. I strongly believe that we all are beginners but the only difference is we all have different powers of understanding.

Q. You have written so many articles. With training and other events,when do you find time to write?

A. It is difficult – very difficult- extremely difficult! I have to ration my time. I sometimes feel guilty when I ration time away from my daughter. I promise I will not do that today.

Q. Your blog is written in English. Why is that? Do you also train in English?

A. English is my third language. The reality is that I have never received training in English. I did not formally learn English at all. I speak Gujarati (regional language) and Hindi (national language). I learned English because on the very first day I was in the USA for my master’s degree, I went to bed hungry because I did not know how to ask for vegetarian food. (I learned that I should say ‘no meat, no sea-food please’ or ‘sin carne’ in Spanish.)

Now I do training in English and I write in English. The reason I write in English is because I learned SQL in English only. I still think in my local language and then quickly translate it to English.

The reason I prefer English is because I believe English is a Global Language and if I want to help the global community, I must express myself in English.

Q. Who has helped you the most with your successes? How have they helped?

A. As this question is in a SQLAuthority.com section I understand you are asking this question in a professional aspect.

I have four icons:

  1. Ben Forta – He is an Adobe Evangelist and I learned a lot reading his blog and books in my early career. I always wanted to be Evangelist like him. I think I want to give him complete credit for inspiring me to write a technical blog.
  2. Vinod Kumar – He is an MCT-Technology Specialist in Bangalore, India. He was an SQL Server Evangelist from the inception of the program in India. After coming back to India, I read his blog and always wanted to be like him – helping the community become more technology aware.
  3. Rushabh Mehta – He is Solid Quality India CEO and world famous BI expert. He is one man, who have given me chance – chance to prove myself, chance to grow, chance to learn technology, chance to help community, chance to go to next level and many more. I own a lot to him.
  4. Rajendra Dave – He is my father. He reads each of my blogs and gives me his feedback. He always encourages me to write creatively and constructively. He has one rule – “Praise in public, criticize in private.” I think this has helped me tremendously. Others have inspired me but he has put 100% of his efforts in making me successful.

Q. You mention on your blog that you’ve been awarded the MVP award. How long have you held that distinction and how did it feel when you first won it?

A. The MVP was the best thing in my life professionally. The feeling was that Microsoft is watching me and appreciating me. I felt – “Microsoft Cares.” It was great feeling. I have been holding this award for three years continuously. I think being an MVP is the best thing that can happen when one is dedicated to the community.

The MVP award has enabled me to do lots of things. It has connected me with great peers and given me access to essential resources, which enabled me to help community further.

Q. Besides blogging, consulting, training and family time, how else do you spend your time?

A. I like to travel. I travel for consulting, training, the community, and for family vacations. Due to the nature of the job, I have to make a conscious effort to stay away from work many times. When I travel with my family, I make sure I do not keep computers around. I do not even check my email and spend 100% time dedicated to my family. They love it and I love them.

Thanks Pinal

Thank you again Pinal, I’m grateful that you took the time to answer my questions.

February 2, 2011

Ridiculously Unnormalized Database Schemas – Part Three

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , — Michael J. Swart @ 12:00 pm
Ridiculously Unnormalized Database Schemas

Today, I describe 3rd Normal Form and give a ridiculous example of a table that breaks the rule badly. I’ve seen 3NF defined this way:

“A relation (table) R, in second normal form is in third normal form if every non-prime attribute of R is non-transitively dependent on every candidate key of R.”

That’s Wikipedia’s definition of third normal form. And they do a decent job explaining what that means. They reference E. F. Codd’s paper Further Normalization of the Data Base Relational Model 1972. But I couldn’t find a reference online and I was too busy this week to dig out my alumni card and visit the Davis Centre Library.

The House That Jack Built

Before we get into the ridiculous example, I want to talk a bit about the English nursery rhyme The House That Jack Built.

It’s a cumulative poem that starts with the line This is the house that jack built. It repeats and expands (like the songs I Know an Old Lady Who Swallowed a Fly or The Twelve Days of Christmas)
By the end of the poem we have a beautiful run-on sentence where the subject of the poem has very very little to do with the house that Jack built. Here is the rhyme:

This is the house that Jack built.

This is the malt that lay in the house that Jack built.

This is the rat that ate the malt
that lay in the house that Jack built.

and so on until

This is the farmer sowing his corn
That kept the cock that crowed in the morn
That waked the priest all shaven and shorn
That married the man all tattered and torn
That kissed the maiden all forlorn
That milked the cow with the crumpled horn
That tossed the dog that worried the cat
That killed the rat that ate the malt
That lay in the house that Jack built.

The cow with the crumpled horn tosses the dog.

But the thing that makes this nursery rhyme interesting is the same thing that makes tables violate third normal form. If we think of each database row as a sentence and the subject as the primary key. This brings us to our ridiculous example.

The Ridiculous Example

Houses (hope you like scroll bars)

Builder

Contents

Content Consumer

Content Consumer Killer

Content Consumer Killer Worrier

Content Consumer Killer Worrier Tosser

Content Consumer Killer Worrier Tosser Milker

Content Consumer Killer Worrier Tosser Milker Spouse

Content Consumer Killer Worrier Tosser Milker Spouse Wedder

Content Consumer Killer Worrier Tosser Milker Spouse Wedder Waker

Content Consumer Killer Worrier Tosser Milker Spouse Wedder Waker Owner

Jack

Malt

Rat

Cat

Dog

Cow (crumpled horn)

Maiden (all forlorn)

Man (tattered and torn)

Priest (shaven and shorn)

Cock (crowed in the morn)

Farmer (sowing corn)

This table is nowhere near third normal form; it’s in a different time zone! This table is supposed to be about houses, not livestock and 16th century English people. So when you’re designing your tables remember: No run on sentences!

The Other Normal Forms

So that’s the series. I could continue with 4NF, 5NF and BCNF (and others!) but I won’t for a few reasons.

  • Most people who pay attention to 1NF, 2NF and 3NF almost always design databases in the other normal forms without trying.
  • The other normal forms are dull enough that it becomes a challenge to actually come up with a counter example that can seriously be called ridiculous
  • I’ve never heard of any problems encountered by anyone caused by not paying attention to the other normal forms. If you know of any, let me know!

Liked This Series?

Hey you, yeah you. The web surfer. Thanks for coming by. I’m glad you’re visiting this post. That’s what I wrote it for. If you liked this series, you may also like my series on Transaction Levels. Or maybe have a look at some of my favorite posts (or some of yours).

Now clicking the links, that’s the first step. Subscribing to my RSS feed – That’s a whole other thing. So here’s the the deal, if you put my blog into your RSS reader, I promise to write stuff worth belonging in it. Go on, subscribe.

Powered by WordPress