Michael J. Swart

May 28, 2013

Swart’s Ten Percent Rule

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 12:00 pm

A picture of coffee captioned: "Success: ninety percent hard work, ten percent coffee"“Just because you can, doesn’t mean you should.” SQL Server has a number of limits and restrictions (many of the limits are described here). Most of the time these restrictions don’t feel restrictive at all. I can’t think of a single scenario where I’d want more than 128 characters to name a column. And I don’t think I’ll ever need to select more than 4096 columns at a time. I get the feeling that if you need over 10% of what Microsoft restricts you to, then you’re probably doing it wrong.

So I turn that thought into Swart’s Ten Percent Rule:

If you’re using over 10% of what SQL Server restricts you to, you’re doing it wrong.

Always? No not always. It depends of course on what restriction we’re talking about. Sometimes the rule applies more than other times. I’d like to evaluate a number of restrictions here and judge whether the rule applies or not. I’ll be assigning each 10% rule a “Risk that you’re doing it wrong”. The levels are Severe, High, Elevated, Guarded or Low.


SQL Statement Length

Max Size: 256 megabytes (Using the default network packet size.)
10% of restriction: 25 megabytes
Risk of doing it wrong: Severe

Let’s take a step back. Why do you want to execute a statement that’s 25 megabytes long? The only scenario I can think of is that you’re sending data to the database to be parsed and stored. There’s much better ways to send data to the database then this (See Erland Sommarskog’s definitive Arrays and Lists in SQL Server).

Or maybe you’re creating a one-off huge IN clause to pass to SQL. Better to join to a temp table or a table valued parameter.


Identifier Length

Max Size: 128 characters
10% of restriction: 12 characters.
Risk of doing it wrong: Low

Twelve characters is not a lot to work with when coming up with names for tables or columns. If you had a column or table name with more than 12 characters, I wouldn’t be at all concerned. For comparison, Adventurework’s average column is 11 characters long. Their largest column is “HomeAddressSpecialInstructions” at 30 characters. My friend, Karen Lopez, told me she once came across this beast of a column name: “RetailTransactionLineItemModifierEventReasonCode” at 48 characters. That’s getting extreme (despite intellisense), but 12 is fine.


Number of Columns in a Table

Max Size: 1024 Columns
10% of restriction: 102 columns
Risk of doing it wrong: High

Maybe you need a lot of columns because you’re treating a table like a spreadsheet. The worst offender I’ve ever seen had a whopping 96 columns. It was designed by someone who didn’t know any better. By the way, the most notable modeling smell coming from this table was the columns named [Int01] to [Int30]). In this case, the database designer was definitely doing it wrong.

Maybe you need a lot of columns to support growing functionality. As an example, say you have a table like

CREATE TABLE UserProfile
(
    UserId int not null primary key,
    PhotoFileId int null,
    PreferenceTags varchar(1000) not null,
    CanEditWidget bit not null,
    CanViewWidget bit not null,
    CanAddWidget bit not null,
    IsMonitored bit not null,
    ...,
    IsApprovedForFastTrack bit not null
)

Then the thing to ask yourself is why is this list of columns changing so much? In this example, we see a growing list of boolean attributes (which will probably continue to grow as the application evolves). In this case it makes sense to redesign your schema using:

CREATE TABLE UserProfile
(
    UserId int not null primary key,
    PhotoFileId int null,
    PreferenceTags varchar(1000) not null
)
 
CREATE TABLE UserPermissions
(
    UserId int not null 
        references UserProfile(UserId),
    PermissionId int not null
        references Permissions(PermissionId),
    IsGranted bit not null
)

But maybe you really do need this number of columns, you’re probably storing semi-structured data. By semistructured data, I always think of how one would store cascading style sheets or health records.

Semistructured data is really difficult to model. As an alternative to these difficult-to-manage columns, consider storing the values in an xml column (accepting that you can fetch, but probably no longer want to filter). If you need something more radical, maybe consider a schema-less json document database like couchdb (I can’t believe I said that).

SQL Server has something called wide tables which use sparse columns but you’re on your own here. It’s not something I could recommend.


Simultaneous Server Connections

Max Size: 32,767
10% of restriction: 3,277.
Risk of doing it wrong: Guarded

The number of connections your SQL Server instance can support can be found by this query

SELECT @@MAX_CONNECTIONS as [this many]

and the number of open connections on the server is basically this

SELECT count(1) as [this many] FROM sys.dm_exec_sessions

It’s not that unusual to have 1000 or 2000 connections open. It is rarer to see that many active queries. But it’s not unusual to see 1000s of connections, most of whom are sleeping. It’s possible you have a large number of services and web servers connecting to a variety of databases on your server. With ado.net connection pools, it’s common to have a bunch of idle connections.

The best advice I can give is to get a baseline. Watch the connections regularly in order to understand what is normal. Maybe start with

select 
    count(1) as [connections], 
    count(case [status] when 'sleeping' then null else 1 end) as [active connections],
    db_name([dbid]) as [database name],
    [hostname], 
    [loginame],
    [program_name]
from sys.sysprocesses
where [dbid] NOT IN (0)
group by [hostname], [loginame], [dbid], [program_name]
order by connections desc

Then go from there. Once you know what’s normal, you can then watch for changes to the status quo.


Database Size

Max Size: 512 Petabytes
10% of restriction: 51 Petabytes

51 Petabytes is unwieldy in so many ways I’m not going to list them here. I’m not saying such big data sizes are impossible with a centralized RDBMS like SQL Server. But I am saying that if you’re not sure about whether you can scale SQL Server to that size and you’re reading my blog looking for advice then…

Risk of doing it wrong: Severe


Other metrics

I think Swart’s Ten Percent Rule applies to a lot of things. Try to apply it the next time you come across a SQL Server restriction and let me know if you find anything interesting.

26 Comments »

  1. I wish SQL automatically give you a warning when you pass one of these limits: “You are at 10% of the limit. You are doing it wrong!”

    Comment by Haydar — May 29, 2013 @ 9:53 am

  2. One of our vendor databases has a core table with 209 columns, 192 of which have the word ‘custom’ somewhere in the name, along with a numbering scheme that seems to have changed over the years. Each column is of varying type and size. And yes, the vast majority of these columns are being used.

    You can imagine how difficult it was integrating it with our other systems.

    Comment by Jeremy Marx — May 29, 2013 @ 10:13 am

  3. 209 columns! That’s so crazy. Doesn’t a little mental alarm bell go off at around column #60? How do you get to #209? I guess the reasoning is “In for a penny in for a pound. Screw it, lets add another more columns!”

    Comment by Michael J. Swart — May 29, 2013 @ 10:21 am

  4. How about the 900 bytes limitation on indexes length? That drives our development team CRAZY!

    Comment by Peter Tran — May 29, 2013 @ 4:00 pm

  5. I have to admit that I am guilty of breaking the 10% rule for table names. A database I designed had an average of 18 characters per table name, with two tables coming in first with 53 characters each, plus numerous others in the 40s. I blame the German in me – growing up with words like Donaudampfschifffahrtskapitaensmuetze makes one more acceptable of these long names.

    Comment by Karl Baader — May 29, 2013 @ 11:11 pm

  6. Hey Karl,
    “Donaudampfschifffahrtskapitaensmuetze” That’s a whopping 37 letters.
    I’ve heard of this German habit of running words together. Mark Twain has a go at the German language and mentions this example that he came across this one once “Generalstaatsverordnetenversammlungen” at 36 letters.

    I’ve learned recently about a bizarre abbreviation practice promoted once by IBM, wouldn’t it be easier to spell Donaudampfschifffahrtskapitaensmuetze and Generalstaatsverordnetenversammlungen as D35e and G34n?

    But take heart, there’s no guilt involved. I only assessed the 10% table name length restriction as a low risk that you’re doing it wrong.

    Comment by Michael J. Swart — May 29, 2013 @ 11:35 pm

  7. Hey Peter, I completely agree

    I never treated index length in my post. I think Swart’s rule applies. I would give a 90 byte key length a “High” risk of doing it wrong despite what developers or anyone else says. :-)

    Thanks for stopping by!

    Comment by Michael J. Swart — May 29, 2013 @ 11:39 pm

  8. Actually 100+ columns is not uncommon in an OLAP/star schema design, Dimension tables can get very wide unless you start snowflaking (which has it’s own issues).

    Comment by Jim — May 31, 2013 @ 9:46 am

  9. Good point Jim, And you’re absolutely right.

    Comment by Michael J. Swart — May 31, 2013 @ 10:01 am

  10. On May 29th, @AdamMachanic tweeted: “Hm. Number of worker threads? Total memory? Number of CPUs? It’s a cool idea but not universal.”

    No, it’s not universal, but I thought I’d assess these limits any way.

    Total memory: That’s not a limit enforced by SQL Server is it?
    Number of CPUs: Also no limit enforced by SQL Server (depending on edition).

    These are more limited by either your wallet or the operating system, but not SQL Server.

    Maximum worker threads is limited to 65535. 10% of that is 6554 worker threads which is not in the range recommended for this setting. It’s not an area I’m familiar with so risk of doing it wrong: Elevated maybe?

    Comment by Michael J. Swart — May 31, 2013 @ 10:25 am

  11. “By the way, the most notable modeling smell coming from this table was the columns named [Int01] to [Int30]).”

    I came across something like this one in a version control system SaaS app. The database had a table with a row for each document with various metadata, along with BLOB columns named Version1, Version2, …, Version19 storing versions of the document. (I guess no one needed more than 19 versions.) Not surprisingly this table contained nearly all of the data in the database and was the customer’s main bottleneck.

    About worker threads, I’ve never seen anyone need 6554 worker threads, but I’ve seen cases whether the default number of threads wasn’t sufficient (query notifications with service broker and a 500+ databases).

    Comment by James — June 1, 2013 @ 5:34 pm

  12. I find it dangerous to apply blanket statements like this to a very generic set of criteria. I mean, if all of the table names in your database are <= 12 characters, I don't want to have to be the one who is stuck trying to figure out what the names mean as they will be a mess. Of course, if you do get close to 102 (10% of 1024) columns (and the numbers have been increased to 30000 for a "wide" table using sparse columns, and even then if you get close to 3000 for a wide table) I would suggest you are making work for one of my consultant friends).

    Even things like index key length as a rule might cause people to not use a Unique constraint in a place where it is necessary.

    For most things, it is definitely a threshold to consider, since many limits are super high to allow for a very wide range of uses that haven't been thought of yet…But we do have to be mindful that some readers aren't as smart as you and a few will try to apply the 12 character name thing as a corporate standard :)

    Louis

    Comment by Louis Davidson — June 2, 2013 @ 11:14 am

  13. [...] Swart’s Ten Percent Rule, Michael J. Swart [...]

    Pingback by Evolving Technology Featured In The Daily Six Pack: June 3, 2013 — June 3, 2013 @ 12:01 am

  14. Hi Louis! Thanks for coming by.
    For the 12 character limit, I do mention that 12 is fine and the risk of doing it wrong is “low”. I characterized the risk of doing it wrong as low mostly because I didn’t define a risk category called “non-existant” maybe I should have.

    If someone looks at my post and makes corporate guidelines based solely on this article’s title and H3 headers, then the risk that they’re doing it wrong is “high”. :-)

    I thought I was clear enough that this rule is *not* to be applied as a blanket criteria.

    Comment by Michael J. Swart — June 3, 2013 @ 9:02 am

  15. Hi James,
    Regarding “Version1, Version2, …, Version19″ as 19 blobs in a table. Wow. Maybe the only silver lining here is that it’s one mystery solved. :-)

    Comment by Michael J. Swart — June 3, 2013 @ 9:07 am

  16. Just found a DB from a 3rd party vendor with 88425 columns spread over 2299 tables.
    205 tables have > 102 columns.
    54 tables > 300 cols.
    Top 2 tables, 995 & 880 cols.

    Maybe thats why we are always apprehensive when it comes to updates.

    Comment by Dennis Post — June 3, 2013 @ 9:56 am

  17. Hey Dennis,

    Maybe (like Jim above mentions) it’s an OLAP/star schema design. But even if that’s the case, those numbers seem a bit big. Not knowing anything else about the application, I can’t guarantee what they’re doing is unwise, but I can say that what they’re doing is at least very unusual.

    Comment by Michael J. Swart — June 3, 2013 @ 10:01 am

  18. Sometimes there are more important design rules to follow – $$$$. We have two mainframe datacom applications that were converted to run on SQL Server and MicroFocus COBOL. The tools to do the conversion are automated and don’t include redesign. Somehow it still works, despite COBOL cursors and attention events.

    Comment by Randy in Marin — June 3, 2013 @ 12:16 pm

  19. Hello Randy in Marin!

    Sounds like you’ve found a pretty decent solution for an application which is pretty static and you’re comparing a port versus a redesign.
    And of course design rules, like some of the ones I mention above, are best considered when comparing this design versus that design.

    Michael

    Comment by Michael J. Swart — June 3, 2013 @ 12:47 pm

  20. How about this: almost 5000 databases in one instance!
    The application developers just point to MS saying max is 32000, so they did nothing wrong :-)

    Comment by Shoddy — June 4, 2013 @ 9:26 am

  21. Its true I once inherited a table with 255 columns once. You should have seen the ERD

    Comment by mark — June 4, 2013 @ 12:06 pm

  22. There is a limit of 900 bytes on an indexed column. I propose that limiting indexed columns to 90 bytes be an exception to this rule.

    Comment by Davd Griffiths — June 5, 2013 @ 5:25 pm

  23. Identifier Length – 12 is not enough and too low even as guidance. But I do recommend thirty (30) since that is the maximum size of an Oracle identifier and want our tables and views to be available in both SQL-Server and Oracle.

    Comment by JBWA — June 6, 2013 @ 9:08 pm

  24. I completely agree. Swart’s rule doesn’t apply to identifier length (I thought I said that)

    Comment by Michael J. Swart — June 8, 2013 @ 1:53 pm

  25. To David Griffiths, I disagree. If you regularly have index keys that are greater than 90 bytes, you are definitely doing it wrong.

    Comment by Robert L Davis — August 10, 2013 @ 7:11 pm

  26. [...] Swart’s Ten Percent Rule - Michael J. Swart (Blog|Twitter) shares his thoughts on how you can determine if you’re pushing SQL Server limits perhaps a little too far. [...]

    Pingback by Something for the Weekend - SQL Server Links 31/05/13 • John Sansom — October 26, 2013 @ 3:57 am

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress