Michael J. Swart

February 2, 2011

Ridiculously Unnormalized Database Schemas – Part Three

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , — Michael J. Swart @ 12:00 pm
Ridiculously Unnormalized Database Schemas

Today, I describe 3rd Normal Form and give a ridiculous example of a table that breaks the rule badly. I’ve seen 3NF defined this way:

“A relation (table) R, in second normal form is in third normal form if every non-prime attribute of R is non-transitively dependent on every candidate key of R.”

That’s Wikipedia’s definition of third normal form. And they do a decent job explaining what that means. They reference E. F. Codd’s paper Further Normalization of the Data Base Relational Model 1972. But I couldn’t find a reference online and I was too busy this week to dig out my alumni card and visit the Davis Centre Library.

The House That Jack Built

Before we get into the ridiculous example, I want to talk a bit about the English nursery rhyme The House That Jack Built.

It’s a cumulative poem that starts with the line This is the house that jack built. It repeats and expands (like the songs I Know an Old Lady Who Swallowed a Fly or The Twelve Days of Christmas)
By the end of the poem we have a beautiful run-on sentence where the subject of the poem has very very little to do with the house that Jack built. Here is the rhyme:

This is the house that Jack built.

This is the malt that lay in the house that Jack built.

This is the rat that ate the malt
that lay in the house that Jack built.

and so on until

This is the farmer sowing his corn
That kept the cock that crowed in the morn
That waked the priest all shaven and shorn
That married the man all tattered and torn
That kissed the maiden all forlorn
That milked the cow with the crumpled horn
That tossed the dog that worried the cat
That killed the rat that ate the malt
That lay in the house that Jack built.

The cow with the crumpled horn tosses the dog.

But the thing that makes this nursery rhyme interesting is the same thing that makes tables violate third normal form. If we think of each database row as a sentence and the subject as the primary key. This brings us to our ridiculous example.

The Ridiculous Example

Houses (hope you like scroll bars)

Builder

Contents

Content Consumer

Content Consumer Killer

Content Consumer Killer Worrier

Content Consumer Killer Worrier Tosser

Content Consumer Killer Worrier Tosser Milker

Content Consumer Killer Worrier Tosser Milker Spouse

Content Consumer Killer Worrier Tosser Milker Spouse Wedder

Content Consumer Killer Worrier Tosser Milker Spouse Wedder Waker

Content Consumer Killer Worrier Tosser Milker Spouse Wedder Waker Owner

Jack

Malt

Rat

Cat

Dog

Cow (crumpled horn)

Maiden (all forlorn)

Man (tattered and torn)

Priest (shaven and shorn)

Cock (crowed in the morn)

Farmer (sowing corn)

This table is nowhere near third normal form; it’s in a different time zone! This table is supposed to be about houses, not livestock and 16th century English people. So when you’re designing your tables remember: No run on sentences!

The Other Normal Forms

So that’s the series. I could continue with 4NF, 5NF and BCNF (and others!) but I won’t for a few reasons.

  • Most people who pay attention to 1NF, 2NF and 3NF almost always design databases in the other normal forms without trying.
  • The other normal forms are dull enough that it becomes a challenge to actually come up with a counter example that can seriously be called ridiculous
  • I’ve never heard of any problems encountered by anyone caused by not paying attention to the other normal forms. If you know of any, let me know!

Liked This Series?

Hey you, yeah you. The web surfer. Thanks for coming by. I’m glad you’re visiting this post. That’s what I wrote it for. If you liked this series, you may also like my series on Transaction Levels. Or maybe have a look at some of my favorite posts (or some of yours).

Now clicking the links, that’s the first step. Subscribing to my RSS feed – That’s a whole other thing. So here’s the the deal, if you put my blog into your RSS reader, I promise to write stuff worth belonging in it. Go on, subscribe.

January 26, 2011

Ridiculously Unnormalized Database Schemas – Part Two

Filed under: Data Cartoons,SQLServerPedia Syndication,Technical Articles — Tags: , , , — Michael J. Swart @ 12:00 pm

Ridiculously Unnormalized Database Schemas

Second Normal Form (2NF)

I’ve seen 2NF defined this way:

“A table is considered 2NF if it is 1NF and if its non-prime attributes depends on the whole of every candidate key not just part of it.

It means that for tables that have multicolumn primary keys (or could), the other columns depend on all the columns in the key, not just a subset. If you follow this rule you’ll see that a field in a record will contain data that is a fact only about the record it belongs to, not others. From another point of view, I’ve seen normalization defined as removing redundant data

Get rid of redundant data (Star Trek's Data gets fired)

Things to Remember

  • For 2NF, you only need to look at the “non-prime attributes” or in other words the attributes that aren’t part of the primary key.
  • Look at these columns and ask whether they depend on the whole primary key.
  • Tables with a single-column primary keys are automatically in 2NF
  • BUT it’s not fair to make identity columns your primary keys on every table and call your job done (The definition of 2NF closes this loophole by mentioning candidate keys).

The Example

Take a look at the following table. It tracks reviews from the talent search t.v. show “American Idol”. The primary key is defined on the columns (Date, Singer, Song, Reviewer). Look at each other column and ask whether it depends on the whole key, or a subset of the key.

American Idol Reviews

Date

Singer

Song

Reviewer

Age

Show Order

Show Theme

Result

Review

Aug-21

Justin Guarini

Before Your Love

Paula Abdul

23

1

Finale

Runner-Up

Beautiful

Aug-21

Justin Guarini

Before Your Love

Randy Jackson

23

1

Finale

Runner-Up

A little pitchy dawg

Aug-21

Justin Guarini

Before Your Love

Simon Cowell

23

1

Finale

Runner-Up

Rubbish

Aug-21

Kelly Clarkson

A Moment Like This

Paula Abdul

23

2

Finale

Winner

Beautiful

Aug-21

Kelly Clarkson

A Moment Like This

Randy Jackson

23

2

Finale

Winner

A little pitchy dawg

Aug-21

Kelly Clarkson

A Moment Like This

Simon Cowell

23

2

Finale

Winner

Rubbish

Aug-21

Justin Guarini

Get Here

Paula Abdul

20

3

Finale

Runner-Up

Beautiful

Aug-21

Justin Guarini

Get Here

Randy Jackson

20

3

Finale

Runner-Up

A little pitchy dawg

Aug-21

Justin Guarini

Get Here

Simon Cowell

20

3

Finale

Runner-Up

Rubbish

Aug-21

Kelly Clarkson

Respect

Paula Abdul

23

4

Finale

Winner

Beautiful

Aug-21

Kelly Clarkson

Respect

Randy Jackson

23

4

Finale

Winner

A little pitchy dawg

Aug-21

Kelly Clarkson

Respect

Simon Cowell

23

4

Finale

Winner

Rubbish

Aug-21

Justin Guarini

A Moment Like This

Paula Abdul

23

5

Finale

Runner-Up

Beautiful

Aug-21

Justin Guarini

A Moment Like This

Randy Jackson

23

5

Finale

Runner-Up

A little pitchy dawg

Aug-21

Justin Guarini

A Moment Like This

Simon Cowell

23

5

Finale

Runner-Up

Rubbish

Aug-21

Kelly Clarkson

Before Your Love

Paula Abdul

23

6

Finale

Winner

Beautiful

Aug-21

Kelly Clarkson

Before Your Love

Randy Jackson

23

6

Finale

Winner

A little pitchy dawg

Aug-21

Kelly Clarkson

Before Your Love

Simon Cowell

23

6

Finale

Winner

Rubbish

You can see that with the exception of the column Review that all the columns in tables are not dependent on the whole key. We can pull these columns into separate tables:

Normalized:

ShowContestants

Singer

Date

Age

Result

Justin Guarini

Aug-21

23

Runner-Up

Kelly Clarkson

Aug-21

20

Winner

Performances

Date

Singer

Song

Show Order

Aug-21

Justin Guarini

Before Your Love

1

Aug-21

Kelly Clarkson

A Moment Like This

2

Aug-21

Justin Guarini

Get Here

3

Aug-21

Kelly Clarkson

Respect

4

Aug-21

Justin Guarini

A Moment Like This

5

Aug-21

Kelly Clarkson

Before Your Love

6

Shows

Date

Show Theme

Aug-21

Finale

We still have a table to hold all the reviews by the judges. Defined as:

    Reviews(Date, Singer, Song, Reviewer, Review)

But it’s still a point of debate whether or not the reviews depend on the whole primary key or a subset of the key (especially based on the example).

January 19, 2011

Ridiculously Unnormalized Database Schemas – Part One

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , — Michael J. Swart @ 12:00 pm
Ridiculously Unnormalized Database Schemas

So last week I introduced this series where I try to explore the different normal forms by breaking them in absurd ways. This time I’m talking about first normal form.

First Normal Form (1NF)

First normal form has a kind of fuzzy definition. A normalization process was mentioned by E. F. Codd in 1970 which came to be known as first normal form (Even though 1NF has been redefined and clarified a few times since) The paper is called A Relational Model of Data For Large Shared Data Banks. So in this article I’m going to deal with Codd’s description of 1NF in that paper. A database model is in 1NF if it is relational and has

“all relations are defined on simple domains”

Think of relations as tables and simple domains as column spaces that are atomic (i.e. can’t be decomposed). And so today’s ridiculously un-normalized database example is from that paper.
Good and bad relational schemas
So having tables as values inside tables is not something that SQL Server (or any RDBMS supports) but actually the example was given to explain how to normalize (to 1NF) and to avoid hierarchical data models that were common at the time. These hierarchical data models were and are implemented by systems like IMS and other systems that look like file systems. So in 1970, something like xml might have been easier to understand than relational data models in 1NF. The example above would actually be visualized like this in xml:

<Employee Man#="123" Name="Michael Swart" Birthdate="Nov. 22">
    <JobHistory JobDate="2000" Title="Lackey">
            <SalaryHistory SalaryDate="2000" Salary="$1,000,000" />
            <SalaryHistory SalaryDate="2001" Salary="$2,000,000" />
    </JobHistory>
    <JobHistory JobDate="2002" Title="Senior Lackey">
            <SalaryHistory SalaryDate="2002" Salary="$3,000,000" />
            <SalaryHistory SalaryDate="2003" Salary="$4,000,000" />
    </JobHistory>
    <Children ChildName="Mini-me 1" Birthyear="2000" />
    <Children ChildName="Mini-me 2" Birthyear="2002" />
</Employee>

XML Inside Databases

But in 2005, SQL Server introduced the xml data type. And my take is that it’s good… mostly. It’s as good as long as you treat as an atomic value (i.e. non-decomposable).

  • We can think of an xml column as just a string column with a constraint that makes it well formed xml. And that’s not terribly different than an integer column with a constraint that enforces that values are not negative.
  • XML columns are perfect for software messages (like SQL Server broker messages).
  • And they’re great for serialized objects (like query plans).

But …

XML is the most decomposable data type I can think of and when we treat these columns as non-atomic, then the tables that contain them are not in 1NF. It’s certainly not what E. F. Codd had in mind when he designed the relational data model.
An upset looking E. F. Codd
I’ve found that trouble comes when we try to query it or when we modify it in place. Because then we’re not treating the xml as atomic. And then the shortcomings of xml columns inside a database become very apparent. In SQL Server – as of 2005 – there are ways to query xml values or modify xml values such as when we use xml indexes and when we use XQuery. We find complicated code down this road and performance issues that are just as complicated.
For example:

What are your thoughts?

Do you have opinions on 1NF or the history of databases? It’s all fascinating stuff. Let me know in the comments what you think. I’ve got a bunch of ideas that can’t fit in one article and your comments might let me know what to continue writing about.

January 12, 2011

Ridiculously Unnormalized Database Schemas – Part Zero

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , — Michael J. Swart @ 12:00 pm
Ridiculously Unnormalized Database Schemas

I’m presenting a small series of articles that describes database normalization. But the topic is already covered quite well by Wikipedia’s Database Normalization article. And so in this series I’m going to explain database normalization by using examples. Actually by counter-example. The goal is to come up with counter-examples that break the rules so bad that you’ll never forget the rules for each normal form.

DB Normalization is a good thing

But I’m not going to explain why. At least not in depth.  That’s a topic for another day. Let me just say that SQL Server – and Oracle and DB2 and the rest – were designed to support Normalized Databases. Designing Normalized databases makes the best use of SQL Server and doing so avoids a lot of messy problems.

Before We Start…

I’m not going to jump into first normal form (1NF). That’s next article. All database schemas that are 1NF must be relational. So in this article I’m looking at information stores that aren’t relational.

I start by giving an example of an information container. It’s not a database schema, It’s not even a data model (strictly).

The Human Brain

The Human Brain

So as an information store, the brain is not super-reliable but it has it’s place. I chose this example because if you’re not in I.T., the brain still holds a ton of information!

  • To the theoretical physicist, it has a lot information by virtue of the fact that it’s a physical object.
  • The chemist and biologist would be fascinated by the information stored in the DNA of every cell.
  • And of course if connected to a living body, the brain has information for everyone else, in the usual sense (just ask!)

The Relational Database Model

Getting back to databases. Specifically ones that aren’t relational.

It’s surprising that DB normalization was defined at the exact same time as the Relational Database Model (Well, not exactly the same time. They were a chapter apart in Codd’s paper A Relational Model of Data for Large Shared Data Banks 1970)

So for a schema to even be considered for first normal form, it has to be relational. And while SQL Server is called an RDBMS, it turns out to be surprisingly easy to define tables that aren’t relational. Take this example:

-- not a relational table
CREATE TABLE Employee (
    EmployeeNumber INT NOT NULL,
    FirstName NVARCHAR (50) NOT NULL,
    LastName NVARCHAR (50) NOT NULL
)

There’s no primary key! If your tables are missing primary keys, then you’re automatically disqualified. No 1NF for you! (Incidentally the same goes for missing foreign keys.)

In the next few articles I leave the non-relational stuff behind. We’ll look at examples of relational tables (don’t worry, there’s still a lot of ridiculousness coming).

Powered by WordPress