So last week I introduced this series where I try to explore the different normal forms by breaking them in absurd ways. This time I’m talking about first normal form.
First Normal Form (1NF)
First normal form has a kind of fuzzy definition. A normalization process was mentioned by E. F. Codd in 1970 which came to be known as first normal form (Even though 1NF has been redefined and clarified a few times since) The paper is called A Relational Model of Data For Large Shared Data Banks. So in this article I’m going to deal with Codd’s description of 1NF in that paper. A database model is in 1NF if it is relational and has
“all relations are defined on simple domains”
Think of relations as tables and simple domains as column spaces that are atomic (i.e. can’t be decomposed). And so today’s ridiculously un-normalized database example is from that paper.
So having tables as values inside tables is not something that SQL Server (or any RDBMS supports) but actually the example was given to explain how to normalize (to 1NF) and to avoid hierarchical data models that were common at the time. These hierarchical data models were and are implemented by systems like IMS and other systems that look like file systems. So in 1970, something like xml might have been easier to understand than relational data models in 1NF. The example above would actually be visualized like this in xml:
<Employee Man#="123" Name="Michael Swart" Birthdate="Nov. 22"> <JobHistory JobDate="2000" Title="Lackey"> <SalaryHistory SalaryDate="2000" Salary="$1,000,000" /> <SalaryHistory SalaryDate="2001" Salary="$2,000,000" /> </JobHistory> <JobHistory JobDate="2002" Title="Senior Lackey"> <SalaryHistory SalaryDate="2002" Salary="$3,000,000" /> <SalaryHistory SalaryDate="2003" Salary="$4,000,000" /> </JobHistory> <Children ChildName="Mini-me 1" Birthyear="2000" /> <Children ChildName="Mini-me 2" Birthyear="2002" /> </Employee>
XML Inside Databases
But in 2005, SQL Server introduced the xml data type. And my take is that it’s good… mostly. It’s as good as long as you treat as an atomic value (i.e. non-decomposable).
- We can think of an xml column as just a string column with a constraint that makes it well formed xml. And that’s not terribly different than an integer column with a constraint that enforces that values are not negative.
- XML columns are perfect for software messages (like SQL Server broker messages).
- And they’re great for serialized objects (like query plans).
XML is the most decomposable data type I can think of and when we treat these columns as non-atomic, then the tables that contain them are not in 1NF. It’s certainly not what E. F. Codd had in mind when he designed the relational data model.
I’ve found that trouble comes when we try to query it or when we modify it in place. Because then we’re not treating the xml as atomic. And then the shortcomings of xml columns inside a database become very apparent. In SQL Server – as of 2005 – there are ways to query xml values or modify xml values such as when we use xml indexes and when we use XQuery. We find complicated code down this road and performance issues that are just as complicated.
- What happens when we want to look at information inside the plan cache? Look at Finding Implicit Column Conversions In The Plan Cache by Jonathan Kehayias. It’s a good demonstration of how complicated solutions are necessary when dealing with XML inside SQL Server.
- Jason Strate has a great series on XQuery (when the benefits outweigh the drawbacks) here: XQuery | Strate SQL
- Or look at my own experiment with XQuery: View Blocked Process Reports using the Deadlock Graph Control It’s an article of mine that gives a crazy XQuery solution to an unimportant problem.
What are your thoughts?
Do you have opinions on 1NF or the history of databases? It’s all fascinating stuff. Let me know in the comments what you think. I’ve got a bunch of ideas that can’t fit in one article and your comments might let me know what to continue writing about.