Data Modelling: My Favourite Example

April 6, 2011

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

Intro

In this article, I want to explain a pattern that I like to use whenever I model something, it’s fairly straightforward. It involves using identity columns primary keys for all object tables and compound keys for all relationship tables. Surprisingly, even with these restrictions, there’s still a lot of flexibility possible in the data model.

In fact, I conducted an exercise at work where colleagues designed a database model based on this rule of thumb. (The exercise was like pair programming but for db design, ask me about it some time). Attendees were given the instructions: “Identity columns for objects, compound keys for relationships” and I found that there was still a large variety of designs.

For example, ask yourself how you would model a SHIFT table (as in scheduled work)? Is it an object, or is it a relationship? It sounds like it’s own object, but it can also be thought of as a relationship between employee and time slot.

Example

Imagine you want to track the new sport Synchronized Bulk Loading (the sport will make its premier at the 2012 SQLympics in London).

It’s a judged sport and we want to track the marks given by judges to various performances. Here’s what an application might display:

A mockup of a report, or a sample screen shot

For this example I want to capture (at a minimum) the following information

Events (Sport, Location, Date)
Judges (Name, Country)
Atheletes (Name, Country)
Scores (Score)

Every time I see a two dimensional chart like this, I look to see whether it’s appropriate to use this pattern.

A Visual explanation of a db design pattern.

I see this pattern a lot.

And so in this case I would get:

Simple diagram for this example's schema.

I want to compare two different schemas, The first one is my preferred one. It uses identity columns only for object tables (not relationship tables). The second one uses identity columns for everything.

Schema A

CREATE SCHEMA A;
GO
 
CREATE TABLE A.JUDGES
(
	JudgeId INT IDENTITY NOT NULL PRIMARY KEY,
	Name NVARCHAR(100) NOT NULL UNIQUE,
	Country NVARCHAR(100) NOT NULL
);
 
CREATE TABLE A.[EVENTS]
(
	EventId INT IDENTITY NOT NULL PRIMARY KEY,
	[Year] INT NOT NULL,
	Location NVARCHAR(100) NOT NULL,
	UNIQUE ([Year], Location)
);
 
CREATE TABLE A.ATHLETES
(
	AthleteId INT IDENTITY NOT NULL PRIMARY KEY,
	Name NVARCHAR(100) NOT NULL UNIQUE,
	Country NVARCHAR(100) NOT NULL
);
 
CREATE TABLE A.EVENT_JUDGES
(
	EventId INT NOT NULL
		REFERENCES A.[EVENTS](EventId),
	JudgeId INT NOT NULL
		REFERENCES A.JUDGES(JudgeId),
	PRIMARY KEY (EventId, JudgeId)
)
 
CREATE TABLE A.EVENT_ATHLETES
(
	EventId INT NOT NULL
		REFERENCES A.[EVENTS](EventId),
	AthleteId INT NOT NULL
		REFERENCES A.ATHLETES(AthleteId),
	PRIMARY KEY (EventId, AthleteId)
)
 
CREATE TABLE A.SCORES
(
	EventId INT NOT NULL,
	AthleteId INT NOT NULL,
	JudgeId INT NOT NULL,
	Score NUMERIC(3,2) NOT NULL,
	PRIMARY KEY (EventId, AthleteId, JudgeId), 
	FOREIGN KEY (EventId, AthleteId)
		REFERENCES A.EVENT_ATHLETES(EventId, AthleteId),
	FOREIGN KEY (EventId, JudgeId)
		REFERENCES A.EVENT_JUDGES(EventId, JudgeId)
)

Schema B

CREATE SCHEMA B;
GO
 
CREATE TABLE B.JUDGES
(
	JudgeId INT IDENTITY NOT NULL PRIMARY KEY,
	Name NVARCHAR(100) NOT NULL,
	Country NVARCHAR(100) NOT NULL
);
 
CREATE TABLE B.[EVENTS]
(
	EventId INT IDENTITY NOT NULL PRIMARY KEY,
	[Year] INT NOT NULL,
	Location NVARCHAR(100) NOT NULL
);
 
CREATE TABLE B.ATHLETES
(
	AthleteId INT IDENTITY NOT NULL PRIMARY KEY,
	Name NVARCHAR(100) NOT NULL,
	Country NVARCHAR(100) NOT NULL
);
 
CREATE TABLE B.EVENT_JUDGES
(
	EventJudgeId INT IDENTITY NOT NULL PRIMARY KEY,
	EventId INT NOT NULL
		REFERENCES B.[EVENTS](EventId),
	JudgeId INT NOT NULL
		REFERENCES B.JUDGES(JudgeId)
)
 
CREATE TABLE B.EVENT_ATHLETES
(
	EventAthleteId INT IDENTITY NOT NULL PRIMARY KEY,
	EventId INT NOT NULL
		REFERENCES B.[EVENTS](EventId),
	AthleteId INT NOT NULL
		REFERENCES B.ATHLETES(AthleteId)
)
 
CREATE TABLE B.SCORES
(
	ScoreId INT IDENTITY NOT NULL PRIMARY KEY,
	EventAthleteId INT NOT NULL REFERENCES B.EVENT_ATHLETES(EventAthleteId),
	EventJudgeId INT NOT NULL REFERENCES B.EVENT_JUDGES(EventJudgeId),
	Score NUMERIC(3,2) NOT NULL
)

Why I like A over B

Schema A uses identity columns for only JUDGES, EVENTS and ATHLETES. It uses natural keys for every other table. By contrast Schema B uses identity columns for every table, even the relationship tables. I like Schema A over Schema B for a number of reasons. Here are a couple of them:

1. Business Constraints seem easier to enforce.
We get some of this data integrity almost for free! For example, we get to enforce the business rule that scores apply to judges and athletes who are participating in the same event. It is impossible for some rogue application to insert a score that applies to a judge and an athlete who never participated in the same event.

2. Queries turn out easier. e.g. when deleting …

/* Due to unprecedented use of steroids and bribes, delete all marks for an event */
DECLARE @EventIdToDelete INT = 42;
 
DELETE FROM A.SCORES WHERE EventId = @EventIdToDelete;
 
DELETE FROM B.SCORES
WHERE EventAthleteId IN (
	SELECT EventAthleteId
	FROM B.EVENT_ATHLETES
	WHERE EventId = @EventIdToDelete)
OR EventJudgeId IN (
	SELECT EventJudgeId
	FROM B.EVENT_JUDGES
	WHERE EventId = @EventIdToDelete)

3. Queries turn out easier. e.g. when copying …

/* Copy whole event */
DECLARE @EventIdToCopy INT = 43;
DECLARE @NewEventId INT;
 
INSERT A.[EVENTS](Location, [Year])
SELECT Location, [Year]
FROM A.[EVENTS]
WHERE EventId = @EventIdToCopy;
 
SET @NewEventId = SCOPE_IDENTITY();
 
INSERT A.EVENT_ATHLETES (EventId, AthleteId)
SELECT @NewEventId, AthleteId
FROM A.EVENT_ATHLETES
WHERE EventId = @EventIdToCopy;
 
INSERT A.EVENT_JUDGES (EventId, JudgeId)
SELECT @NewEventId, JudgeId
FROM A.EVENT_JUDGES
WHERE EventId = @EventIdToCopy;
 
INSERT A.SCORES(EventId, AthleteId, JudgeId, Score)
SELECT @NewEventId, AthleteId, JudgeId, Score
FROM A.SCORES
WHERE EventId = @EventIdToCopy;
 
/* For schema B, this is difficult and hairy. I could write a sample solution but it's not instructive.
Solutions usually include temp tables, output clauses and a mess of joins. Or you could instantiate
them as objects in an application and persist them again. But it's not as simple as Schema A */

4. Ready for partitioning
By using natural keys, In Schema A we have the option of using a column such as EventId as a partitioning column on all tables. You can’t say the same for Schema B.

But It Seems I’m Arguing Against Myself

It’s true. It does seem like I’m setting up a straw man just to knock it down again. But I see designs like Schema B often so the example is not far fetched. And if something like Schema B makes it into production systems, it gives the that schema an enormous amount of inertia because at that point “refactoring” the schema will cost a lot (in time, money, you name it).

-- Comments (10)

10 Comments »

[…] @MichaelJSwart posts Data Modelling: My Favourite Example Posted on April 6, 2011 by sqlmashup https://michaeljswart.com/2011/04/data-modelling-my-favourite-example/ […]

Pingback by @MichaelJSwart posts Data Modelling: My Favourite Example | sqlmashup — April 6, 2011 @ 1:14 pm
Wouldn’t you want to also include a primary key on A.SCORES?
PRIMARY KEY (EventId, AthleteId, JudgeId) to enforce that a given athlete can only given a single score by a judge for a particular event?
I know this isn’t directly related to the design you’re trying to highlight, but I was just wondering if this is a simple oversight, or if there’s a reason it was left out.

Comment by Aaron Mathison — April 8, 2011 @ 2:46 pm
Thanks so much for stopping by.
Good catch aaron, I can’t believe I missed that. I’ll update the article soon because mistakes like that are distracting aren’t they.

Comment by Michael J Swart — April 8, 2011 @ 4:08 pm
[…] Another good example is the sys.index_columns table. Look at the schema diagram below. It’s a perfect instance of my favourite DB Design example […]

Pingback by Microsoft Model Databases: Some Unconventional Alternatives? | Michael J. Swart — February 9, 2012 @ 8:53 pm
Hey, you spelled Modeling wrong! 🙂

So a post about data modelling with no data models…only DDL. My heart is broken.

Comment by Karen Lopez — February 21, 2012 @ 1:16 pm
Hey Karen,
So by data models, you mean entity relationship diagrams showing logical models and physical models right? (with relationships labelled, etc…).
Then yeah, no data models here. Sorry about that. I’ll make it up to you somehow. 🙂

Re: the spelling of modelling. I seem to remember learning how to spell that properly during a talk about contentious issues. I learn from the best.

Michael

Comment by Michael J. Swart — February 21, 2012 @ 1:41 pm
[…] The SQL you have to write turns out so much cleaner. I hinted about that here: My Favourite Data Modelling Example. […]

Pingback by Data Modelling: Composite Relationships | Michael J. Swart — July 26, 2012 @ 12:02 pm
Hi,
I was thinking a while on your point “1. Business Constraints seem easier to enforce.”, if i can break it.. I agree this is awesome.
I just wonder why did you take Name Unique in Schema A. Is there a specific reason?

Thanks a Ton..!!!!

Nimesh

Comment by Nimesh Parikh — August 7, 2012 @ 5:59 am
Hi Nimesh,

I made Name unique in Schema A because it is a candidate primary key. I did not make Name unique in Schema B because I simply forgot. So people should not assume that I claim anything by that difference. Thanks for catching that.

Comment by Michael J. Swart — August 7, 2012 @ 8:50 am
Good one Michael, Thanks.. 🙂

Nimesh

Comment by Nimesh Parikh — August 8, 2012 @ 12:08 am