Michael J. Swart

November 28, 2011

SQL Fingertips

They Might Be Giants (TMBG) released an album called Apollo 18. It has 38 tracks and it runs for 42 minutes. It’s an album that will always remind me of my friend and first University roommate (Hey, what’s up Chris!). So the average track is about a minute long and this is due to a set of 21 tracks collectively called “fingertips”. Most of these fingertips are five to fifteen seconds long. Some are a bit longer. These fingertips are probably not good enough to be put on an album on their own. But collectively, they make a really cool album (especially on shuffle mode).

The List

So when I found myself with a collection of tiny tips that might not qualify as a blog post on their own, I thought of that album and a microsecond later, we’ve got SQL Fingertips!

So here they are, in no particular order.

  • Stackoverflow tip: Use the bounty feature without being afraid of the hit to your reputation. A bounty offered on a decent question usually attracts enough attention to get your question up-votes. And unless it’s a crummy question, those up-votes will earn you back the reputation points you spent.
  • T-SQL tip: You can use the following script as a template to be able to run a single query on multiple databases and have it return a single result set:
    DECLARE @temp TABLE (
    	ServerName sysname,
    	DBName sysname,
    	[Row Count] int
    );
     
    declare @schemaname sysname = 'Production';
    declare @tablename sysname = 'Product';
    declare @sql nvarchar(max) = N'use [?];
    	if exists (select 1 from sys.tables where name = ''' + @tablename + N''' and SCHEMA_NAME(schema_id) = ''' + @schemaname + N''')
    		exec sp_executesql N''
    			select @@servername, db_name(), Count(1) as [count]
    			from ' + QUOTENAME(@tablename) + '''';
     
    insert @temp (ServerName, DBName, [Row Count])
    exec master..sp_MSforeachdb @sql;
    select * from @temp;
  • SQL Server Management Studio tip: GO can take a parameter. If you write GO 100, your batch will be executed 100 times. Many people know this but I always here a couple “wow”s each time I demonstrate it. And here’s a couple more notes about that batch separator.
    • Don’t forget to SET ROWCOUNT OFF.
    • The batch separator doesn’t have to be GO. It’s configurable! I used to set it to ENGAGE for a while, but the joke got old and I switched it back.
  • Blog tip: If you like following blogs or people, but maybe you’re falling out of love with your RSS reader. No worries. You can get any RSS feed at all sent straight to your inbox.  There are lots of RSS-to-inbox websites out there. The one I recommend is FeedMyInbox.com. So, for example, to get my own blog delivered to your inbox, start here.
  • Data → Eye → Brain, A Data Visualization tip: Data visualizations (charts, diagrams, etc..) help us gain knowledge and understand data that maybe we couldn’t understand otherwise (See Anscombe’s Quartet). And for large and complex datasets, the best visualizations are usually custom designed and interactive. There’s a lot of power in customized visualizations: See what David McCandless and Miriah Meyers have to say.
  • Twitter DBA Tip #1: I put a call on twitter asking people to tweet the one database tip they thought more people should know. The first one comes from @BrentO: “Don’t leave your database in full recovery mode if you’re not doing t-log backups.” He’s right. Full recovery mode usually implies a backup strategy that includes transaction log backups. (And full or differential backups are not enough either).
  • Twitter DBA Tip #2: This one’s from @wnylibrarian“If using SQLExpress & no SQLAgent, try using SQLCMD in a batchfile.” I’m glad Gary tweeted this one. I’ve come up against this problem in my own work. It turns out that when downgrading editions of SQL Server, often one of the first things that you miss is SQL Agent. If you follow Gary’s advice, it’s not a hurdle, it’s only an inconvenience.
  • Twitter DBA Tip #3: Next I heard from @DaveH0ward who wants to remind people that “You need to size your data /log files and set autogrowth amounts appropriately.” I think a lot of people don’t know that the default auto-growth settings for SQL Server databases are a bad idea. Tim Ford warns us of dire consequences for not following this tip.
  • Twitter DBA Tip #4: Finally I heard from @SQLSandwhiches (aka Adam Mikolaj). He shared a pet peeve “I’ve seen so many servers that don’t have logs to auto cycle. Don’t you love opening a log file with 300,000 entries?” Adam wrote about his experience earlier this year (including a solution of course).
  • MixedTape Fingertip Tip: Say you find yourself back in the nineties making a mixed tape and you find that you have a minute left to fill on one side of the tape. They Might Be Giants to the rescue! Pick several fingertips from their Apollo 18 album and optimize every second of that mixed tape! My personal favorite is the 45 second Minimum Wage off of their other album Flood.

So how many of these SQL Fingertips did you know? Have you got a small tip? Share it in the comments. Have you got a whole bunch of small tips? Write your own SQL Fingertip post. I’d be especially grateful to read a set of Business Intelligence fingertips.

October 19, 2011

Secret Santa as a Puzzle

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , — Michael J. Swart @ 12:00 pm

So every year, the adults in my family pick names to do a gift exchange. I guess lots of people do this and they call it Secret Santa.

In my family there are five couples that participate and we draw names from a hat. Inevitably, someone will draw their own name or the name of their spouse and everyone puts the names back in the hat. It usually takes us quite a few tries before we get it right.

This past thanksgiving (Canadian) we went through this again. It’s fun and we always give the person who picked their own name a hard time. But I got to thinking about this as a puzzle:

“How can we exchange names fairly and secretly without the do-overs?”

Well, let’s look at this a few ways:

  • As a mathie.
  • As a computer techie.
  • From a practical point of view.

As a Mathie

I graduated from University of Waterloo with a BMath with a major in CS and C&O*. But since graduating, I’ve focused only on CS (Computer Science) things. The mathie in me wonders: “What are the odds that we pass the hat around successfully?”

I thought long and hard about it but couldn’t crack that question. My textbooks were no help and there were no professors on hand to answer, so I asked http://math.stackexchange.com for help. Hoping that I’d have just as much success with that site as I do with stackoverflow.com. I was not disappointed. Here’s the question I asked:

Five couples draw names from a hat. If a person draws their own name, or the name of their spouse, all the names go back in a hat and names are re-drawn. Using a computer, I know that the probability of this happening is 1 – (440192 / 10!) or about 88%. What’s a general expression for n couples?

And a stackexchange user joriki gave a brilliant answer:

By assigning a letter to each couple, this can be reduced to the problem of finding the number an of anagrams of a word with n different letters, each occurring twice, with no letters fixed. The desired number of permutations is then 2nan, since each of the n couples can be assigned in two ways to the two instances of its letter. Wikipedia mentions this problem as a generalized derangement problem. The general formula given for a word with numbers n1, … ,nr of r different letters is

where Pk is the k-th Laguerre polynomial. In the present case, r=n and ni=2, so we only need the second Laguerre polynomial, which is P2(x)=(1/2)(x2-4x+2). The n factors of (1/2) cancel with the n factors of 2, so the probability of success is

where (2n)! counts the total number of permutations. For n=5, Wolfram|Alpha gives 440192/(10!), as you calculated.

I think it’s a beautiful answer and for the first time in a long time, I missed doing Math. So this math problem and the probabilities are now well understood. But it doesn’t save any time picking names at Thanksgiving does it? Let’s look at it from another point of view.

As a Computer Techie

Using C#
Okay, I understand this subject much better. A C# program is easy to write. In fact, I wrote a quick one and it looks something like this:

static void Main()
{
	List<string> names = new List<string>() {"Mike", "Leanne", "Dave S",
		"Cindy", "Marianne", "Dave B", "Linda", "Matt",
		"Lisa", "Dan" };
 
	List<int> picks = new List<int>() { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
 
	while (!IsValid(picks))
	{
		Shuffle(ref picks);
	}
 
	PrintResults(picks, names);
}

You can look at the full program here. It works, but it doesn’t feel like the right solution to the problem: The results aren’t secret for one thing. We’d need a trusted third party to distribute the results. Also notice the while loop, we don’t know for certain if/when the while loop will break, so this program seems a little sloppy.

Using SQL
So now here’s a subject I’m really really comfortable with. I used an approach that my friend Paul Santos explained to me when we talked about this problem. His approach is this:

  • Drawing from a hat is a permutation of 10 names. So generate all the permutations
  • Filter out the invalid permutations
  • Pick a random permutation from the list of valid ones and report that

So here’s me showing all the permutations:

with OneToTen AS
(
	SELECT n
	FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) as N(n)
)
SELECT A.n, B.n, C.n, D.n, E.n, F.n, G.n, H.n, I.n, J.n
FROM OneToTen A
JOIN OneToTen B
	ON B.n NOT IN (A.n)
JOIN OneToTen C
	ON C.n NOT IN (A.n, B.n)
JOIN OneToTen D
	ON D.n NOT IN (A.n, B.n, C.n)
JOIN OneToTen E
	ON E.n NOT IN (A.n, B.n, C.n, D.n)
JOIN OneToTen F
	ON F.n NOT IN (A.n, B.n, C.n, D.n, E.n)
JOIN OneToTen G
	ON G.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n)
JOIN OneToTen H
	ON H.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n, G.n)
JOIN OneToTen I
	ON I.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n, G.n, H.n)
JOIN OneToTen J
	ON J.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n, G.n, H.n, I.n)
OPTION (FORCE ORDER, MAXDOP 1)

To filter out the invalid permutations, I add a where clause. To report a random permutation, I order the results by newid() and select the top 1 row:

with OneToTen AS
(
	SELECT n
	FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) as N(n)
)
SELECT TOP (1) A.n, B.n, C.n, D.n, E.n, F.n, G.n, H.n, I.n, J.n
FROM OneToTen A
JOIN OneToTen B
	ON B.n NOT IN (A.n)
JOIN OneToTen C
	ON C.n NOT IN (A.n, B.n)
JOIN OneToTen D
	ON D.n NOT IN (A.n, B.n, C.n)
JOIN OneToTen E
	ON E.n NOT IN (A.n, B.n, C.n, D.n)
JOIN OneToTen F
	ON F.n NOT IN (A.n, B.n, C.n, D.n, E.n)
JOIN OneToTen G
	ON G.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n)
JOIN OneToTen H
	ON H.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n, G.n)
JOIN OneToTen I
	ON I.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n, G.n, H.n)
JOIN OneToTen J
	ON J.n NOT IN (A.n, B.n, C.n, D.n, E.n, F.n, G.n, H.n, I.n)
WHERE A.n NOT IN (1, 2)
	AND B.n NOT IN (1, 2)
	AND C.n NOT IN (3, 4)
	AND D.n NOT IN (3, 4)
	AND E.n NOT IN (5, 6)
	AND F.n NOT IN (5, 6)
	AND G.n NOT IN (7, 8)
	AND H.n NOT IN (7, 8)
	AND I.n NOT IN (9, 10)
	AND J.n NOT IN (9, 10)
ORDER BY NEWID()
OPTION (FORCE ORDER, MAXDOP 1)

So I’m really happy with this. But only as a solution to a puzzle. The only thing it has going for it is that it’s guaranteed to halt. It doesn’t take less cpu than the C# program and we’re further away from telling relatives who they’re meant to buy presents for!

A Practical Point Of View

In real life, using authentication schemes and trusted third parties is a great way to bring any holiday party to a halt Mr. Buzz Killington (No more stuffing for you). So what kind of practical things can you do to make things easier or avoid do-overs? Here’s a couple ideas:

  • There are many many websites that are built just for this problem: SecretSanta.com, Elfster, and DrawNames.com. (I’m not making those up!) I think there’s even a facebook or iPhone app for that. I haven’t looked too closely at these so I can’t vouch for them.
  • You could write people’s names on an old deck of cards. Shuffle those cards and deal them out. You don’t avoid any problems this way – I mean, you won’t avoid getting your own name – but shuffling and dealing is quicker than passing around a hat.
  • If you pick your own name, replace it and pick again. The exchange won’t be 100% secret, but most people don’t care. This is what most Secret Santas are like.
  • Status quo. Do nothing. This is probably what my family will stick with. We’ll continue to pass around the hat and give people a hard time for picking their own names.

Have you got any other ideas? If you do a gift exchange with your family, what does your family do?

*~ CS and C&O are short for “Computer Science and Combinatorics and Optimization”, my first year at University was spent learning to pronounce that correctly.

August 23, 2011

ACID Properties By Example (And Counterexample) Part Four: Durable

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , — Michael J. Swart @ 9:00 am
ACID Properties By (Counter) Example

The last ACID property is D, Durability. Again, Haerder and Reuter describe Durability:

“Once a transaction has been completed and has committed its results to the database, the system must guarantee that these results survive any subsequent malfunctions.”

What does this mean exactly? That’s a tall order for a database system! I mean any malfunction whatsoever? I’m pretty sure our database systems are designed to survive a power failure but I don’t expect that they could survive something as severe as the heat death of the universe.

Actually databases don’t have to go that far. When designing a database system, only two kinds of malfunctions are considered: media failure and system failure.

Media Failures

For media failure (e.g. a faulty hard drive) databases are recovered by using backups and transaction logs. And this leads directly to three bits of super-common DBA advice:

  • Take backups regularly.
  • Keep your transaction logs and your main database files on different hard drives.
  • When dealing with a disk failures, step one is backing up the tail of the log

System Failures

System failures (e.g. system crashes, power outages etc…) have to be handled too.

SQL Server does it this way. When SQL Server is processing transactions, it will first write changes to a transaction log and then write the associated changes to the database file. Always always in that order (There’s a bit more too it, but that’s the main part). It’s called the Write-Ahead Transaction Log.

But when there’s a system malfunction, a few things need to be cleaned up the next time the server restarts (to maintain atomicity and consistency). There may be transactions that were interrupted and not yet committed. And some transactions may not have their changes written to disk, or sometimes not written completely to disk. How do you recover from stuff like that?

Well the database recovers from a failure like that during a startup process called (unsurprisingly) “recovery”. It can look at these half-performed transactions and it can roll them back using the info in the logs. Or alternatively it can roll-forward and replay committed transactions that haven’t made it to disk if the conditions are right and there’s enough info in the transaction log to do so. (Further Information at MCM Prep Video: Log File Internals and Maintenance)

So What Does This Mean To You?

If an ACID database system like SQL Server reports that your transaction has committed successfully then because it’s durable, your transaction is truly persisted: You don’t have to worry about buffer flushes or power failures “losing” your work.

Example

So what is interestingly durable? Durability in database systems usually means that something is redundant so that if one thing is lost, the transaction is not lost. So I give a list here of things that are too redundant:

  • The Hydra‘s heads (Greek Mythology)
  • Enchanted Brooms from the Sorcerer’s Apprentice.
  • Autofac (An interesting short story by Philip K. Dick which I finished reading last night).

Counter-Example

I have two examples and they both come from the career of Richard Harris (best known to my family as the first Dumbledore). Did you know he was a one-hit wonder? He had a hit single in the seventies called MacArthur Park. If you’ve never heard the song, skip this article and experience the utter madness that is MacArthur Park. You won’t regret it.

Back to the example. The singer of MacArthur Park would like to have his cake. Unfortunately, it’s been left out in the rain (malfunction). But that’s okay right? He could always get out the recipe (transaction log) and make a new one right? Wrong! He’ll never have that recipe again (durability fail). Had he persisted that recipe, the poor sucker would still have his cake.

Bonus Richard Harris Counterexample

You may remember he played Emperor Marcus Aurelius in the movie Gladiator. (Spoiler alert!) In that movie, he plans to make Maximus his heir instead of his son Commodus. He first tells his plans to Maximus (who is reluctant to rule Rome) and then he tells Commodus who did not take the news well at all. In fact he murdered his father after hearing it!  The Emperor’s plans never make it to the public and so Commodus becomes Emperor.

You see, his plans to make Maximus his heir was not durable! Had the Emperor told a bunch of other people first, then his intended heir Maximus would have ruled Rome as he wanted (Not to mention it would have removed the motive for his murder).

That’s The Series

So that’s it. I had fun with it. It gave me a chance to “geek out”. And even though blog post series are a nice way of treating a topic in depth, I still found myself struggling to keep each article to blog-post length. There’s just so much to learn here. I guarantee I learned more writing this series than a reader would reading it 😉

Tell me what you think!

August 10, 2011

ACID Properties By Example (And Counterexample) Part Three: Isolation

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , — Michael J. Swart @ 12:00 pm

ACID Properties By (Counter) Example

So the third ACID property of database transactions is I for Isolation. This is the only ACID property that deals with behaviour of a transaction with respect to other concurrent transactions (all the other properties describe the behaviour of single transactions). Haerder and Reuter describe it as:

Isolation: Events within a transaction must be hidden from other transactions running concurrently.

It’s not super-rigorous, but I think of it like this: No looking at works-in-progress

Back to the drawing board.

(Actually, I don’t always believe in that advice, but it helps the cartoon)

So there are different kinds of database isolation. Even with the the guideline: no looking at other transactions in progress. And now these levels of isolation are well defined. I wrote a series on those earlier, the different levels are READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ and SERIALIZABLE. By the way READ UNCOMMITTED is the only isolation level here that is not really isolated, more on that later.

Isolation in SQL Server

SQL Server supports all of these isolation levels. It enforces this isolation using various locks on data (fascinating stuff actually), processes will wait to maintain isolation. In contrast, Oracle supports only SERIALIZABLE and a kind of READ COMMITTED that is closer in behaviour to SQL Server’s SNAPSHOT isolation. No matter how it’s implemented, READ COMMITTED is the default isolation level in both SQL Server and Oracle.

Unisolated Transactions:

So it is possible for other transactions to see the effects of a transaction in-flight (i.e. as it’s happening, before it’s committed). This is done with NOLOCK hints or with the READ UNCOMMITTED isolation level. In fact, I learned recently that when using NOLOCK hints, you not only can see the effects of an in-flight transaction, but you can see the effects of an in-flight statement. This is an Isolation failure and it boils down to this: SQL Server transactions are atomic, but when using NOLOCK, it might not seem that way. So take care.

Example

Today’s example and counterexample both come from the newspapers headlines of Chicago.

For the example – a fictional example – I explain a situation that’s all about not making assumptions. It’s all about being cautious and not committing to a decision while the jury’s still out. This immediately brought to mind a scene from the movie Chicago [spoiler alert!] :

The movie (and play) is about a court case. The main character Roxie is on trial for murder. It’s a sensational trial and the papers are eager to publish the results of the trial. The papers are so eager in fact that the papers have printed out two editions of their newspapers. One headline read “She’s Innocent” the other headline read “She’s Guilty”. But those two stacks of papers are just sitting there in the van. The man in the newspaper van waits for a signal from the courthouse. Once he got the proper signal, he cracked open the innocent edition and gave them to a paper boy to hand out.

It’s about not acting on information while the jury is still out. The jury is isolated from the world and no one can act on what the jury has to say until they’ve committed to a verdict.

Counter-Example

Our counter-example comes from non-fiction. In reality, the assumptions we make tend to be correct. Our assumptions are only interesting when they turn out to be incorrect. This counter-example comes from the most incorrect newspaper headline I can think of:

“Dewey Defeats Truman”

Click through for Wikipedia’s page on cool piece of newspaper history (Chicago newspaper history). It’s a great example of what can go wrong when we act on tentative (uncommitted) information. The Chicago Tribune published the wrong presidential candidate as the winner.

But the really really cautious reporters would report neither candidate as the winner. They’d be waiting at the Electoral College convention. They’d be keen on seeing how that turns out.

August 3, 2011

ACID Properties By Example (And Counterexample) Part Two: Consistent

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , — Michael J. Swart @ 12:00 pm

ACID Properties By (Counter) Example

Dr. Jim Gray developed many of the properties that transactions have in well-designed databases. Later, Haerder and Reuter took these properties and used them to coin the acronym ACID. At that time, they defined consistency this way:

Consistency
A transaction reaching its normal end, thereby committing its results, preserves the consistency of the database. In other words, each successful transaction by definition commits only legal results.

Essentially, consistency means that database systems have to enforce business rules defined for their databases.
But it’s interesting. The word consistency (applied to database systems) aren’t always used the same way! For example, in Brewer’s CAP theorem the C, standing for consistency is defined as “All clients have the same view of the data.” (Really computer scientists?? That’s the word you decide to overload with different meanings?). So if you ever hear someone say eventually consistent. They’re using the consistency term from CAP, not the consistency term from ACID.

I guess “C” means something different for everyone.

Consistency in SQL Server

In my own words, consistency means that any defined checks and constraints are satisfied after any transaction:

  • Columns only store values of a particular type (int columns store only ints, etc…)
  • Primary keys and unique keys are unique
  • Check constraints are satisfied
  • Foreign key constraints are satisfied

Constraints Are Enforced One Record At A Time
Some things you might notice about these constraints. They can all be checked and validated by looking at a single row. Check constraints enforce rules based only on the columns of a single row. One exception is where these constraints might perform a singleton lookup in an index to look for the existence of a row (for enforcing foreign keys and primary keys).
Multi-line constraints are not supported directly because it would be impractical to efficiently enforce consistency. For example, it’s not possible to create a constraint on an EMPLOYEE table that would enforce the rule that the sum of employee salaries must not exceed a specific amount.

Consistency Enforced After Each Statement
Another interesting thing about SQL Server is that while ACID only requires the DBMS to enforce consistency after a complete transaction, SQL Server will go further and enforce consistency after every single statement inside a transaction. It might be nice to insert rows into several tables in any order you wish. But if these rows reference each other with foreign keys, you still have to be careful about the order you do the inserting, transaction or no transaction.

Handling Inconsistencies
When SQL Server finds inconsistencies. It handles it in one of a few ways.

  • If a foreign key is defined properly, a change to one row can cascade to other other rows.
  • If a value of a particular datatype is inserted into a column which is defined to hold a different datatype, SQL Server may sometimes implicitly convert the value to the target datatype.
  • Most often, SQL Server gives up and throws an error, rolling back all effects of that statement.

Inconsistent Data Any Way
It also turns out that it’s very easy to work around these constraints! (Besides the all-too-common method of not defining constraints in the first place). Primary keys, Unique constraints and datatype validation are always enforced, no getting around them. But you can get around foreign keys and check constraints by

  • using WITH NOCHECK when creating a foreign key or a check constraint. You’re basically saying, enforce any new or changing data, but don’t bother looking at any existing data. These constraints will then be marked as not trusted
  • using the BULK INSERT statement (or other similar bulk operations) without CHECK_CONSTRAINTS. In this case foreign keys and check constraints are ignored and marked as not trusted.

Example

I’m taking the following example not from I.T., but from the world of medical labs.

When processing medical tests (at least in my part of the world), there’s a whole set of rules that medical professionals have to follow. Doctors and their staff have to fill in a requisition properly. The specimen collection centre has do verify that information, take samples and pass everything on to a lab. The lab that performs the tests, ensures that everything is valid before performing the test and sending back results to the doctor.

Just like a database transaction, the hope is that everything goes smoothly. All patient information is entered properly. Patients and lab techs have followed all appropriate instructions.

Fixing Inconsistent Data: It sometimes happens that information is entered incorrectly or missing (like insurance info, or the date and time of the test). In these cases, often the lab might call back for corrections before continuing with the test. This is similar to the case when SQL Server recognizes that a statement will not leave the database in a consistent state. In some cases, SQL Server can try to do something about it. For example it can do an implicit conversion of a datatype, or it can cascade a delete/update.

Giving Up And Rolling Back: But sometimes a medical test can’t be saved. For example, sometimes a sample arrives clotted when it should have arrived unclotted (or vice versa). In these cases, meaningful results aren’t possible and the whole test has to be rejected to be performed again correctly. SQL Server will do this whenever it’s necessary to maintain consistent data. It will raise an error and the entire statement or transaction is undone (to be corrected and performed again).

Counterexample

Well, this counterexample comes from the world of cheesy Science Fiction. Normally we want our databases to store only consistent and legal data. Any illegal data should be rejected right away.   What we don’t want is for our databases to get hung up on some crazy inconsistent data.

But if you’re Captain Kirk and you need to deal with a rogue computer or robot that’s acting up. What do you do? Simple, confuse it with inconsistent information! Those robots won’t know what hit them.

This bit of dialog comes straight from an episode of Star Trek called “I, Mudd” (I’m not even making this up, Google it!)

Kirk: Norman, Everything Harry tells you is a lie, remember that, everything Harry tells you is a lie.
Harry Mudd: Listen to this carefully Norman: I am lying.
[Norman the android starts beeping, his light starts flashing and his ears start smoking]
Norman: You say you are lying but if everything you say is a lie then you are telling the truth but you cannot tell the truth because everything you say is a lie but you lie, you tell the truth but you cannot for you lie … Illogical! Illogical!
[more of the same, more smoke and kaboom]

Honest-to-God smoke from the ears! It’s so classic it gets parodied a lot. (Here’s one of my favourites, a comic from Cyanide and Happiness).

I’m grateful that our databases don’t choke on inconsistent data. They just throw an error and tell clients “Here, you deal with it!”.

July 27, 2011

ACID Properties By Example (And Counterexample) Part One: Atomic

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , — Michael J. Swart @ 12:00 pm

ACID Properties By (Counter) Example

In the 1800s scientists first used the word atom to describe the smallest bits of an element. They picked that word because it meant un-cuttable or indivisible. The idea was that that’s as far as you can go; you can’t break down these bits further. Fast forward to the early 1900s and they found out that atoms actually can change (through radiation or splitting). But it was too late to change the language. Splittable or not, atoms are atoms.

This process of splitting atoms is so interesting that somehow the word atomic has come to refer this process of dividing atoms (e.g. atomic bomb, atomic energy).

It's interesting: For most people, the word atomic only refers to a process where atoms are split.

Atomic Transactions in SQL Server

But when we talk about the word atomic as one of the ACID properties of transactions, the word regains its original meaning: indivisible. SQL Server transactions are always atomic. They’re all-or-nothing. To twist Meatloaf’s words, this means that two out of three is bad (It’s got to be three out of three or nothing). It’s often forgotten, but this applies to single statement transactions too; all of a statement (whether an update, delete or insert) will happen or not happen.

To guarantee atomicity, SQL Server uses a Write Ahead Transaction Log. The log always gets written to first before the associated data changes. That way, if and when things go wrong, SQL Server will know how to rollback to a state where every transaction happened or didn’t happen. There’s a lot more to it than that, but as a developer all I care about is that I can trust that my transactions don’t get split.

Example

Here’s an example from outside the I.T. industry. It’s a story about an all or nothing transaction. About two years ago, Samoa switched from driving on the right side of the road to the left (The NYT has a great article on it).
You can imagine the great effort that must go into a switch like this. And it has to happen all or nothing. The switch has to happen everywhere, all at once with no exceptions. Unlike other big projects that can usually be broken down into smaller phases, this one can’t.
Translated into SQL, this might be equivalent to:

BEGIN TRANSACTION
 
UPDATE ROADS
SET TrafficDirection = 'Left'
WHERE Country = 'Samoa';
 
UPDATE TRAFFIC_LIGHTS
SET TrafficDirectionMode = 'Left'
WHERE Country = 'Samoa';
 
UPDATE INTERSECTIONS
SET TrafficDirectionConfigurationMode = 'Left'
WHERE Country = 'Samoa'
 
COMMIT

It’s not an I.T. example, but you get the idea. If this “transaction” were not atomic there would be trouble!

Counter Example

An example of failed atomicity (outside I.T.). One word: Standards.
Say you want to create a universal standard for something (say the Metric system) the main purpose is to create it to be the single standard to replace all others. If you fail in your goal, you’ve added to the problem!
Some more successful universal standards:

  • http (over gopher etc…) for almost all web traffic
  • Blueray (over hd-dvd) for hi-def movie formats

But consider the Metric system. It’s mostly successful because of its large adoption. But because there are a few stragglers, it’s not as successful as it could be. Translated into SQL:

UPDATE EVERYTHING
SET Units = 'METRIC',
    Value = fn_Convert(Value, Units, 'METRIC')
-- no where clause!

This “statement” didn’t update everything. The “statement” wasn’t atomic and this continues to cause problems. One problem that comes to mind is the failed Mars Climate Orbiter mission.

Let’s be grateful for the all-or-nothing transactions we have in our databases!

July 20, 2011

ACID Properties By Example (And Counterexample) Part Zero

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , — Michael J. Swart @ 12:00 pm

ACID Properties By (Counter) Example

So I’m introducing a small series about ACID properties as it applies to databases. (For other acid properties, talk to a chemist or Timothy Leary).

When talking about databases, ACID is an acronym that stands for Atomic, Consistent, Isolation and Durable. These are important properties of a database system’s architecture. Specifically these properties refer to how database transactions are designed.

In fact this stuff is important in any transaction processing system (TPS). These systems (not just database systems) use a server-client architecture and they first became popular in the 1960s. These systems are successful because they allow multiple clients to modify and share data concurrently all while enforcing data integrity! Not too shabby.

So most servers (including database servers) were built with this architecture in mind. It’s interesting that NoSQL databases don’t attempt to provide ACID transactions. Each of these NoSQL databases ignore one or more of these properties and attempt to offer something else in its place (but that’s a story for another day).

With SQL Server, these properties are enforced by default. But as it happens, you can relax these ACID properties in SQL Server if you want. We’ll see that it turns out to be easy (maybe too easy?) to write SQL that ignores some of these properties. The hope is that after reading this series, you’ll

  • be aware of the properties
  • understand why database transactions behave the way they do,
  • and be aware of any consequences if you’re tempted to give up any of these properties.

How This Series Is Organized

So I started this series as a single blog post, but it was getting a bit long for a single article. I wanted to come up with some examples (and counterexamples) other than the too common example of a money transfer between two bank accounts.

What you’ll see in this series is

  • a description of each ACID property.
  • A bit about how each property is handled in SQL Server,
  • An example from real life (but not necessarily an I.T. example!)
  • A counterexample from real life (but again, not necessarily an I.T. example!)

Stay tuned!

May 4, 2011

When To Use Blocked Processes Reports

I introduced the SQL Server Blocked Process Report Viewer a couple weeks ago and I realize that I took something for granted. Not everyone has practice watching for Blocked Process Reports and not everyone understands how or when they can use my utility to help troubleshoot concurrency issues.

Steps For Finding Concurrency Problems

Remember: When it comes to concurrency problems, you don’t have to guess what’s wrong!!!

And I explain it flowchart style!

Wait Statistics Who Is Active SQL Trace SQL Server Blocked Process Report Viewer Event Notifications The Future – Tracking Blocking in Denali
  • Checking for LCK_M_XX waits Paul Randal has a great script that interprets the data in the dmv sys.dm_os_wait_stats. I like this script because when all else fails. This script is a great starting point for understanding where the system’s bottlenecks are.
  • Using sp_WhoIsActive Adam Machanic wrote Who Is Active as an tricked out version of sp_who and sp_who2. I recommend it because it is a great view into what’s active on your server right now. And that includes blocked processes and other concurrency issues. (i.e. For any acute problem go there. For chronic concurrency problems, come back here).
  • Using SQL Trace You might know this as Profiler. Capture a trace with the “Blocked Process Report” event which is located in the Error and Warnings event list. But don’t forget! You first have to decide on what it means for your system to have excessive blocking and configure the blocked process threshold accordingly. I’ve learned very recently that peoples’ ideas of excessive blocking vary a lot. In my own environment, I often look for blocking longer than 10 seconds. Other people use a threshold of 10 minutes!
  • Analyzing Traces With Blocked Process Report Viewer This is the tool I wrote that I hope you find useful. Right now it tells you who the lead blocker is. And I hope to expand the features into analysis soon.
  • Configuring Server for Event Notifications I’m really not too familiar with this option and don’t use it much. As an alternative you can also use WMI queries and events mapped to a sql agent job (Thanks Vincent Salard, for that tip).
  • Using Extended Events Once Denali Arrives Jonathan Kehayias knows extended events backwards and forwards. In his blog post here, he describes how in the next version of SQL Server, the blocked process report will be traceable using extended events.

An Ounce of Prevention

In an extremely timely post Kendra Little writes about understanding locks in It’s a Lock. Following much of the same steps, you can understand what your app is doing beforehand and avoid any blocking problems from the start (e.g. understanding locks held during schema changes).

Next Week

  • I’ll be releasing SQL Server Blocked Process Report Viewer. (Right now we’re only in beta and I’m also open to suggestions about a new name for the tool)
  • Understanding blocking is the first step. Next week I’ll talk about what happens after analysis. I’ll write about the steps I’ve taken and had success with. after analysis.

Powered by WordPress