If you’ve ever used Object Explorer (in SSMS) in order to rename a database, you may have come across this error message from Management Studio:
which says:
Unable to rename [database]. (ObjectExplorer)
Additional information:
--> Rename failed for Database '[database]'.
(Microsoft.SqlServer.Smo)
--> An exception occurred while executing a Transact-SQL statement or batch.
(Microsoft.SqlServer.ConnectionInfo)
--> The database could not be exclusively locked to perform the operation.
(Microsoft SQL Server, Error: 5030)
Management Studio is telling you that there are connections that it doesn’t want to mess with.
Or when using sp_renamedb from a query window, you might see this similar error:
Msg 5030, Level 16, State 2, Line 1
The database could not be exclusively locked to perform the operation.
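Before resorting to force, you may want to see exactly which connections are in the way. Here's a quick sketch using the older sysprocesses view (the database name is a placeholder):

```sql
-- who is connected to the database we want to rename?
SELECT spid, loginame, program_name
FROM sys.sysprocesses
WHERE dbid = DB_ID('old_name');
```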
If you really really want to rename the database and don’t care at all about in-flight transactions, then use this script/template I recently came up with:
A Forceful Script:
ALTER DATABASE [old_name]
SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
ALTER DATABASE [old_name]
MODIFY NAME = [new_name]
GO
ALTER DATABASE [new_name]
SET MULTI_USER
GO
Caveat:
Remember that this script failed kindergarten for not playing well with others. So use this script only on QA and dev boxes. And if you're going to do something on a production box, make sure you do it only during scheduled downtime.
One More Interesting Thing:
A friend of mine stored this script in OneNote and found that when it came time to use it, the script failed because OneNote had replaced some of the whitespace with non-breaking spaces!
This is a OneNote issue, and it applies to all scripts and code, not just T-SQL scripts.
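If you suspect a pasted script has picked up non-breaking spaces, a sketch like the following can detect and clean them before execution (the script text here is just for illustration):

```sql
-- Replace non-breaking spaces (Unicode character 160) pasted in from
-- OneNote with regular spaces before executing the script.
DECLARE @sql NVARCHAR(MAX) = N'SELECT 1 AS x;';  -- paste the suspect script here
IF CHARINDEX(NCHAR(160), @sql) > 0
    PRINT 'Found non-breaking spaces; replacing them';
SET @sql = REPLACE(@sql, NCHAR(160), N' ');
EXEC sp_executesql @sql;
```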
Takeaway: Getting feedback from customers (end users) is crucial for a successful reporting initiative. Success here is based on a user-adoption metric.
So it’s the second Tuesday of the month again and time for another T-SQL Tuesday. This April Aaron Nelson is hosting T-SQL Tuesday #005 and the topic of the month is Reports.
So this month I want to contrast the approaches we take as data storers and data reporters:
Client Focus
Client focus is more critical for reports and reporting than it is for data collection. Put another way: when writing data collection apps, you might be able to get by without much user feedback but when developing reporting solutions, you cannot get by without consulting users.
Consider these two typical scenarios of data flow between a user and data store.
Data In
In the data collection stage, data is consumed (and stored) by an application which is in the domain of the database developer (i.e. you). When the developer and the data consumer are the same person… everyone wins!
Data Out
In contrast, reporting solutions see end users consuming data. This is about as far removed as you can get from the database developer (i.e. you again). So if there is a disconnect in communication between you two, nobody wins.
I remember seeing a chain of three extra people acting as a communication layer between an end user and a developer. I suspect that's because the data collection system had been designed and implemented with some success that way.
Bridge the Gap Between Dev and User
Ideally, client and analyst can sit down for a one-on-one talk. When doing so, we need to keep the following in mind:
People who want data aren’t just curious
They want to solve a problem. And in the rare case they’re not trying to solve a problem, they’re trying to find out whether there’s a problem to solve. So find out what problems users are trying to solve. What are their goals? “Analyze This” or “Improve That” are probably a bit too generic. Find out what decisions get made with this data.
Summary Reports are Almost Never The End Of The Story
See point 1: people are trying to solve a problem. And if they can't dig further into data, they're going to run up against an inevitable roadblock. In the past I've seen reporting solutions ditched for the combo solution of in-house-SQL-guy + copy-and-paste + Excel.
(As an aside, Excel as a solution isn’t too bad. It can serve the needs of the end user. It’s just that I think reporting solutions have the potential to do better)
Reporting Requirements are Not Report Requirements
See points 1 and 2. I would rather have requirements that start like “I need to analyze the efficiency of widget x” than “I need to view a histogram with data from this table”.
I end with a quote from the last paragraph of The Data Warehouse Toolkit by Ralph Kimball (a book I’ve plugged before):
In Closing … Sweeping away all the details and techniques, the gold coin for the data warehouse professional is to listen to the business. Consistently listening to the user brings us back to what we are supposed to do.
In Books Online (BOL), Microsoft describes different kinds of transaction isolation levels in terms of phenomena that can occur during concurrent transactions. Specifically, they mention three kinds of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantom Reads. You may have heard of these before but, correct me if I’m wrong, I just can’t find a good definition anywhere in BOL.
And that’s too bad, because these phenomena don’t just help describe the different transaction isolation levels. The phenomena actually define these levels.
These terms actually come from the ISO/ANSI standard for SQL-92. The terms themselves however do little to illuminate exactly the meaning of each. (What’s a phantom read?) When used by the ANSI/ISO standard, they mean something very specific and they’re key to understanding the different isolation levels.
In the next few days, I’d like to illustrate each phenomenon:
So I’ve finished talking about the types of transaction phenomena defined in the ISO/ANSI standard. SQL Server supports two isolation levels that never experience any of these phenomena (no dirty, non-repeatable or phantom reads): SERIALIZABLE and SNAPSHOT. Both avoid those reads, but they do so using different methods. Understanding both is the key to deciding whether either is right for your application.
SERIALIZABLE
Serializable is the most isolated transaction level. Basically, when a transaction reads or writes data from the database, that data is going to stay that way until the end of the transaction:
From ISO/ANSI: [Execution of concurrent SERIALIZABLE transactions is guaranteed to be serializable, which is] defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.
So that’s it! SERIALIZABLE transactions see database data as if there were no other transactions running at the same time. So no dirty, phantom or non-repeatable reads (but maybe some blocking).
It’s interesting that the standard defines SERIALIZABLE as the default level. Microsoft doesn’t subscribe to that notion and makes READ COMMITTED the default level.
The SERIALIZABLE level prevents phantom reads by using range locks, which I explain at the end of this article.
SNAPSHOT
SNAPSHOT transactions avoid phantom reads, dirty reads and non-repeatable reads, but they do it in quite a different way than SERIALIZABLE transactions do.
While SERIALIZABLE uses locks, SNAPSHOT instead uses a copy of committed data. Since no locks are taken, subsequent changes made by concurrent transactions are allowed, not blocked.
So say you’re using SNAPSHOT transactions and you finally decide to make a change to some data. As far as you know, that data hasn’t changed since the first time you looked at it. But if that data has been changed elsewhere, then you’ll get this error message:
Msg 3960, Level 16, State 4, Line 1
Snapshot isolation transaction aborted due to update conflict. You
cannot use snapshot isolation to access table 'dbo.test' directly or
indirectly in database 'snapshottest' to update, delete, or insert
the row that has been modified or deleted by another transaction.
Retry the transaction or change the isolation level for the
update/delete statement.
What this update conflict error message is trying to convey is exactly the same concept as Wikipedia’s Edit Conflict error message. Except that Wikipedia explains it better. I suggest looking there.
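For reference, here's a minimal sketch of turning the feature on and using it. The database and table names (snapshottest, dbo.test) are borrowed from the error message above; the column names are hypothetical:

```sql
-- SNAPSHOT must be enabled per database before transactions can use it
ALTER DATABASE snapshottest SET ALLOW_SNAPSHOT_ISOLATION ON;
GO
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN
    -- reads see committed data as of the start of the transaction
    SELECT col FROM dbo.test;
    -- concurrent writers are not blocked; but if another transaction
    -- commits a change to a row we later update, we get error 3960
    UPDATE dbo.test SET col = col + 1 WHERE id = 1;
COMMIT
```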
ANSI-SQL’s SNAPSHOT Definition
There isn’t one. The SNAPSHOT isolation level I’m talking about is a Microsoft-only thing. It’s useful, but definitely not part of the SQL standard.
It’s not too hard to see why. The SNAPSHOT isolation level permits the database server to serve data that is out of date. And that’s a big deal! It’s not just uncommitted. It’s old and incorrect (consistent, but incorrect).
Some people place a greater value on consistency than on timeliness and accuracy. I think it’s nice to have the choice.
Bonus Appendix: Range Locks.
(I was tempted to break this appendix out into its own blog post but ultimately decided not to.)
So SERIALIZABLE transactions take range locks in order to prevent Phantom Reads. It’s interesting to see what range of values is actually locked. The locked range is always bigger than the range specified in the query. I’ll show an example.
Say we have a table storing integers and insert 6 even numbers:
CREATE TABLE MyTest
(
id INT PRIMARY KEY
);
INSERT MyTest VALUES (2), (4), (6), (8), (10), (12);
Now let’s read a range:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
SELECT id FROM MyTest WHERE id BETWEEN 3 AND 8;
-- do concurrent stuff here.
COMMIT
Books Online says: “The number of RangeS-S locks held is n+1, where n is the number of rows that satisfy the query.” We can verify this by looking at sys.dm_tran_locks. I’ve shown the locks that are taken above. A range lock covers the range of possible values from the given key value down to the nearest key value below it.
You can see that the “locked range” of [2..10] is actually larger than the query range [3..8]. Attempts to insert rows into this range will wait.
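To inspect those range locks yourself, you can query sys.dm_tran_locks from the open transaction — a sketch, filter to taste:

```sql
-- key locks held by the current session while the transaction is open
SELECT resource_type,
       resource_description,  -- key hash; (ffffffffffff) is the lock at "infinity"
       request_mode           -- RangeS-S for the shared range locks
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
  AND resource_type = 'KEY';
```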
What happens if we select a range containing all rows?
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
SELECT id FROM MyTest WHERE id BETWEEN 1 AND 13;
-- do concurrent stuff here
COMMIT
You can see that everything is selected. That lock at “infinity” has a resource_description value of (ffffffffffff).
One last interesting bit: ghost records can participate in these ranges!
DELETE MyTest;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
SELECT id FROM MyTest WHERE id BETWEEN 1 AND 10;
-- do concurrent stuff here
COMMIT
Concurrent transactions are able to insert values above 12, but will wait for values below that!
In Books Online (BOL), Microsoft describes different kinds of transaction isolation levels in terms of phenomena that can occur during concurrent transactions. Specifically, they mention three kinds of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantom Reads. You may have heard of these before but, correct me if I’m wrong, I just can’t find a good definition anywhere in BOL.
And that’s too bad, because these phenomena don’t just help describe the different transaction isolation levels. The phenomena actually define these levels.
These terms actually come from the ISO/ANSI standard for SQL-92. The terms themselves however do little to illuminate exactly the meaning of each. (What’s a phantom read?) When used by the ANSI/ISO standard, they mean something very specific and they’re key to understanding the different isolation levels.
In the next few days, I’d like to illustrate each phenomenon:
Imagine your transaction retrieves rows filtered by some criteria. A phantom read occurs when your transaction retrieves rows using the same criteria and new rows are returned. In this case, either the new rows you see have been inserted or data has changed in existing rows such that these rows now meet the filter criteria. Either way these rows are new to you.
From ISO/ANSI: Process P1 reads the set of rows N that satisfy some search condition. Process P2 then executes statements that generate one or more rows that satisfy the search condition. If P1 repeats the query it obtains a different collection of rows.
Example
It’s cheesy, but I use the following example as a mnemonic. Imagine there’s a database of pop culture, and that transactions can last years. Then we have:
use PopCultureDb
SELECT Name
FROM CarModels
WHERE Manufacturer = 'Rolls Royce'
-- over 1925
-- query would give new row: "The Phantom I"
SELECT Name
FROM TonyAwards
WHERE AwardName = 'Best Musical'
-- over 1988
-- query would give new row: "Phantom of the Opera"
SELECT Name
FROM Movies
WHERE SagaName = 'Star Wars'
-- over 1999
-- query would give new row "Star Wars: Episode I - The Phantom Menace"
In each example if a transaction lasted over the years indicated, the new rows are examples of phantom reads.
Phantom Reads vs. Non-Repeatable Reads
So what’s the difference between phantom reads and non-repeatable reads? This can sometimes cause confusion, because both have the property that a repeated query returns different results. The distinction between phantom reads and non-repeatable reads lies in how the query results are different.
The Non-Repeatable Read is a phenomenon specific to the read of a single row. When data in that row has changed, and the row is read again, the changed data is returned. That is a non-repeatable read.
So what about a change that affects multiple rows? This certainly seems like a phenomenon that applies to more than one row. But actually, I think of this as just a set of non-repeatable reads where each non-repeatable read still only affects one row.
The Phantom Read is a phenomenon that deals with queries that return sets. The thing that’s changing in a phantom read is not the data in the rows; it’s the set membership that has changed. *
Isolation Levels that Experience Phantom Reads
Just like non-repeatable reads, the levels READ COMMITTED and READ UNCOMMITTED can experience phantom reads. But REPEATABLE READ can experience phantom reads too. In fact the only isolation level that avoids phantom reads is SERIALIZABLE (and SNAPSHOT, but I’ll get to that in part 4).
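To make that concrete, here's a sketch using the pop-culture tables from above (assuming CarModels has Name and Manufacturer columns); even REPEATABLE READ lets the second read see the new row:

```sql
-- Session 1: REPEATABLE READ holds locks on rows it read, but no range locks
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRAN
    SELECT Name FROM CarModels WHERE Manufacturer = 'Rolls Royce';

    -- Session 2 runs concurrently and is not blocked:
    -- INSERT CarModels (Name, Manufacturer) VALUES ('The Phantom I', 'Rolls Royce');

    SELECT Name FROM CarModels WHERE Manufacturer = 'Rolls Royce';
    -- the second SELECT returns the new row: a phantom read
COMMIT
```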
*-For rows that have been deleted, if a transaction reads them (or rather fails to read them) it would seem that this is both a non-repeatable read and a phantom read. But for the purposes of the ISO/ANSI standard it is in fact considered a non-repeatable read.
There are a number of use cases that may require you to see a list of database tables ordered by dependency.
For example, you may want to know in which order to create tables if you’re dealing with a scripting project. (On the flip side, if you’re looking to delete tables, reverse the order).
I came up with a pretty script that lists the tables you care about in order of how they can be created (i.e. create tables at level zero before moving on to level one tables).
WITH myTableList AS
(
select object_id, name
from sys.tables
--where name like 'stuffICareAbout%'
),
myForeignKeyList AS
(
select fk.parent_object_id, fk.referenced_object_id
from sys.foreign_keys fk
join myTableList mtl_parent
on mtl_parent.object_id = fk.parent_object_id
join myTableList mtl_referenced
on mtl_referenced.object_id = fk.referenced_object_id
),
TablesDependencies AS
(
SELECT name AS TableName,
object_id AS TableID,
0 AS level
FROM myTableList
UNION ALL
SELECT mtl.name, mtl.object_id, td.level + 1
FROM myTableList mtl
JOIN myForeignKeyList mfk
ON mfk.parent_object_id = mtl.object_id
AND NOT mfk.parent_object_id = mfk.referenced_object_id
JOIN TablesDependencies td
ON mfk.referenced_object_id = td.TableID
)
SELECT TableName, MAX(level) as level
FROM TablesDependencies
GROUP BY TableName
ORDER BY level, TableName
Modify this to suit your needs, e.g.:
Modify the where clause to filter only the tables you care about.
Modify the fields to report on schema if it’s something you care about
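For example, reporting on the schema could be as simple as qualifying the name in the first CTE; the rest of the script stays the same:

```sql
WITH myTableList AS
(
    select t.object_id,
           s.name + '.' + t.name as name  -- schema-qualified table name
    from sys.tables t
    join sys.schemas s
        on s.schema_id = t.schema_id
    --where t.name like 'stuffICareAbout%'
)
select object_id, name from myTableList;  -- splice the remaining CTEs back in here
```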
In Books Online (BOL), Microsoft describes different kinds of transaction isolation levels in terms of phenomena that can occur during concurrent transactions. Specifically, they mention three kinds of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantom Reads. You may have heard of these before but, correct me if I’m wrong, I just can’t find a good definition anywhere in BOL.
And that’s too bad, because these phenomena don’t just help describe the different transaction isolation levels. The phenomena actually define these levels.
These terms actually come from the ISO/ANSI standard for SQL-92. The terms themselves however do little to illuminate exactly the meaning of each. (What’s a phantom read?) When used by the ANSI/ISO standard, they mean something very specific and they’re key to understanding the different isolation levels.
In the next few days, I’d like to illustrate each phenomenon:
The non-repeatable read occurs when a transaction re-reads a single record and finds that it has been changed or deleted.
From ISO/ANSI: Process P1 reads a row. Process P2 then modifies or deletes that row and commits the change. If P1 rereads the row it receives the modified value or discovers the row has been deleted.
Changing Data
I like to think of Non-Repeatable Reads as being about reading data that has changed.
(I apologize in advance to my friends at work whose reaction I know will be “Boo”.)
But non-repeatable reads are not always a bad thing. In fact, they’re often harmless or even required. When data has changed, you have to decide whether you want consistent data or current data. If you need to rely on consistent data, then a non-repeatable read is detrimental. Here is an example of a non-repeatable read that causes problems:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
IF EXISTS(
SELECT 1
FROM Theatre.AvailableSeats
WHERE seat = 'B23')
BEGIN
-- imagine a concurrent transaction
-- sells seat B23 here and removes the record
-- from table Theatre.AvailableSeats
-- then no rows are returned here:
SELECT price
FROM Theatre.AvailableSeats
WHERE seat = 'B23'
-- and zero rows are affected here:
DELETE FROM Theatre.AvailableSeats
WHERE seat = 'B23'
END
COMMIT
Isolation Levels that Experience Non-Repeatable Reads
Just like dirty reads, this phenomenon is possible when using the READ UNCOMMITTED isolation level, but it is also possible when using the READ COMMITTED isolation level. And in SQL Server READ COMMITTED is the default isolation level for transactions.
My experience
To avoid this phenomenon, you could use isolation level REPEATABLE READ. But often I find it’s possible to rewrite a transaction and stick with the default READ COMMITTED.
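For example, the theatre transaction above can be rewritten as a single atomic statement, removing the window between the existence check and the delete. This is just one possible rewrite:

```sql
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
    -- check, read and delete in one statement; OUTPUT returns the price
    DELETE FROM Theatre.AvailableSeats
    OUTPUT deleted.price
    WHERE seat = 'B23'
    -- @@ROWCOUNT = 0 here means a concurrent transaction sold the seat first
COMMIT
```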
In Books Online (BOL), Microsoft describes different kinds of transaction isolation levels in terms of phenomena that can occur during concurrent transactions. Specifically, they mention three kinds of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantom Reads. You may have heard of these before but, correct me if I’m wrong, I just can’t find a good definition anywhere in BOL.
And that’s too bad, because these phenomena don’t just help describe the different transaction isolation levels. The phenomena actually define these levels.
These terms actually come from the ISO/ANSI standard for SQL-92. The terms themselves however do little to illuminate exactly the meaning of each. (What’s a phantom read?) When used by the ANSI/ISO standard, they mean something very specific and they’re key to understanding the different isolation levels.
In the next few days, I’d like to illustrate each phenomenon:
A dirty read occurs when a transaction reads a record that has not yet been committed. In my own head, I think of this as reading tentative data.
From ISO/ANSI: One process (P1) modifies a row, and another process (P2) then reads that row before it is committed by P1. If P1 then rolls back the change, P2 will have read a row that was never committed and that may thus be considered to have never existed.
Mr. Bean shows the dangers (cue video to 1 minute 40 seconds):
By the way, I love that the BBC has put Mr. Bean clips on YouTube. In the clip, Mr. Bean cheats off his classmate who then proceeds to “rollback” his change by tossing his first answer attempt. If Mr. Bean had access to the test after it was handed in (i.e. committed), he would have made a more effective cheater.
Isolation Levels that Experience Dirty Reads
The only isolation level in which dirty reads are possible is READ UNCOMMITTED. You can also use a NOLOCK hint inside a SELECT query to achieve the same behavior.
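As a sketch, the hint version looks like this (the table name is hypothetical):

```sql
-- same dirty-read behavior as READ UNCOMMITTED, scoped to one table
SELECT COUNT(*)
FROM dbo.BigAuditTable WITH (NOLOCK);
```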
My Own Experience
I’m not afraid of the occasional NOLOCK hint or the READ UNCOMMITTED level. It helps on tables that I know are static or tables that are growing (INSERTS and SELECTS but no DELETES or UPDATES). It can be pretty powerful when used correctly.
The NOLOCK hint used to be my favorite deadlock killer. But lately, I’ve used it less often, especially when another solution is available.
Many others have written on this topic. Some important points:
Linchi Shea points out that NOLOCK SELECT statements can still cause blocking when DDL statements are thrown in (such as the seemingly innocent DBCC DBREINDEX command).
Itzik Ben-Gan describes another NOLOCK danger: the data returned may not just be uncommitted, but inconsistent, i.e. missing or duplicated rows!
Mike Walsh is hosting T-SQL Tuesday this month. And the topic is Input and Output (I/O).
The thing about I/O is that it’s very much a hardware topic, and so the category is well suited to database administrators. The topic is maybe a little more suited to them than to us database developers.
So it’s natural to ask: What is it (if anything) about I/O that db developers should know?
I/O is THE Metric For Measuring Performance
Well, here’s my thesis: I/O is the principal metric to watch. Any other metric – duration, CPU cycles, network lag, etc. – doesn’t matter nearly as much as I/O.
Here’s why:
(Solid State Drives excepted) Disk operations are one of the few things databases do that rely on moving parts. As such they are slooow (relatively speaking). Save the I/O, save the world.
In theory, OLTP databases should get by with minimal I/O. If particular queries are performing many reads or writes chances are that the query can be improved.
Is it really a memory issue? You’re worried about memory. Low page life expectancy is usually a symptom of too little memory. But if this value nosedives often, it could be in conjunction with an I/O-intensive query.
What about measuring duration? Duration should be treated like a symptom, not a cause. Arguably, excessive I/O is a symptom too, but it’s often one step closer to the cause than duration.
Ease off on the CPU-heavy stuff. I’m a big fan of letting the database server serve data and of letting the application do the thinking. I work in a very developer-centric environment and it’s great: Everyone is comfortable with letting the app do any heavy CPU work.
I’ve found it convenient to deal with only one ruler. I’ve used I/O as my only metric for a few years now and I’ve been extremely happy with the results. (Well, not quite my only metric, but my TOP 20 I/O queries and my TOP 20 CPU queries have about 18 queries in common).
I plan to re-evaluate things if/when Solid State Drives have their day in the sun. It’s not too far into the (exciting) future.
As it turns out, as I write this the first I/O articles are coming in, and Rob Farley has written a very thorough article about I/O as it relates to the cost of a query. He points out that we shouldn’t ignore other metrics in favor of I/O alone. I may be lucky, but I’ve never seen a performance problem in production that did not have the symptom of high I/O.
Bonus Post Script: My Top 20 I/O Query
Others have created queries like this in the past; you can find them everywhere. This one’s mine. It’s a tool that’s been on my belt for a number of years.
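A query of this kind, built on the plan cache DMVs, might look like the following sketch (not necessarily the author's exact version):

```sql
-- top 20 cached statements by total logical I/O
SELECT TOP 20
       qs.total_logical_reads + qs.total_logical_writes AS total_io,
       qs.execution_count,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
           ((CASE qs.statement_end_offset
                 WHEN -1 THEN DATALENGTH(st.text)
                 ELSE qs.statement_end_offset
             END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY total_io DESC;
```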
Takeaway: Deleting columns does not reduce the width of records (not even new ones). Tables have to be rebuilt in order to reclaim the space.
So I had the opportunity to talk to Michelle Ufford (SqlFool.com) last November after her PASS Summit talk. We wondered what happens to the space taken by columns after they’ve been deleted. For example, you may be removing or altering columns in order to reduce a record’s width. The improved bytes/row (and rows/page) can provide a huge benefit to performance.
Michelle thought it was a great idea for a blog topic and so did I. But Michelle graciously agreed to let me be the one to write it up (I understand she’s busy this month). So here’s the main question: what does happen to the space taken by columns that have been deleted? The naive idea that SQL Server magically cleans it up is wrong. For example:
First I create a table with really inefficient columns:
CREATE TABLE NAME (
id INT IDENTITY PRIMARY KEY,
first CHAR(1000) NOT NULL,
middle CHAR(1000) NOT NULL,
last CHAR(1000) NOT NULL
);
GO
INSERT NAME (first, middle, last) VALUES ('Michael', 'J', 'Swart');
INSERT NAME (first, middle, last) VALUES ('Lester', 'B', 'Pearson');
INSERT NAME (first, middle, last) VALUES ('Mack', 'D', 'Knife');
INSERT NAME (first, middle, last) VALUES ('Homer', 'J', 'Simpson');
These four rows I put in take up two data pages like this:
Remember, each page is 8 KB long. So let’s remove the middle column and right-size the others like this:
ALTER TABLE NAME DROP COLUMN Middle;
ALTER TABLE NAME ALTER COLUMN First varchar(20) NOT NULL;
ALTER TABLE NAME ALTER COLUMN Last varchar(20) NOT NULL;
-- And add a couple rows for good measure:
INSERT NAME (First, Last) VALUES ('Bartholomew', 'Simpson');
INSERT NAME (First, Last) VALUES ('Lisa', 'Simpson');
So things should look better right? Not quite! The old data pages haven’t changed at all and the new data looks just as bad:
What’s Happening
The action of dropping columns is not an operation that affects existing data. It’s a metadata operation only. You can actually see this at work using the slightly undocumented view sys.system_internals_partition_columns. The is_dropped field indicates that the columns haven’t disappeared; they’ve just been marked as dropped:
select t.name as tablename,
c.name as columnname,
ipc.*
from sys.system_internals_partition_columns ipc
join sys.partitions p
on ipc.partition_id = p.partition_id
join sys.tables t
on t.object_id = p.object_id
left join sys.columns c
on c.object_id = t.object_id
and c.column_id = ipc.partition_column_id
where t.name = 'NAME'
order by c.column_id
So just how hard is it to get rid of columns?
Hair Makes Removing Columns Possible
How to Really Get Rid of Columns
It turns out that an index REBUILD is required (a REORGANIZE isn’t good enough). If you have a maintenance plan, it might eventually take care of this. Or you can do this explicitly:
ALTER INDEX ALL ON NAME REBUILD;
Things look much nicer now:
The information in the diagram above is hard to see, but it works out to about 200 rows per page as we had hoped. The column is truly deleted now which you can see using this query (once again):
select t.name as tablename,
c.name as columnname,
ipc.*
from sys.system_internals_partition_columns ipc
join sys.partitions p
on ipc.partition_id = p.partition_id
join sys.tables t
on t.object_id = p.object_id
left join sys.columns c
on c.object_id = t.object_id
and c.column_id = ipc.partition_column_id
where t.name = 'NAME'
order by c.column_id
Kalen Delaney whose name is almost synonymous with SQL Internals looked at this in 2006.
But none of those have pictures of Samson, do they?
Update March 12, 2010: I just read chapter 30 in SQL Server MVP Deep Dives. The chapter is called Reusing space in a table. In it, Joe Webb talks about a very similar scenario and recommends an approach using DBCC CLEANTABLE. This is good for reclaiming space that was once used by variable-length columns. But it doesn’t work for the example in this post, which uses fixed-length columns. It’s an important distinction because large columns are more often variable-length than fixed-length.
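For completeness, the two approaches look like this (the database name is a placeholder). DBCC CLEANTABLE handles dropped variable-length columns; the fixed-length example in this post still needs the rebuild:

```sql
-- reclaims space from dropped variable-length columns only
DBCC CLEANTABLE ('MyDatabase', 'dbo.NAME');

-- required for dropped fixed-length columns, like the CHAR(1000)s above
ALTER INDEX ALL ON NAME REBUILD;
```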
I’ve been blogging on the WordPress platform for about three months now, and every now and then I’ve come across a certain quirk that I’ve finally gotten to the bottom of. I want to warn you about it and maybe save you some head-scratching.
Articles containing a particular string cannot be saved. When I try to use this particular string, the web server times out. At least it does in my environment, and after checking with Brent Ozar, it’s not isolated to just me.
Here’s the string, I’m pasting it as an image for somewhat obvious reasons:
Some programmer doesn't want you using this string
What’s really odd is that
VARCHAR ( works (note the space).
varchar( works.
CHAR( works.
and ARCHAR( works.
If this is a security “feature”, it seems like a crummy one. Crummy because it hinders valid work (by us honest SQL bloggers), and crummy times two because it’s not particularly effective at guarding against whatever the perceived risk is. Especially when the work-around of VARCHAR-space-bracket works just fine.
Your mileage may vary, but try it out. See if you can save a draft of a blog post using the forbidden string. Leave a comment back here if you run into trouble like I did.