Michael J. Swart

February 5, 2010

Removing Columns

Takeaway: Deleting columns does not reduce the width of a record (including new ones). Tables have to be rebuilt in order to reclaim space.

So I had the opportunity to talk to Michelle Ufford (SqlFool.com) last November after her PASS Summit Talk. We wondered what happens to the space taken by columns after they’ve been deleted. For example, you may be removing columns or altering columns in order to reduce a records’ width. The improved bytes/row (and rows/page) can provide a huge benefit to performance.

Michelle thought it was a great idea for a blog topic and so did I. But Michelle graciously agreed to let me be the one to write it up (I understand she’s busy this month). So here’s the main question:  What does happen to the space taken by columns that have been deleted? The naive idea that SQL Server magically cleans it up is wrong, for example:

First I create a table with really inefficient columns:

CREATE TABLE NAME (
	id INT IDENTITY PRIMARY KEY,
	FIRST CHAR(1000) NOT NULL,
	middle CHAR(1000) NOT NULL,
	LAST CHAR(1000) NOT NULL
);
GO
INSERT NAME (FIRST, middle, LAST) VALUES ('Michael', 'J', 'Swart');
INSERT NAME (FIRST, middle, LAST) VALUES ('Lester', 'B', 'Pearson');
INSERT NAME (FIRST, middle, LAST) VALUES ('Mack', 'D', 'Knife');
INSERT NAME (FIRST, middle, LAST) VALUES ('Homer', 'J', 'Simpson');

These four rows I put in take up two data pages like this:

pages graph

Remember, each page is 8 Kb long. So lets remove the middle column and right-size the others like this:

ALTER TABLE NAME DROP COLUMN Middle;
ALTER TABLE NAME ALTER COLUMN FIRST VARCHAR(20) NOT NULL;
ALTER TABLE NAME ALTER COLUMN LAST VARCHAR(20) NOT NULL;
 
-- And add a couple rows for good measure:
INSERT NAME (FIRST, LAST) VALUES ('Bartholomew', 'Simpson');
INSERT NAME (FIRST, LAST) VALUES ('Lisa', 'Simpson');

So things should look better right? Not quite! The old data pages haven’t changed at all and the new data looks just as bad:

pages graph

What’s Happening

The action of dropping columns is not an operation that affects existing data. It’s a meta-data operation only. You can actually see this at work using the slightly undocumented view sys.system_internals_partition_columns. The is_dropped field indicates that the columns haven’t disappeared, they’ve just been marked as dropped:

SELECT t.name AS tablename,
c.name AS columnname,
	ipc.*
FROM sys.system_internals_partition_columns ipc
join sys.partitions p
	ON ipc.partition_id = p.partition_id
join sys.tables t
	ON t.OBJECT_ID = p.OBJECT_ID
LEFT join sys.columns c
	ON c.OBJECT_ID = t.OBJECT_ID
	and c.column_id = ipc.partition_column_id
WHERE t.name = 'NAME'
ORDER BY c.column_id

So just how hard is it to get rid of columns?

Samson Removing Columns

Hair Makes Removing Columns Possible

How to Really Get Rid of Columns

It turns out that an index REBUILD is required (A REORGANIZE isn’t good enough). If you have have a maintenance plan it might eventually take care of this. Or you can do this explicitly:

ALTER INDEX ALL ON NAME REBUILD;

Things look much nicer now:

pages graph

The information in the diagram above is hard to see, but it works out to about 200 rows per page as we had hoped. The column is truly deleted now which you can see using this query (once again):

SELECT t.name AS tablename,
	c.name AS columnname,
	ipc.*
FROM sys.system_internals_partition_columns ipc
join sys.partitions p
	ON ipc.partition_id = p.partition_id
join sys.tables t
	ON t.OBJECT_ID = p.OBJECT_ID
LEFT join sys.columns c
	ON c.OBJECT_ID = t.OBJECT_ID
	and c.column_id = ipc.partition_column_id
WHERE t.name = 'NAME'
ORDER BY c.column_id

Other Blogs have treated this topic before e.g.:

  • Andras at Simple Talk last year.
  • Kalen Delaney whose name is almost synonymous with SQL Internals looked at this in 2006.

But none of those have pictures of Samson, do they?

February 4, 2010

Forbidden String

Filed under: SQLServerPedia Syndication, Technical Articles — Michael J. Swart @ 10:13 am

ATTENTION SQL BLOGGERS,

I’ve been blogging using the wordpress platform for about three months now and every now and then I’ve come across a certain quirkiness that I’ve finally gotten to the bottom of. I want to warn you about it and maybe save some head-scratching.

Articles with a particular string cannot be saved. When I try to use this particular string, the web server times out. At least in my environment and after checking with Brent Ozar, it’s not isolated to just me.

Here’s the string, I’m pasting it as an image for somewhat obvious reasons:

Some programmer doesn't want you using this string

Some programmer doesn't want you using this string

What’s really odd is that

  • VARCHAR ( works (note the space).
  • varchar( works.
  • CHAR( works.
  • and ARCHAR( works.

If this is a security “feature” it seems like a crummy one. Crummy because it hinders valid work (by us honest sql bloggers) and crummy times two because It’s not particularly effective at guarding against some perceived risk. Especially when the work-around of VARCHAR space bracket works just fine.

It seems like a known issue though: http://core.trac.wordpress.org/ticket/11486

Your mileage may vary. But try it out. See if you can save a draft of a blog post using the forbidden string. Leave a comment back here if you had trouble like I do.

February 2, 2010

Looking at Page Contents

Filed under: SQLServerPedia Syndication, Technical Articles — Tags: , — Michael J. Swart @ 10:06 am

If you want to dig deep into SQL Server internals, you’re eventually going to want to look at the physical page structure of a data page. As such I’ve bookmarked Anatomy of a Page by Paul Randal.

The syntax for all this stuff is hard for me to remember, so I’ve made a template for myself. And I’m sharing that here:

-- looking at page contents.
 
-- T3604 to output to console
DBCC traceon (3604)
 
-- select rows from the table of interest
-- (as well as their physical location)
SELECT TOP (10) t.*,
	pl.FILE_ID, pl.page_id, pl.slot_id
FROM <tablename,sysname,Production.Product> AS t
cross apply sys.fn_PhysLocCracker(t.%%physloc%%) AS pl;
 
DECLARE @dbname sysname;
SET @dbname = DB_NAME();
DBCC page(@dbname, --db_name
	1, --file_id
	136, --page_id
	1) --results style

Ctrl+Shift+M will let you provide the table name. I’ve also made this script into a code-snippet called page.

Note that the script makes use of sys.fn_PhysLocCracker which is SQL 2008 and later.

By the way, I highly recommend playing with this stuff. Not having to guess at the width of records is really helpful. I plan to write a post soon which shows how it helped an investigation I did.

January 27, 2010

Blogger tip: DIY In-Line Forms

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 12:56 pm


Rick Klau of Google recently mentioned a relatively unused feature of Google Docs called Docs Forms. I wanted to mention it in case you haven’t come across it yet. Docs Forms is a feature of Google Docs that allows bloggers to embed forms into their blog articles or web pages. And that includes all blogging platforms (not just google’s blogspot). Basically, these forms can be embedded into any html that supports IFRAMES.

This easy-to-use form-creator opens a huge potential to bloggers that might not have used forms in the past (e.g. me). The list of applications includes surveys, polls, contest submissions, contact pages and so on.

For me, Blogs + Forms allows a higher level of interaction with readers than I had compared to this time last week. And that is an exciting prospect. If you care, you can see how I’m using a form right now on my “Mail Me” page.

Caveat:
There is one thing I worry about though. Blog polls are notorious for producing untrustworthy data. It’s the nature of a self-selected sample population.

For example, I bet Paul Randal’s survey takers, are a bit savvier than average so his blog survey results have to be taken with a grain of salt. In other words, I bet the who’s-Paul-Randal? crowd have a lot of people in common with the No-disaster-recover-plan crowd and they might be under-represented in such surveys.

Example
And what would this blog post be without an example. So without further ado, give the feature a spin:

January 18, 2010

Tic Tac Toe vs. T-SQL

Filed under: SQLServerPedia Syndication, Tongue In Cheek — Michael J. Swart @ 11:50 pm
It's learning

It's learning

Stephen Horne (aka BlueDog67) threw down the challenge last week to Create An Unbeatable Tic-Tac-Toe Game. He did that in this post.

So that’s what I did. Here’s my crack at it (which I also left as a comment):

The curious can download http://michaeljswart.com/wp-content/uploads/2010/01/tictactoe.zip

The zipped file contains two files:

  • One is a bcp output file that contains data. (It is used by the script)
  • The other is a sql script that creates the following tables and procedures:
    • TABLE tttBoard (spaceId, value)
    • TABLE tttMoves(id, turn, bestNextMove)
    • PROCEDURE s_CleanBoard()
    • PROCEDURE s_PrintBoard()
    • PROCEDURE s_MakeMove(@spaceId, @letter)
    • PROCEDURE s_ComputerMove

I decided to use a look up table (tttMoves) to find the best move. I thought it fitting to develop a solution that uses a set of data rather than an algorithm because of it is T-SQL after all. Here are three things I learned.

  • The size of tttMoves is extremely small. Roughly 4500. Think about it. If you take nine ordered spaces on a tic-tac-toe board and put every combination of X, O, or space, there’s still a manageable number of boards to deal with. I removed completed boards and boards showing games that were invalid. 4500 were left. Of these game boards, only about 400 were more complicated than a simple win or a block for their best move.
  • The whole process was a lot quicker than it took the eighties super-computer to learn tic tac toe in WarGames. (Who remembers that? To this day, joshua is a terrible password)
  • A T-SQL tip: This should have been obvious but remember that order of columns in a table matters when using BULK INSERT with a data file using native formatting.

January 17, 2010

An incentive, and a hint

Filed under: SQLServerPedia Syndication, Tongue In Cheek — Michael J. Swart @ 8:21 pm

So last week, I challenged the world to solve a puzzle. This was part of Adam Machanic’s T-SQL Tuesday. And while the entire world does not read my blog, Google Analytics tell me that there are roughly a hundred of you that do.

But even among those of you who read my blog, there were no bites. Not one. So I decided to not give the solution this week. Instead I offer a prize and a hint. First the hint for everyone (it’s a two parter):

  • First, look in the source for a clue
  • secondly, the puzzle is not about cryptography, but can be solved with one or two lines of T-SQL

And now for the prize: Coffee on me. This prize is for one person. I’ve got a $10 Tim Horton’s gift card ready to be sent to the first person to tell me the answer to the puzzle. (A Starbucks gift card if you’re international). So get to it!
Update(1/18/2010)… We have a winner. Check out the solution here.


January 12, 2010

A puzzling situation…

Filed under: SQLServerPedia Syndication, Tongue In Cheek — Tags: — Michael J. Swart @ 12:01 am

A puzzling situation (my T-SQL Tuesday entry). This entry is an actual puzzle. Here I’m asking you to do the impossible (or at least the computationally difficult). Solve for ???:

Good luck

Good luck

Solution next week.

BTW, Don’t even try to attack this using a brute force approach. Even if you could test 1000 values a millisecond it would take like a whole decade to crack it.

Update 1/18/2010: See solution here

December 21, 2009

Top 8 Queries to Improve DB Design

Filed under: SQLServerPedia Syndication, Technical Articles — Michael J. Swart @ 2:24 pm
Top Eight Queries to Improve DB Design.

Top Eight Queries to Improve DB Design.

There is a disease that most television networks suffer from. The disease was more prevalent in the eighties and nineties than it is today, but every sit-com that lasts longer than two seasons seems destined to suffer from it: The cheesy clip show. The writers of such sitcoms probably thought “Whew! I need a break.” That thought was probably quickly followed by. “I betcha the viewers would like to re-watch Alex P. Keaton attempting to handle that feisty kangaroo”.

Well this article is like that. It’s a time where I link to old articles of mine (and some from the SQLServerPedia wiki!) and pass it off as something novel. Thanks for bearing with me and Happy Holidays. Without further ado:

The Eight Queries

These queries can be run right now! And can help identify areas for improvement in DB design. These queries are focused on DB design (so queries that identify top wait types or top queries by resource usage are skipped).

In no particular order:

  1. Find Missing SQL Dependencies If a view, function or stored procedure refers to a nonexistent object (be it a table, view etc…) This query can help you identify those.
  2. Finding your Missing Keys If a column with a name like ‘%id’ does not point to a table, you just might need a foreign key.
  3. Indexing Foreign Keys? To minimize the amount of maintenance time required to maintain data integrity required by foreign keys. (The query needs a database with data in it. The query is useless against empty databases).
  4. Which identity column is running out of room? Find columns which might be in danger of running out of room. (Run against a database with data in it).
  5. Find Missing Indexes This is courtesy SQLServerPedia. The Microsoft documentation for the missing index views is here. But the SQLServerPedia article pulls it all together. (Must be run against an active database).
  6. Find Queries in the Plan Cache That Are Missing an Index Another query from SQLServerPedia. Be careful with this one. It can use a lot of resources.
  7. Find Indexes Not In Use Again, from SSP, use against an active database.
  8. Find Tables Without Primary Keys Looking for heaps of trouble? This last query is also from SSP. It doesn’t need to be an active DB. Any old development DB will do.

What? Still here?

You still want more meta information about your databases. Be careful what you wish for. I recommend looking at what Jason Strate has done. That’ll keep you busy.

December 17, 2009

Find Missing SQL Dependencies

Filed under: SQLServerPedia Syndication, Technical Articles — Michael J. Swart @ 12:01 pm

A short post today.

I use the following script to find (broken) stored procedures or views that refer to sprocs or tables that no longer exist… or that don’t exist yet.

This is a quick sanity check that can help identify broken or obsolete stored procedures.

-- 2008 only
SELECT
	OBJECT_NAME(referencing_id) AS [this sproc or VIEW...],
	referenced_entity_name AS [... depends ON this missing entity name]
FROM sys.sql_expression_dependencies
WHERE is_ambiguous = 0
	AND OBJECT_ID(referenced_entity_name) IS NULL
ORDER BY OBJECT_NAME(referencing_id), referenced_entity_name

December 16, 2009

Gleeful Calamity

Filed under: Miscelleaneous SQL, SQLServerPedia Syndication — Michael J. Swart @ 11:21 am
Uhhh Ooooo

Uhhh Ooooo

“Gleeful Calamity”

… is a phrase that I’ve been giving thoughts to for the past few days. I’m quoting a phrase that Gever Tulley used in a TED talk he gave about his tinkering school.

In many situations it’s valuable to have the freedom to fail. Especially when the goal is learning.

When learning new things about SQL Server, I couldn’t take in half as much knowledge if I didn’t have a development server called sandbox. The sandbox db sits on my local machine. And like its namesake, it’s a safe place to play. It’s a place where starting from scratch from time to time is normal and expected. I expect that the tempdb fills this role for many of you.

Being Creative (Idea Mash-ups)

Having the freedom to fail is also valuable when you want to be creative. Later on in this post I’ll explain a failure of mine and I’ll go through what went wrong and why it doesn’t matter.

Being creative often entails tweaking an existing idea or taking two separate ideas and putting them together.

Sometimes the Idea Mash-ups Work…

… and Sometimes They Don’t

Here’s one failed idea I had.  A long time ago, I work with a database that had 800 tables in one schema in one database. By comparison, the Adventureworks database has about 70 tables in five schemas. I asked myself. Is there an automatic way to group the tables into candidate schemas while staying agnostic about the business rules? Here was my thinking:

  • I can infer table relationships using defined foreign keys, sys.sysdepends and other clues based on the naming conventions we used.
  • Tables and their relationships remind me of vertices and edges in graph theory.
  • I can rely on theories of dead mathematicians to do my work for me!

… in theory any way. In practice – as I found out – there’s no substitute for digging into the business rules of a domain, sigh. But the process was fun and didn’t take too much time. While I was investigating this idea, I thought this is what creativity feels like.

Meta-Calamity

When I started writing this post, I had a different example of a personal failure in mind:

In my sandbox, I once explored a (seemingly) novel idea and after a while, I decided it sucked. As it turns out, Google tells me that my idea was not so novel after all and was written up in a blog as a good idea by someone I respect. So for that and other reasons, I picked another failure. Lesson: Google seemingly novel ideas before beginning a blog topic about it.

(Image is courtesy of Anthony Cain on Flickr… Check out his photostream!)

Older Posts »

Powered by WordPress