Michael J. Swart

March 23, 2010

Transaction Phenomena – Part 1: The Dirty Read

SERIES: Concurrent Transaction Phenomena

In Books Online (BOL), Microsoft describes different kinds of transaction isolation levels in terms of phenomena that can occur during concurrent transactions. Specifically, they mention three kinds of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantom Reads. You may have heard of these before, but (correct me if I'm wrong) I just can't find a good definition of them anywhere in BOL.

And that’s too bad, because these phenomena don’t just help describe the different transaction isolation levels. The phenomena actually define these levels.

These terms actually come from the ISO/ANSI standard for SQL-92. The terms themselves, however, do little to illuminate what each one means. (What's a phantom read?) When used by the ANSI/ISO standard, they mean something very specific, and they're key to understanding the different isolation levels.

In the next few days, I’d like to illustrate each phenomenon:

Part 1: The Dirty Read

A dirty read occurs when a transaction reads a record that has not yet been committed. In my own head, I think of this as reading tentative data.

From ISO/ANSI: One process (P1) modifies a row, and another process (P2) then reads that row before it is committed by P1. If P1 then rolls back the change, P2 will have read a row that was never committed and that may thus be considered to have never existed.

Mr. Bean shows the dangers (cue video to 1 minute 40 seconds):

By the way, I love that the BBC has put Mr. Bean clips on YouTube. In the clip, Mr. Bean cheats off his classmate who then proceeds to “rollback” his change by tossing his first answer attempt. If Mr. Bean had access to the test after it was handed in (i.e. committed), he would have made a more effective cheater.

Isolation Levels that Experience Dirty Reads

The only isolation level in which dirty reads are possible is READ UNCOMMITTED. You can also use a NOLOCK hint inside a SELECT query to achieve the same behavior.
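To make the phenomenon concrete, here is a minimal sketch of a dirty read across two sessions. The Account table is a made-up example, not something from BOL; run the two halves in separate connections:

-- Session 1: modify a row but don't commit yet.
BEGIN TRANSACTION;
UPDATE dbo.Account SET Balance = Balance - 100 WHERE AccountId = 1;
-- ... no COMMIT yet ...

-- Session 2: reads the tentative, uncommitted balance.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT Balance FROM dbo.Account WHERE AccountId = 1;
-- or equivalently, with a table hint:
SELECT Balance FROM dbo.Account WITH (NOLOCK) WHERE AccountId = 1;

-- Session 1: rolls back, so session 2 has read a balance that never officially existed.
ROLLBACK TRANSACTION;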

My Own Experience

I’m not afraid of the occasional NOLOCK hint or the READ UNCOMMITTED level. It helps on tables that I know are static, or tables that are only growing (INSERTs and SELECTs but no DELETEs or UPDATEs). It can be pretty powerful when used correctly.
The NOLOCK hint used to be my favorite deadlock killer, but lately I’ve used it less often, especially when another solution is available.

Many others have written on this topic. Some important points:

  • Linchi Shea, who points out that NOLOCK SELECT statements can still cause blocking when DDL statements are thrown in (such as the seemingly innocent DBCC DBREINDEX command).
  • Itzik Ben-Gan, who describes another NOLOCK danger: the data returned may not just be uncommitted, but inconsistent, i.e. missing or duplicated rows!

March 8, 2010

I/O Operations, Arguably the Only Performance Metric You Need

Measuring Performance

Mike Walsh is hosting T-SQL Tuesday this month. And the topic is Input and Output (I/O).

The thing about I/O is that it’s very much a hardware topic, and so it’s very well suited to database administrators, maybe a little more suited to them than to us database developers.

So it’s natural to ask: What is it (if anything) about I/O that db developers should know?

I/O is THE Metric For Measuring Performance

Well, here’s my thesis: I/O is the principal metric to watch. Other metrics – such as duration, CPU cycles, network lag, etc. – don’t matter nearly as much as I/O.

Here’s why:

  • (Solid State Drives excepted) Disk operations are one of the few things databases do that rely on moving parts. As such they are slooow (relatively speaking). Save the I/O, save the world.
  • In theory, OLTP databases should get by with minimal I/O. If a particular query is performing many reads or writes, chances are that the query can be improved.
  • Is it really a memory issue? You’re worried about memory, and low page life expectancy is usually a symptom of too little memory. But if this value nosedives often, it’s often in conjunction with an I/O-intensive query.
  • What about measuring duration? Duration should be treated like a symptom, not a cause. Arguably, excessive I/O is a symptom too, but it’s often one step closer to the cause than duration.
  • Ease off on the CPU-heavy stuff. I’m a big fan of letting the database server serve data and of letting the application do the thinking. I work in a very developer-centric environment and it’s great: Everyone is comfortable with letting the app do any heavy CPU work.
  • I’ve found it convenient to deal with only one ruler. I’ve used I/O as my only metric for a few years now and I’ve been extremely happy with the results. (Well, not quite my only metric, but my TOP 20 I/O queries and my TOP 20 CPU queries have about 18 queries in common).
  • I plan to re-evaluate things if/when Solid State Drives have their day in the sun. It’s not too far into the (exciting) future.

As it turns out, as I write this, the first I/O articles are coming in, and Rob Farley seems to have written a very thorough article about I/O as it relates to the cost of a query. He points out that we shouldn’t ignore other metrics in favor of I/O alone. I may be lucky, but I’ve never seen a performance problem in production that did not have the symptom of high I/O.
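If you want to see the I/O cost of a single query while tuning it, SET STATISTICS IO is the quickest check. Here’s a small sketch (the query itself is just an arbitrary AdventureWorks example); the logical reads are reported per table in the Messages tab:

SET STATISTICS IO ON;

SELECT ProductID, SUM(LineTotal)
FROM Sales.SalesOrderDetail
GROUP BY ProductID;

SET STATISTICS IO OFF;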

Bonus Post Script: My Top 20 I/O Query

Others have created these queries in the past. You can find them everywhere. This one’s mine. It’s a tool that’s been on my belt for a number of years.

-- Total logical I/O across all cached query plans; used below as the
-- denominator for the [System Percentage] column.
DECLARE @SystemIO FLOAT
SELECT @SystemIO = SUM(total_logical_reads + total_logical_writes)
FROM sys.dm_exec_query_stats;
 
SELECT TOP 20 [Row Number] = ROW_NUMBER() OVER (ORDER BY total_logical_reads + total_logical_writes DESC),
    [Query Text] = CASE
        WHEN [sql_handle] IS NULL THEN ' '
        ELSE (SUBSTRING(ST.TEXT,(QS.statement_start_offset + 2) / 2,
            (CASE
                WHEN QS.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(MAX),ST.text)) * 2
                ELSE QS.statement_end_offset
                END - QS.statement_start_offset) / 2))
        END,
    [Execution Count] = execution_count,
    [Total IO] = total_logical_reads + total_logical_writes,
    [Average IO] = (total_logical_reads + total_logical_writes) / (execution_count + 0.0),
    [System Percentage] = 100 * (total_logical_reads + total_logical_writes) / @SystemIO,
    [Object Name] = OBJECT_NAME(ST.objectid),
    [Total System IO] = @SystemIO,
    [SQL Handle] = [sql_handle]
FROM sys.dm_exec_query_stats QS
CROSS APPLY sys.dm_exec_sql_text ([sql_handle]) ST
WHERE total_logical_reads + total_logical_writes > 0
ORDER BY [Total IO] DESC

February 5, 2010

Removing Columns

Takeaway: Dropping columns does not reduce the width of existing records, or even of new ones. Tables have to be rebuilt in order to reclaim the space.

So I had the opportunity to talk to Michelle Ufford (SqlFool.com) last November after her PASS Summit talk. We wondered what happens to the space taken by columns after they’ve been deleted. For example, you may be removing or altering columns in order to reduce a record’s width. The improved bytes/row (and rows/page) can provide a huge benefit to performance.

Michelle thought it was a great idea for a blog topic and so did I. But Michelle graciously agreed to let me be the one to write it up (I understand she’s busy this month). So here’s the main question: What does happen to the space taken by columns that have been deleted? The naive idea that SQL Server magically cleans it up is wrong. For example:

First I create a table with really inefficient columns:

CREATE TABLE NAME (
	id INT IDENTITY PRIMARY KEY,
	first CHAR(1000) NOT NULL,
	middle CHAR(1000) NOT NULL,
	last CHAR(1000) NOT NULL
);
GO
INSERT NAME (first, middle, last) VALUES ('Michael', 'J', 'Swart');
INSERT NAME (first, middle, last) VALUES ('Lester', 'B', 'Pearson');
INSERT NAME (first, middle, last) VALUES ('Mack', 'D', 'Knife');
INSERT NAME (first, middle, last) VALUES ('Homer', 'J', 'Simpson');

These four rows I put in take up two data pages like this:

pages graph

Remember, each page is 8 KB long. So let’s remove the middle column and right-size the others like this:

ALTER TABLE NAME DROP COLUMN Middle;
ALTER TABLE NAME ALTER COLUMN First varchar(20) NOT NULL;
ALTER TABLE NAME ALTER COLUMN Last varchar(20) NOT NULL;
 
-- And add a couple rows for good measure:
INSERT NAME (First, Last) VALUES ('Bartholomew', 'Simpson');
INSERT NAME (First, Last) VALUES ('Lisa', 'Simpson');

So things should look better, right? Not quite! The old data pages haven’t changed at all, and the new data looks just as bad:

pages graph

What’s Happening

The action of dropping a column doesn’t affect existing data at all; it’s a metadata-only operation. You can actually see this at work using the slightly undocumented view sys.system_internals_partition_columns. The is_dropped field indicates that the columns haven’t disappeared; they’ve just been marked as dropped:

select t.name as tablename,
	c.name as columnname,
	ipc.*
from sys.system_internals_partition_columns ipc
join sys.partitions p
	on ipc.partition_id = p.partition_id
join sys.tables t
	on t.object_id = p.object_id
left join sys.columns c
	on c.object_id = t.object_id
	and c.column_id = ipc.partition_column_id
where t.name = 'NAME'
order by c.column_id

So just how hard is it to get rid of columns?

Samson Removing Columns

Hair Makes Removing Columns Possible

How to Really Get Rid of Columns

It turns out that an index REBUILD is required (a REORGANIZE isn’t good enough). If you have a maintenance plan, it might eventually take care of this. Or you can do it explicitly:

ALTER INDEX ALL ON NAME REBUILD;

Things look much nicer now:

pages graph

The information in the diagram above is hard to see, but it works out to about 200 rows per page, as we had hoped. The column is truly deleted now, which you can see using this query (once again):

select t.name as tablename,
	c.name as columnname,
	ipc.*
from sys.system_internals_partition_columns ipc
join sys.partitions p
	on ipc.partition_id = p.partition_id
join sys.tables t
	on t.object_id = p.object_id
left join sys.columns c
	on c.object_id = t.object_id
	and c.column_id = ipc.partition_column_id
where t.name = 'NAME'
order by c.column_id

Other blogs have treated this topic before, e.g.:

  • Andras at Simple Talk last year.
  • Kalen Delaney, whose name is almost synonymous with SQL internals, looked at this in 2006.

But none of those have pictures of Samson, do they?

Update March 12, 2010: I just read chapter 30 in SQL Server MVP Deep Dives. The chapter is called Reusing space in a table. In it, Joe Webb talks about a very similar scenario and recommends an approach using DBCC CLEANTABLE. This is good for reclaiming space that was once used by variable-length columns, but it doesn’t work for the example in this post, which uses fixed-length columns. It’s an important distinction because large columns are more often variable-length than fixed-length.
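For reference, here’s a minimal sketch of the DBCC CLEANTABLE syntax (0 means the current database). As noted above, it only reclaims space left behind by dropped variable-length columns, so it would be a no-op for the fixed-length CHAR column dropped in this post:

DBCC CLEANTABLE (0, 'NAME');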

December 1, 2009

Comparison: Switching Tables vs. sp_rename

Filed under: Miscelleaneous SQL, SQLServerPedia Syndication — Michael J. Swart @ 1:00 am

Takeaway: Stick with sp_rename and leave the ALTER TABLE SWITCH statement for partitions.

Earlier in the month I was at a talk given by Don Vilen called “To Partition or Not To Partition: The questions… and the answers”. In the talk he gave an interesting tip about switching partitions. (Remember that partition switching is a method used to implement the sliding window strategy. For more info, search for the keyword SWITCH on this page.)

Here’s the tip: Notice that the partition arguments in the ALTER TABLE – SWITCH command are optional. This means that you can switch the data of any table into another; it doesn’t have to be partitioned. But that staging table must still be empty.

So as I thought about how to get around the empty staging table restriction, I had an idea (Great Scott!). This is the diagram I made in my notes:

Not a flux capacitor

Moments later, Don Vilen summed up this idea: “The staging table must be empty. But there’s nothing stopping you from taking a third table with data and switching that into the newly emptied table.” I’m paraphrasing. He said it much better than that. Maybe an example would help:

use tempdb;
 
create table primary_table (i int primary key);
create table staging_table_with_data (i int primary key);
create table empty_staging_table (i int primary key);
insert primary_table values (1), (2), (3)
insert staging_table_with_data values (4), (5), (6)
 
-- the following does not work:
alter table primary_table switch to staging_table_with_data;
/* this fails with:
    ALTER TABLE SWITCH statement failed. The target table
    'tempdb.dbo.staging_table_with_data' must be empty.
*/
 
-- but this does work:
alter table primary_table switch to empty_staging_table;
alter table staging_table_with_data switch to primary_table;

When would you use this?

The original reason that the SWITCH statement was implemented was to help transfer data efficiently by using partition switching. But I think we can use this feature on whole non-partitioned tables in order to quickly transform data.

A great example is an update that applies to a large number of rows and has to be done quickly. I’ve come across a situation myself where we were converting local times to UTC and needed speed. A rough sketch of the pattern is below.
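Here’s a rough sketch of that pattern, with hypothetical table names. It assumes both staging tables were created ahead of time, empty, with definitions (columns, indexes and constraints) that match dbo.Events:

-- 1. Build the transformed copy offline while dbo.Events stays available.
INSERT dbo.Events_staging (Id, EventTime)
SELECT Id, DATEADD(HOUR, 5, EventTime)   -- e.g. a local-to-UTC conversion
FROM dbo.Events;
 
-- 2. Swap the tables with two quick metadata-only operations.
ALTER TABLE dbo.Events SWITCH TO dbo.Events_empty;      -- empties the primary table
ALTER TABLE dbo.Events_staging SWITCH TO dbo.Events;    -- moves the transformed data in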

Is this any better than sp_rename?

In the domain of non-partitioned tables, the answer is no, not at all. In fact, the code above can probably be rewritten like this:

use tempdb;
 
create table primary_table (i int primary key);
create table staging_table (i int primary key);
 
insert primary_table values (1), (2), (3)
insert staging_table values (4), (5), (6)
 
exec sp_rename @objname='primary_table', @newname = 'old_data', @objtype = 'OBJECT'
exec sp_rename @objname='staging_table', @newname = 'primary_table', @objtype = 'OBJECT'

You have probably seen this kind of technique used by tools that generate scripts to change table definitions. In fact, this is the only technique you can use to add an identity column to, or remove one from, an existing table.

So then what’s the difference between them? Why would you choose one method over the other? Hopefully you can use the rest of this article to make an informed decision.

Restrictions

When renaming tables using sp_rename, there are very few restrictions; in fact, I’m pretty sure there are only four:

  • Neither the primary nor the staging table can be temporary
  • The table can’t be published (via replication)
  • The new name can’t clash with an existing object name.
  • Enforced dependencies – like schema-bound views – can’t exist.

In fact, aside from syntax errors, I’ve never ever seen sp_rename fail. However, when swapping non-partitioned tables using the SWITCH keyword, there are more pitfalls to watch out for:

  • The columns, types, indexes and constraints of the primary and staging tables must match.
  • too many others to list

To see what could go wrong, read the manual here or check out the error list with:

--roughly 50 rows here:
select text from sys.messages
where text like '%ALTER TABLE SWITCH statement%'
and text not like '%partition%'

But the best approach would be to test specific scenarios on a test machine, just to find out what’s possible.

Benefits of one over the other

So if you’re still with me (and congratulations and thank you for bearing with me) you might get the distinct impression that sp_rename is the better choice. You’re probably right. But I’ll still try to sum up the benefits and drawbacks of each method here:

sp_rename:

  • Almost never fails.
  • Different table definitions are possible.
  • Indexed views (which rely on schema binding) are not allowed.
  • Foreign keys and indexes must be recreated before or after the rename.

SWITCH

  • You don’t have to recreate indexed views. But the staging table must have matching views before the switch.
  • Other foreign keys, indexes and constraints do not need to be redefined. But as before, the staging table must be prepared properly.
  • Minimal or zero downtime.
  • The big drawback is the rule that requires the staging and the primary tables to have matching definitions.

So, like I said in the beginning: For the most part, stick with sp_rename and leave the ALTER TABLE SWITCH statement for partitions.

October 27, 2009

Apology accepted

Filed under: Tongue In Cheek — Michael J. Swart @ 10:10 am

An open letter to delinquent SQL bloggers.

I accept your apologies which you thoughtfully expressed in posts such as here, here, here, here, here and here. You expressed desires to make amends and reconcile with me and I truly hope you were sincere.

I know that mistakes were made and there are regrets on both sides. But I think pointing fingers would be counterproductive at this stage.

So today, on behalf of SQL Blog readers, I hereby accept your apologies and forgive you all.

Together I think we can get through this difficult time. I think it’s best now if we focus on the present. And I think with patience, time and more consistent blog posts, I will not only forgive, but forget.

Sincerely,
Michael J. Swart
Database Whisperer

All kidding aside, it’s no biggie. I never really notice the lack of posts from any particular author in my RSS feed aggregator. I just mark such apology posts as read and get on with the day.

October 26, 2009

SQL Snippets is my new favourite thing.

Filed under: Miscelleaneous SQL — Michael J. Swart @ 8:18 am

Within the last hour, Mladen Prajdić announced version 1.7 of his popular SSMS Tools Pack. The main new feature is SQL Snippets, which gives Management Studio something very much like the code snippets found in other IDEs everywhere.

Also within that last hour, I’ve installed and test-driven the new feature and it is officially my new favourite thing. I’ve got big plans for this feature. I’ve already translated my custom templates to it. In conjunction with templates (and Ctrl-Shift-M), I foresee great things for this feature.

Good job Mladen!

October 14, 2009

A Trick for Aggregating Data Fast

Filed under: Technical Articles — Michael J. Swart @ 5:49 pm

I want to explain a trick I came up with that helps an OLTP database do part of the job that OLAP cubes do. (Isn’t that cute, the database engine thinks it’s a cube.*)

If for some reason you find yourself wanting to use an OLTP database (i.e. a regular old database) for something that is better suited to an OLAP cube (like Analysis Services, for example), then you’re in good company! It seems to be a very common challenge. Even Microsoft seems to recognize this by implementing T-SQL tricks that step over the line into OLAP territory. Grouping sets come to mind; another sign is that CUBE is now a keyword.

Aggregating Data Along a Dimension Before it’s Needed.
One task that is deep in the heart of OLAP territory is reporting the aggregate of a measurement along a dimension. I want to explain how to do this efficiently inside an OLTP db.

In the following examples the OLTP database (and wannabe OLAP cube) will be the standard AdventureWorks database. I’ll be using the Sales.SalesOrderDetail table. The measurement here is the LineTotal field, and the dimension here is the Product.

So we can report on this kind of data easily by maintaining statistics using an indexed view. (BTW, this is a great use-case for indexed views).

Here’s the trick. If n is the measurement, then the only stats we need to store are the sums of n⁰, n¹, and n², grouped by the dimension. In other words, we keep track of the count, the sum and the sum of squares.

CREATE VIEW Sales.v_SalesOrderDetailAggregates WITH SCHEMABINDING
AS
SELECT ProductID,
   SUM(LineTotal) AS Sum1,
   COUNT_BIG(*) AS Count1,
   SUM(LineTotal*LineTotal) AS Sum2
FROM Sales.SalesOrderDetail
GROUP BY ProductID
GO
 
CREATE UNIQUE CLUSTERED INDEX IX_v_SalesOrderDetailAggregates
ON Sales.v_SalesOrderDetailAggregates(ProductID)
GO

Using the Indexed View

So that’s the clustered index I propose we create. As I’ll show later, this is enough. The work of aggregating the data is offloaded to the db. And the work is done once at creation and maintained automatically during any relevant transactions.

In the following code sample, I show how to use this view to report on many common stats (aggregates):

SET STATISTICS IO ON
 
-- a report that aggregates LineTotal over Products
SELECT
   ProductId,
   COUNT(LineTotal),
   SUM(LineTotal),
   AVG(LineTotal),
   STDEVP(LineTotal),
   STDEV(LineTotal),
   VARP(LineTotal),
   VAR(LineTotal)
FROM Sales.SalesOrderDetail
GROUP BY ProductID
ORDER BY ProductID
 
-- the same report (aggregate LineTotal over Products)
-- that uses the indexed view.
SELECT
   ProductID,
   [COUNT] = Count1,
   [SUM] = Sum1,
   [AVG] = Sum1 / Count1,
   [STDEVP] = 1.0/Count1 * SQRT(Count1*Sum2 - Sum1*Sum1),
   [STDEV] = 1.0/SQRT(Count1*(Count1-1)) * SQRT(Count1*Sum2 - Sum1*Sum1),
   [VARP] = (Sum2 - (Sum1*Sum1/Count1))/(Count1),
   [VAR] = (Sum2 - (Sum1*Sum1/Count1))/(Count1-1)
FROM Sales.v_SalesOrderDetailAggregates
WHERE Count1 > 1
ORDER BY ProductID
 
/*
results:
 
(266 row(s) affected)
Table 'SalesOrderDetail'. Scan count 1, logical reads 485, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
(266 row(s) affected)
Table 'v_SalesOrderDetailAggregates'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
*/

So why not expand the indexed view to include everything from the beginning? Well, there are a couple of problems with that. The first is that you’ll get an error message like:

Msg 10125, Level 16, State 1, Line 2
Cannot create index on view "AdventureWorks.Sales.v_SalesOrderDetailAggregates" because it uses
aggregate "STDEV". Consider eliminating the aggregate, not indexing the view, or using alternate
aggregates. For example, for AVG substitute SUM and COUNT_BIG, or for COUNT, substitute
COUNT_BIG.

And so what I’ve shown here is that you can use a calculation involving COUNT_BIG, SUM(n) and SUM(n*n) to calculate common statistical aggregates like AVG, STDEV and VAR.

Besides, this strategy also takes the least space, and for a little extra work you can aggregate on a grouping higher up in the dimension, that is, by using the entries in the indexed view as subtotals to generate totals for larger groupings, as sketched below.
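For example, here’s a minimal sketch of rolling the per-product subtotals in the indexed view up into a single grand total; it’s the same algebra as the report above, just applied to the sums of the stored sums:

SELECT
   [COUNT]  = SUM(Count1),
   [SUM]    = SUM(Sum1),
   [AVG]    = SUM(Sum1) / SUM(Count1),
   [STDEVP] = SQRT(CONVERT(FLOAT, SUM(Count1)) * SUM(Sum2) - SUM(Sum1) * SUM(Sum1))
              / SUM(Count1)
FROM Sales.v_SalesOrderDetailAggregates;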

*– I’m probably missing an opportunity to include an awesome illustration for this post; something involving databases, cubes and square and round pegs.

October 4, 2009

Spotlight on Brad Schulz Y'all

Filed under: Miscelleaneous SQL — Michael J. Swart @ 5:46 pm

Every now and then, I come across an idea where I think “Man, I wish I thought of that”

Reading Brad Schulz’s blog post The Age-Old SELECT * vs. SELECT 1 Debate was one of those times. In that blog post, he explains how relevant the select expressions are when used inside an EXISTS subquery. By looking at the official docs here: http://technet.microsoft.com/en-us/library/ms188336.aspx you’ll learn enough to get by. But after reading Brad Schulz’s article, you’ll never forget.

Brad is relatively new to SQL Server. He spent much of his career so far focused on Visual FoxPro and was a Visual FoxPro MVP for years. SQL Server is a new focus for him and he’s already moderating the MSDN T-SQL forum. It won’t be long before he’ll be able to create test database servers with one hand tied behind his back. (Interesting trick: Sit on your right hand and try typing “create test database servers”.)

His blog http://bradsruminations.blogspot.com/ deserves more exposure. I was first introduced to his articles in September when he syndicated his blog at SQLServerPedia.

His style is sometimes hilarious, funny most of the time, and always clever. I’m always impressed when SQL Server writers manage to keep their articles engaging despite the subject matter. Brad seems to manage just fine.

Also Must-Read (if only for the laughs)

Brad recently agreed to answer a number of questions I had. (Huge thanks Brad for taking the time!)

Michael J. Swart: What skills in Visual FoxPro (VFP) transfer easily to SQL Server? (Any surprising ones?)

Brad Schulz: Just working with the concepts of databases, tables, columns, relations, and indexes for many years in VFP makes the transfer easy.  And basic SQL syntax has been part of the VFP language since 1991, so that eliminates an enormous learning curve right away… We VFPers have been doing INSERT, UPDATE, DELETE, and SELECT for almost 20 years.

That being said, though, we would still do a lot of one-row-at-a-time processing, because it was easy and it was lightning fast.  But the equivalent of that in SQL (i.e. cursor) is slow as molasses, and I’ve learned a lot of set-based techniques in SQL Server that I had never used before.  Thank goodness that I started playing with T-SQL recently… I think I would have gone bananas if I had started with SQL2000 with no CROSS APPLY or CTE’s or window functions (i.e. the OVER clause).  How did you guys stay sane before SQL2005 came out?

Michael J. Swart: A common story is that many SQL Server professionals got their start as “accidental DBAs”. How did you get your start? Or how did you come to choose SQL Server as a focus?

Brad Schulz: My story’s not of the accidental variety.  I got a B.S. in Computer Science from USC (University of Southern California) in 1981.  I actually started out as a Building Design major, which is a combination of Architecture and Civil Engineering, but I dropped the Architecture class about a month into my freshman year, because I could see that I had no drawing talent whatsoever.  To fill that void, I ended up taking a FORTRAN class independent study with a Computer Science professor.  I gobbled it up, and finished the class in about 2 weeks.  The professor asked me to be a tutor for others taking the regular class.  I changed my major to Computer Science just before my sophomore year.

When I graduated from college, I immediately started working as a float in my parents’ pool.  But after a couple of months, a friend of the family recommended me to a pear-packing company in Kelseyville, California (I found out later they were the largest pear-packer in the state).  They had a computer system that no longer worked and the software company that custom wrote it for them was out of business.  To make a long story short, they hired me as a consultant (God knows why), and I read the manual on the programming language (a form of BASIC on an HP-250 machine) and ended up re-writing their entire system.  From that point, I got work by word-of-mouth, and I’ve been an independent developer ever since.

I worked in dBase II in the early 1980’s, and then eventually moved to dBase III, Quicksilver, Clipper, and then FoxBase+ and FoxPro.  I’ve found plenty of FoxPro work for more than 20 years, but it’s not going to last forever, so SQL Server seemed like the logical choice to move to.

Michael J. Swart: What motivates you to moderate the MSDN T-SQL Forum?

Brad Schulz: Back in the 1990’s I participated in the FoxForum on CompuServe, answering questions about FoxPro.  I enjoyed helping people, and I learned a lot just by helping others, and I learned new ideas from experts who hung out at the forum.  I wanted to duplicate that experience at the MSDN T-SQL Forum.  I’ve learned an incredible amount from some very talented people there.

Michael J. Swart: What motivates you to blog?

Brad Schulz: For one thing, I wanted to get myself known a little more in the SQL community.  Some of my blogs are humorous or tongue-in-cheek, and that’s just a release for the crazy creative side of me.  But I also enjoy writing “how-to” and “discovery” articles… they force me to practice and experiment more with T-SQL.  I wrote several articles for FoxTalk and FoxPro Advisor in the 1990’s, but the turnaround time took forever, from submission through the editing process, and then waiting months for eventual publication.  In contrast, I like the immediate publication of blog posts and getting immediate feedback.

Michael J. Swart: What’s one thing you can say about yourself that would make the SQL Server crowd say “Nifty”?

Brad Schulz: What’s a word that means “stylish; smart; fine; excellent” and rhymes with “thrifty”?  Okay, scratch that.

Let me see… I’m an adopted child whose birth-mother was German Catholic and who played the violin.  I always knew I was adopted.  My birth-mother had requested I be placed with a German Catholic family (which I was, though my mother was the Catholic and my father was the German).  My parents knew about the violin, and figured there was musical talent in me, so they set me up with piano lessons when I was 5.  I’ve been playing piano ever since (45 years).  The weird, spooky part is that I asked for more detailed non-identifying information from the adoption agency when I was about 25 years old, and when I received it, I was floored when it said that my birth-mother worked in and had a talent for “data processing” (whatever that meant in the 1950’s).  Talk about “nature” vs “nurture”… I think the “nature” argument wins hands-down!

I also have a passion for the stock market, which I discovered in 1998.  You can read more about that at my (now-on-hiatus) website http://www.stockciphering.com/whatis.htm.  The stock market is just one huge database of stocks and daily prices, isn’t it?  And so I wrote a suite of Visual FoxPro (VFP) programs to calculate technical analysis figures for stocks and produce buy signals for them, and I created the website to try to capitalize on it.  The website never really set the world on fire, but in “eating my own dog food” and using the data for my own trading, I did very well… From Jan2003 to Mar2009, my IRA (where I do my trading) more than quadrupled in value, compared to the S&P500 which lost 25% of its value during that same period.

 Michael J. Swart: Thanks Brad

October 1, 2009

The ABCs of Management Studio Shortcuts

Filed under: Technical Articles — Michael J. Swart @ 10:39 am

Today I was watching a webcast hosted by Brent Ozar and Tim Ford (congrats on the MVP awards, guys). One topic that came up was “Which key do you press to execute a query in SSMS?” F5 seemed to be the default key, but I found out that Ctrl+E does the same thing. This prompted me to enumerate the alphabet and see what each shortcut combination accomplishes.

I’m also giving the shortcuts a rating from 0-5 based on how often I personally use them. 0 means never. 5 means I’d rather give up my spacebar than this shortcut.

BTW, I’m also not mentioning ALT keys or SHIFT keys; those come later. But first: “CTRL, CTRL, you must learn CTRL!”

Ctrl+A: Select all text (a standard text editor command) : 4
Ctrl+C: Copy (a standard text editor command) : 5
Ctrl+D: Show results in grid : 4
Ctrl+E: Execute (aka F5, aka Alt+X) : 0 but maybe that will improve now that I know about it.
Ctrl+F: Quick Find: 3
Ctrl+G: Goto line number: 0 I don’t use it, but then I don’t have line numbers turned on.
Ctrl+H: Quick Replace: 2
Ctrl+I: The best hidden treasure ever. Hit Ctrl+I and then type some text; you go straight to the next occurrence of that string. Try it now, I’ll wait… Wasn’t that so cool? : 5
Ctrl+J: Intellisense’s list members (2008 only) : 3
Ctrl+K: First key of a two key shortcut that can help do lots of things. This is a doorway into cool stuff. : 3
Ctrl+L: Show execution plan. : 2
Ctrl+M: Include Actual Execution Plan : 2
Ctrl+N: New Query window with the same connection as the current one. : 2
Ctrl+O: Open file (pretty standard) : 3
Ctrl+P: Print (Still there for nostalgia purposes) : 0
Ctrl+R: Hide Results Pane : 4 A big help for screen real estate.
Ctrl+S: Save File : 3
Ctrl+T: Show results as text : 3
Ctrl+U: Puts focus on the choose database control (Mnemonic: U is for USE) : 4
Ctrl+V: Paste (a standard text editor command) : 5
Ctrl+X: Cut (a standard text editor command) : 4
Ctrl+Y: Redo (a standard text editor command) : 4
Ctrl+Z: Undo (a standard text editor command) : 4

So there you go. Seemingly still up for grabs are Ctrl+Q and Ctrl+B; these don’t appear to do anything.

September 25, 2009

CHAR: What is it Good For?

Filed under: Technical Articles — Michael J. Swart @ 7:06 am

Takeaway: … Absolutely nothing (say it again).

What’s the difference between the CHAR datatype and the VARCHAR datatype? At the time of this writing, Google gives thousands of web pages that are willing to explain the difference, or if not explaining the difference, at least explaining which is better and when.

What this Blogger Thinks
I’m going to put my vote with VARCHAR over CHAR in almost all circumstances. I’m going to list what I think are the stated benefits of CHAR, and then hopefully show that those benefits are outweighed by the consistency you get from always applying VARCHAR as a rule of thumb. (N.B. Throughout this post, the same arguments apply to NCHAR vs. NVARCHAR.)

Data Integrity:
Does SQL Server Complain if you give CHAR (10) less than ten characters?
At first glance, you might assume (the way I did) that CHAR gives you the following benefit: that

CREATE TABLE example1 ( col CHAR (10) )

is somehow equivalent to

CREATE TABLE example2
(
    col VARCHAR (10),
    CHECK (LEN (col) = 10)
)

But it turns out not to be the case. Inserting strings shorter than ten characters into the column in example 1 does not produce an error. Instead, SQL Server pads the value with enough trailing spaces to fill out the column.
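You can see the padding for yourself with a quick check against the example1 table above; DATALENGTH reports the full ten bytes even though only three characters were supplied:

INSERT example1 (col) VALUES ('abc');
 
SELECT col, DATALENGTH(col) AS bytes_used   -- returns 10, not 3
FROM example1;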

Size
And so we get to it. The main consideration when deciding between these types always seems to be size. VARCHAR as you know takes only the characters it needs, but it uses an extra two bytes to store the length of the string. That’s why Microsoft recommends using CHAR for strings that are going to be a consistent length.

So I decided to find out exactly how much. Using the sample database AdventureWorks, I changed the tables that used NCHAR columns into identical tables that use NVARCHAR columns, and I found that the number of pages consumed was exactly the same (even after rebuilding indexes). I expected that, though: the small savings in that case weren’t enough to fit extra rows onto a database page.

But here’s something that surprised me regarding nullable columns. If you are counting every byte, you’ll see that VARCHAR is the clear space saver when it comes to NULLs. You’ll find that CHAR (10) columns that are NULL still take ten bytes, while VARCHAR (10) columns take zero bytes. This is another reason to pick VARCHAR over CHAR.
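Here’s a rough sketch of one way to verify that, using throwaway tables (the names are made up): insert a NULL into a fixed-length column and into a variable-length column, then compare the average record sizes.

CREATE TABLE char_nulls    (id INT IDENTITY PRIMARY KEY, val CHAR (10) NULL);
CREATE TABLE varchar_nulls (id INT IDENTITY PRIMARY KEY, val VARCHAR (10) NULL);
 
INSERT char_nulls    (val) VALUES (NULL);
INSERT varchar_nulls (val) VALUES (NULL);
 
-- The char_nulls records should come out roughly ten bytes wider.
SELECT OBJECT_NAME(object_id) AS table_name, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('char_nulls'), NULL, NULL, 'DETAILED')
UNION ALL
SELECT OBJECT_NAME(object_id) AS table_name, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('varchar_nulls'), NULL, NULL, 'DETAILED');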

Semantics
What about semantics? When an application developer sees a CHAR (10) column, it’s clear to them that the data in the column is expected to be exactly 10 characters long. This is a benefit of the CHAR data type, and it’s not something that the VARCHAR data type conveys nicely, even with a CHECK constraint or documentation.

So I’ll concede that point, even though it’s possible to create user-defined types that do something similar:

CREATE TYPE PostalCode FROM VARCHAR (6);
CREATE TYPE SIN FROM VARCHAR (9);
CREATE TYPE OneChar FROM VARCHAR (1);

So are CHARS Completely Useless?
No, but I don’t see any significant advantage either. So I think it’s easier to apply VARCHAR as a rule of thumb and as a best practice, without worrying about whether it’s better than CHAR. Pick VARCHAR and be done with it.
