Michael J. Swart

April 27, 2012

Modifying Tables Online – Part 5: Just One More Thing

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , — Michael J. Swart @ 10:50 am

SERIES: Modifying Tables Online

In the next few days I want to describe a way to modify the definition of very large tables while keeping the table available for other queries. Some table changes are already online operations. But offline changes need extra work to keep the table available. I explore one way to do that in this series.

This article used to be a single blog post, but I broke it into four parts because it seemed too long. Read Part 1, and understand the strategy I’m using. Treat Parts 2-4 as appendixes. When you want to use this plan (or one like it), come back and use them as examples. You won’t be able to make use of the examples by cutting and pasting directly (unless you happen to be running an bike equipment store called Adventureworks), but modifications shouldn’t be too hard.

Just One More Thing

So Rob Volk commented on Part 3 of my blog. The comment prompted me to write a part five (which you’re reading now). So you can thank him for this bonus section.

I’m going to quote Rob directly. He wrote:

One question/suggestion, have you considered using a new schema (e.g. Sales_New) and creating the new table in that schema? The benefit is that all the defaults, keys, triggers, and indexes can retain the same names as the original, and the switch becomes a single ALTER SCHEMA…TRANSFER operation. (or two, one for the old table and one for the new)

He’s absolutely right. The switch step becomes much much simpler and simpler is almost always better (as it is in this case). My migration would start by creating some extra schemas and then creating the staging table. In the Adventureworks example I’ve been using, that looks something like this:

use AdventureWorks2012
go
create schema staging;
go
create schema obsolete;
go
CREATE TABLE staging.SalesOrderHeader(
-- etc...

So now our DB is prepared and looks like this:

Copy the data over to the staging table the same way. In my SalesOrderHeader example, that code doesn’t change at all except that I replace “Sales.SalesOrderHeader_new” with “Staging.SalesOrderHeader”.

When ready, the switch now includes code like this:

ALTER SCHEMA obsolete TRANSFER Sales.SalesOrderHeader;
ALTER SCHEMA Sales TRANSFER staging.SalesOrderHeader;
 
drop trigger obsolete.t_i_SalesOrderHeader;
drop trigger obsolete.t_u_SalesOrderHeader;
drop trigger obsolete.t_d_SalesOrderHeader;

Looks simple right! It is. Simple is better. The schema now looks something like this:

A few things to be careful of:

  • Foreign keys: Although no renaming is necessary, they still have to dropped from and to the obsolete table. And foreign keys pointing to the new table still need to be added.
  • Existing triggers: They should recreated on the new table.

That’s It

So dear reader, that’s the series. It was fun for me to explore in depth an intermediate topic. Thanks for bearing with me. I know the series was a little dry, but I figured that I wanted to use this walk-through for myself as a template for future migration projects and that if I thought it was useful, maybe you would think so too.

April 26, 2012

Modifying Tables Online – Part 4: Testing

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , — Michael J. Swart @ 12:00 pm

SERIES: Modifying Tables Online

In the next few days I want to describe a way to modify the definition of very large tables while keeping the table available for other queries. Some table changes are already online operations. But offline changes need extra work to keep the table available. I explore one way to do that in this series.

This article used to be a single blog post, but I broke it into four parts because it seemed too long. Read Part 1, and understand the strategy I’m using. Treat Parts 2-4 as appendixes. When you want to use this plan (or one like it), come back and use them as examples. You won’t be able to make use of the examples by cutting and pasting directly (unless you happen to be running an bike equipment store called Adventureworks), but modifications shouldn’t be too hard.

This is part 4 of the series (you can consider this Appendix 3, The main article is found here).

So my testing is also pretty straightforward. I

  1. Backup the Adventureworks2012 database on my dev machine
  2. Start sending a whack of queries to the database to simulate concurrent activity
  3. Make Adventureworks2012.Sales.SalesOrderHeader larger by adding lots of rows
  4. Migrate the database using the script from part 3
  5. Look for any errors, fix them, restore the db and repeat the process if necessary
  6. Otherwise … profit?

You should be able to handle each of these steps. Except that I want to explain exactly how I personally handle number 2: I use an application to send queries to the database many times. Sounds easy right? The trick is that I want to call the stored procedure many times at once.

Activity Generator

This is a C# program which I compile into a .net application (I can’t wait for .net 4.5′s async and await)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.SqlClient;
using System.Threading;
 
namespace ConsoleApplication1 {
    class Program {
        static int counter = 0;
        static void Main( string[] args ) {
            SqlConnectionStringBuilder cs = new SqlConnectionStringBuilder();
            cs.DataSource = @".";
            cs.InitialCatalog = "Adventureworks2012";
            cs.IntegratedSecurity = true;
            cs.AsynchronousProcessing = true;
            string connectionString = cs.ToString();
            string sql = @"s_DoSomething";
 
            for( int i = 0; i < 100000; i++ ) {
                SqlConnection conn = new SqlConnection( connectionString );
                conn.Open();
                SqlCommand cmd = new SqlCommand( sql, conn );
                cmd.CommandType = System.Data.CommandType.StoredProcedure;
                cmd.BeginExecuteNonQuery( new AsyncCallback( EndExecution ), cmd );
            }
            Console.WriteLine( string.Format( @"Error count: {0}", counter ) );
            Console.ReadLine();
        }
 
        static void EndExecution( IAsyncResult c ) {
            SqlCommand endCmd = ( c.AsyncState as SqlCommand );
            try {
                endCmd.EndExecuteNonQuery( c );
            } catch( Exception ex ) {
                //counter++;
                Interlocked.Increment( ref counter );
                Console.WriteLine( ex.Message );
            } finally {
                endCmd.Connection.Close();
            }
        }
    }
}

You might have noticed that this program is extremely boring. It’s just calling one stored procedure: s_DoSomething over and over as fast as it can. The advantage is that I’m free to modify s_DoSomething (which is for me a lot easier than modifying this application).

So what does this procedure do? I’m glad you asked:

s_DoSomething

This procedure executes at random one task chosen from a set of typical OLTP tasks that use the table Sales.SalesOrderHeader. Here’s the one I used:

create procedure s_DoSomething 
as
begin
 
    declare @choice int = 1 + RAND() * 6
    declare @SalesOrderId int;
 
    if (@choice = 1)
        insert Sales.SalesOrderHeader(DueDate, OrderDate, CustomerID, BillToAddressID, ShipToAddressID, ShipMethodID)
        select top 1000 DueDate, DATEADD(day, -1, DueDate), CustomerID, BillToAddressID, ShipToAddressID, ShipMethodID
        from Sales.SalesOrderHeader
    else if (@choice = 2)
        delete top (100) Sales.SalesOrderHeader
    else if (@choice = 3)
        begin
            declare @personid int;
 
            select top 1 @personid = BusinessEntityID
            from Sales.SalesPerson
            order by newid();
 
            select * 
            from sales.SalesOrderHeader
            where SalesPersonID = @personid;
        end;
    else if (@choice IN (4,5,6))
        begin
            select top 1 @SalesOrderId = SalesOrderId
            from Sales.SalesOrderHeader with (nolock);
 
            select top 100 * 
            from Sales.SalesOrderHeader 
            where SalesOrderId > @SalesOrderId
        end
 
end

April 20, 2012

Modifying Tables Online – Part 3: Example With Error Handling

SERIES: Modifying Tables Online

In the next few days I want to describe a way to modify the definition of very large tables while keeping the table available for other queries. Some table changes are already online operations. But offline changes need extra work to keep the table available. I explore one way to do that in this series.

This article used to be a single blog post, but I broke it into four parts because it seemed too long. Read Part 1, and understand the strategy I’m using. Treat Parts 2-4 as appendixes. When you want to use this plan (or one like it), come back and use them as examples. You won’t be able to make use of the examples by cutting and pasting directly (unless you happen to be running an bike equipment store called Adventureworks), but modifications shouldn’t be too hard.

This is part 3 of the series (you can consider this Appendix 2, The main article is found here).

What you find below is exactly the same script as you find in Part 2: Implementation Example with the exception that I’ve included error handling and wrappers to handle the following goals:

  • If a statement fails, no data is lost
  • If a statement fails, other queries are not impacted (i.e. we’re online)
  • If a statement fails anywhere, it can be retried from the beginning with no lost work. This means for example that when data is copied to the staging table in batches, and that process is interrupted and restarted, the script “picks up where it left off”.
  • The process is re-runnable. This means that if the script succeeds and for some reason the script is run again then no errors are raised and no work is performed.
  • The last step, the switch, is an atomic transaction. It succeeds 100% or fails and rolls back 100%.

I present without any further comment, the scripts (with error-handling).

Create Staging Table

use AdventureWorks2012
 
-- original table isn't migrated, staging table doesn't exist
IF NOT EXISTS 
    (
        SELECT 1 
        FROM INFORMATION_SCHEMA.TABLES 
        WHERE TABLE_SCHEMA = 'Sales' 
        AND TABLE_NAME = 'SalesOrderHeader_new'
    )
AND NOT EXISTS 
    (
        SELECT 1 
        FROM INFORMATION_SCHEMA.COLUMNS 
        WHERE TABLE_SCHEMA = 'Sales' 
        AND TABLE_NAME = 'SalesOrderHeader' 
        AND COLUMN_NAME = 'Rowversion'
    )
BEGIN
 
    CREATE TABLE Sales.SalesOrderHeader_new(
        SalesOrderID int IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
        RevisionNumber tinyint NOT NULL,
        OrderDate datetime NOT NULL,
        DueDate datetime NOT NULL,
        ShipDate datetime NULL,
        [Status] tinyint NOT NULL,
        OnlineOrderFlag dbo.Flag NOT NULL,
        SalesOrderNumber  AS (isnull(N'SO'+CONVERT(nvarchar(23),SalesOrderID),N'*** ERROR ***')),
        PurchaseOrderNumber dbo.OrderNumber NULL,
        AccountNumber dbo.AccountNumber NULL,
        CustomerID int NOT NULL,
        SalesPersonID int NULL,
        TerritoryID int NULL,
        BillToAddressID int NOT NULL,
        ShipToAddressID int NOT NULL,
        ShipMethodID int NOT NULL,
        CreditCardID int NULL,
        CreditCardApprovalCode varchar(15) NULL,
        CurrencyRateID int NULL,
        SubTotal money NOT NULL,
        TaxAmt money NOT NULL,
        Freight money NOT NULL,
        TotalDue  AS (isnull((SubTotal+TaxAmt)+Freight,(0))),
        Comment nvarchar(128) NULL,
        rowguid uniqueidentifier ROWGUIDCOL  NOT NULL,
        ModifiedDate datetime NOT NULL,
        [Rowversion] rowversion NOT NULL, -- This is the new column!!
        constraint PK_SalesOrderHeader_new_SalesOrderID
            primary key clustered (SalesOrderID ASC),
        constraint FK_SalesOrderHeader_new_Address_BillToAddressID 
            foreign key(BillToAddressID) references Person.[Address] (AddressID),
        constraint FK_SalesOrderHeader_new_Address_ShipToAddressID 
            foreign key(ShipToAddressID) references Person.[Address] (AddressID),
        constraint FK_SalesOrderHeader_new_CreditCard_CreditCardID 
            foreign key(CreditCardID) references Sales.CreditCard (CreditCardID),
        constraint FK_SalesOrderHeader_new_CurrencyRate_CurrencyRateID 
            foreign key(CurrencyRateID) references Sales.CurrencyRate (CurrencyRateID),
        constraint FK_SalesOrderHeader_new_Customer_CustomerID 
            foreign key(CustomerID) references Sales.Customer (CustomerID),
        constraint FK_SalesOrderHeader_new_SalesPerson_SalesPersonID 
            foreign key(SalesPersonID) references Sales.SalesPerson (BusinessEntityID),
        constraint FK_SalesOrderHeader_new_SalesTerritory_TerritoryID 
            foreign key(TerritoryID) references Sales.SalesTerritory (TerritoryID),
        constraint FK_SalesOrderHeader_new_ShipMethod_ShipMethodID 
            foreign key(ShipMethodID) references Purchasing.ShipMethod (ShipMethodID),
        constraint CK_SalesOrderHeader_new_DueDate 
            check (DueDate>=OrderDate),
        constraint CK_SalesOrderHeader_new_Freight 
            check (Freight>=0.00),
        constraint CK_SalesOrderHeader_new_ShipDate 
            check (ShipDate>=OrderDate or ShipDate is null),
        constraint CK_SalesOrderHeader_new_Status 
            check ([Status]>=0 and [Status]<=8),
        constraint CK_SalesOrderHeader_new_SubTotal 
            check (SubTotal>=0.00),
        constraint CK_SalesOrderHeader_new_TaxAmt 
            check (TaxAmt>=0.00)
    )
 
    -- add default constraints
    alter table Sales.SalesOrderHeader_new add
    constraint DF_SalesOrderHeader_new_RevisionNumber  
        default 0 for RevisionNumber,
    constraint DF_SalesOrderHeader_new_OrderDate  
        default getdate() for OrderDate,
    constraint DF_SalesOrderHeader_new_Status  
        default 1 for [Status],
    constraint DF_SalesOrderHeader_new_OnlineOrderFlag  
        default 1 for OnlineOrderFlag,
    constraint DF_SalesOrderHeader_new_SubTotal  
        default 0.00 for SubTotal,
    constraint DF_SalesOrderHeader_new_TaxAmt  
        default 0.00 for TaxAmt,
    constraint DF_SalesOrderHeader_new_Freight  
        default 0.00 for Freight,
    constraint DF_SalesOrderHeader_new_rowguid  
        default newid() for rowguid,
    constraint DF_SalesOrderHeader_new_ModifiedDate  
        default getdate() for ModifiedDate
 
        -- two nc ak
    create unique nonclustered index AK_SalesOrderHeader_new_rowguid 
        on Sales.SalesOrderHeader_new(rowguid ASC)
 
    create unique nonclustered index AK_SalesOrderHeader_new_SalesOrderNumber
        on Sales.SalesOrderHeader_new(SalesOrderNumber)
 
        -- two nc ix
    create nonclustered index IX_SalesOrderHeader_new_CustomerID
        on Sales.SalesOrderHeader_new(CustomerID)
 
    create nonclustered index IX_SalesOrderHeader_new_SalesPersonID
        on Sales.SalesOrderHeader_new(SalesPersonID)
END
GO

Add Triggers

-- staging table exists
-- original table isn't migrated, 
-- triggers don't exist
IF EXISTS 
    (
        SELECT 1 
        FROM INFORMATION_SCHEMA.TABLES 
        WHERE TABLE_SCHEMA = 'Sales' 
        AND TABLE_NAME = 'SalesOrderHeader_new'
    )
AND NOT EXISTS 
    (
        SELECT 1 
        FROM INFORMATION_SCHEMA.COLUMNS 
        WHERE TABLE_SCHEMA = 'Sales' 
        AND TABLE_NAME = 'SalesOrderHeader' 
        AND COLUMN_NAME = 'Rowversion'
    )
AND NOT EXISTS
    (
        SELECT 1 
        FROM sys.triggers
        WHERE name = 't_i_SalesOrderHeader'
    )
BEGIN
 
    EXEC sp_executesql N'
        create trigger t_i_SalesOrderHeader
        on Sales.SalesOrderHeader
        after insert
        as 
            set identity_insert Sales.SalesOrderHeader_new on;
 
            insert SalesOrderHeader_new(SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate)
            select SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate
            from inserted
 
            set identity_insert Sales.SalesOrderHeader_new off;';
END
GO
 
IF EXISTS 
    (
        SELECT 1 
        FROM INFORMATION_SCHEMA.TABLES 
        WHERE TABLE_SCHEMA = 'Sales' 
        AND TABLE_NAME = 'SalesOrderHeader_new'
    )
AND NOT EXISTS 
    (
        SELECT 1 
        FROM INFORMATION_SCHEMA.COLUMNS 
        WHERE TABLE_SCHEMA = 'Sales' 
        AND TABLE_NAME = 'SalesOrderHeader' 
        AND COLUMN_NAME = 'Rowversion'
    )
AND NOT EXISTS
    (
        SELECT 1 
        FROM sys.triggers
        WHERE name = 't_u_SalesOrderHeader'
    )
BEGIN
    EXEC sp_executesql N'
        create trigger t_u_SalesOrderHeader
        on Sales.SalesOrderHeader
        after update
        as 
            -- assuming pk values are never altered.
            update Sales.SalesOrderHeader_new
            set 
                RevisionNumber = i.RevisionNumber,
                OrderDate  = i.OrderDate,
                DueDate  = i.DueDate,
                ShipDate  = i.ShipDate,
                [Status]  = i.[Status],
                OnlineOrderFlag  = i.OnlineOrderFlag,
                PurchaseOrderNumber  = i.PurchaseOrderNumber,
                AccountNumber  = i.AccountNumber,
                CustomerID  = i.CustomerID,
                SalesPersonID  = i.SalesPersonID,
                TerritoryID  = i.TerritoryID,
                BillToAddressID  = i.BillToAddressID,
                ShipToAddressID  = i.ShipToAddressID,
                ShipMethodID  = i.ShipMethodID,
                CreditCardID  = i.CreditCardID,
                CreditCardApprovalCode = i.CreditCardApprovalCode,
                CurrencyRateID  = i.CurrencyRateID,
                SubTotal  = i.SubTotal,
                TaxAmt  = i.TaxAmt,
                Freight  = i.Freight,
                Comment  = i.Comment,
                rowguid  = i.rowguid,
                ModifiedDate = i.ModifiedDate
            from Sales.SalesOrderHeader_new s
            join inserted i 
                on s.SalesOrderID = i.SalesOrderID;';
END
GO
 
IF EXISTS 
    (
        SELECT 1 
        FROM INFORMATION_SCHEMA.TABLES 
        WHERE TABLE_SCHEMA = 'Sales' 
        AND TABLE_NAME = 'SalesOrderHeader_new'
    )
AND NOT EXISTS 
    (
        SELECT 1 
        FROM INFORMATION_SCHEMA.COLUMNS 
        WHERE TABLE_SCHEMA = 'Sales' 
        AND TABLE_NAME = 'SalesOrderHeader' 
        AND COLUMN_NAME = 'Rowversion'
    )
AND NOT EXISTS
    (
        SELECT 1 
        FROM sys.triggers
        WHERE name = 't_d_SalesOrderHeader'
    )
BEGIN
    EXEC sp_executesql N'
        create trigger t_d_SalesOrderHeader
        on Sales.SalesOrderHeader
        after delete
        as 
            delete Sales.SalesOrderHeader_new
            from Sales.SalesOrderHeader_new s
            join deleted d 
                on d.SalesOrderID = s.SalesOrderID ;';
END
GO

Move the data over in batches

set nocount on;
set transaction isolation level serializable;
set deadlock_priority LOW
 
declare @i int;
declare @new_i int = 0;
declare @largestSalesOrderId int;
declare @minBatchSize int = 5000
 
set identity_insert Sales.SalesOrderHeader_new on;
 
-- largest SalesOrderId to migrate 
select @largestSalesOrderId = isnull(max(SalesOrderID), 0)
from Sales.SalesOrderHeader
 
-- lowest SalesOrderId to migrate
select top 1 @i = so.SalesOrderId
from Sales.SalesOrderHeader so
left join Sales.SalesOrderHeader_new so_new
	ON so.SalesOrderId = so_new.SalesOrderId
where so_new.SalesOrderId is null
ORDER BY so.SalesOrderId
 
-- batch loop
while @i < @largestSalesOrderId
begin
 
    declare @counter int = 0;
    declare @error int = 0;
 
    -- retry loop:
    while (1=1)
    begin
        begin try
            select top (@minBatchSize) @new_i = SalesOrderID
            from Sales.SalesOrderHeader
            where SalesOrderID > @i
            order by SalesOrderID asc;
 
            merge Sales.SalesOrderHeader_new as target
            using (
                    select SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate
                    from Sales.SalesOrderHeader
                    where SalesOrderID > @i
                    and SalesOrderID <= @new_i
                ) as source (SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate)
            on target.SalesOrderID = source.SalesOrderID
            when not matched then
                insert (SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate)
                values (source.SalesOrderID, source.RevisionNumber, source.OrderDate, source.DueDate, source.ShipDate, source.[Status], source.OnlineOrderFlag, source.PurchaseOrderNumber, source.AccountNumber, source.CustomerID, source.SalesPersonID, source.TerritoryID, source.BillToAddressID, source.ShipToAddressID, source.ShipMethodID, source.CreditCardID, source.CreditCardApprovalCode, source.CurrencyRateID, source.SubTotal, source.TaxAmt, source.Freight, source.Comment, source.rowguid, source.ModifiedDate);
 
            --success            
            select @i = @new_i, @error = 0;
            break;
 
        end try
        begin catch
            select @counter    = @counter + 1,
                @error = @@ERROR;
 
            if (@counter >= 5)
            begin
                -- this batch failed five times! Something's wrong
                raiserror (N'Could not copy Sales.SalesOrderHeader rows to Sales.SalesOrderHeader_new',
                    16, 1, @error);
                break;
            end
        end catch
    end 
 
    if @error <> 0
    begin
        -- an error has been raised. Discontinue.
        break;
    end
 
end
 
set identity_insert Sales.SalesOrderHeader_new off;
go

Do The Switch

set xact_abort on
set deadlock_priority high
set transaction isolation level serializable
begin transaction
IF EXISTS 
    (
        SELECT 1 
        FROM INFORMATION_SCHEMA.TABLES 
        WHERE TABLE_SCHEMA = 'Sales' 
        AND TABLE_NAME = 'SalesOrderHeader_new'
    )
BEGIN
    begin try
        -- rename pks and indexes from original table
        exec sp_rename N'Sales.PK_SalesOrderHeader_SalesOrderID', N'PK_SalesOrderHeader_old_SalesOrderID', N'OBJECT';
        exec sp_rename N'Sales.SalesOrderHeader.AK_SalesOrderHeader_rowguid', N'AK_SalesOrderHeader_old_rowguid', N'INDEX'
        exec sp_rename N'Sales.SalesOrderHeader.AK_SalesOrderHeader_SalesOrderNumber', N'AK_SalesOrderHeader_old_SalesOrderNumber', N'INDEX'
        exec sp_rename N'Sales.SalesOrderHeader.IX_SalesOrderHeader_CustomerID', N'IX_SalesOrderHeader_old_CustomerID', N'INDEX'
        exec sp_rename N'Sales.SalesOrderHeader.IX_SalesOrderHeader_SalesPersonID', N'IX_SalesOrderHeader_old_SalesPersonID', N'INDEX'
 
        -- rename checks and defaults
        exec sp_rename N'Sales.CK_SalesOrderHeader_Status', 'CK_SalesOrderHeader_old_Status', N'OBJECT';
        exec sp_rename N'Sales.CK_SalesOrderHeader_DueDate', 'CK_SalesOrderHeader_old_DueDate', N'OBJECT';
        exec sp_rename N'Sales.CK_SalesOrderHeader_ShipDate', 'CK_SalesOrderHeader_old_ShipDate', N'OBJECT';
        exec sp_rename N'Sales.CK_SalesOrderHeader_SubTotal', 'CK_SalesOrderHeader_old_SubTotal', N'OBJECT';
        exec sp_rename N'Sales.CK_SalesOrderHeader_TaxAmt', 'CK_SalesOrderHeader_old_TaxAmt', N'OBJECT';
        exec sp_rename N'Sales.CK_SalesOrderHeader_Freight', 'CK_SalesOrderHeader_old_Freight', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_RevisionNumber', 'DF_SalesOrderHeader_old_RevisionNumber', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_OrderDate', 'DF_SalesOrderHeader_old_OrderDate', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_Status', 'DF_SalesOrderHeader_old_Status', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_OnlineOrderFlag', 'DF_SalesOrderHeader_old_OnlineOrderFlag', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_SubTotal', 'DF_SalesOrderHeader_old_SubTotal', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_TaxAmt', 'DF_SalesOrderHeader_old_TaxAmt', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_Freight', 'DF_SalesOrderHeader_old_Freight', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_rowguid', 'DF_SalesOrderHeader_old_rowguid', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_ModifiedDate', 'DF_SalesOrderHeader_old_ModifiedDate', N'OBJECT';
 
        -- drop fks from original table
        alter table Sales.SalesOrderHeader
            drop constraint 
            FK_SalesOrderHeader_Address_BillToAddressID,
            FK_SalesOrderHeader_Address_ShipToAddressID,
            FK_SalesOrderHeader_CreditCard_CreditCardID,
            FK_SalesOrderHeader_CurrencyRate_CurrencyRateID,
            FK_SalesOrderHeader_Customer_CustomerID,
            FK_SalesOrderHeader_SalesPerson_SalesPersonID,
            FK_SalesOrderHeader_ShipMethod_ShipMethodID,
            FK_SalesOrderHeader_SalesTerritory_TerritoryID
 
        -- drop fks to original table
        alter table Sales.SalesOrderDetail
            drop constraint FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID;
 
        alter table Sales.SalesOrderHeaderSalesReason
            drop constraint FK_SalesOrderHeaderSalesReason_SalesOrderHeader_SalesOrderID;
 
        -- drop triggers on original table
        drop trigger Sales.t_i_SalesOrderHeader;
        drop trigger Sales.t_u_SalesOrderHeader;
        drop trigger Sales.t_d_SalesOrderHeader;
 
        -- drop uSalesOrderHeader (but remember its definition)
        declare @triggersql nvarchar(max);
        select @triggersql = OBJECT_DEFINITION(object_id) from sys.triggers where name = 'uSalesOrderHeader';
        drop trigger Sales.uSalesOrderHeader; 
 
        -- rename original table to "old"
        exec sp_rename N'Sales.SalesOrderHeader', N'SalesOrderHeader_old', N'OBJECT';
 
        --rename the new indexes
        exec sp_rename N'Sales.PK_SalesOrderHeader_new_SalesOrderID', N'PK_SalesOrderHeader_SalesOrderID', N'OBJECT';
        exec sp_rename N'Sales.SalesOrderHeader_new.AK_SalesOrderHeader_new_rowguid', N'AK_SalesOrderHeader_rowguid', N'INDEX'
        exec sp_rename N'Sales.SalesOrderHeader_new.AK_SalesOrderHeader_new_SalesOrderNumber', N'AK_SalesOrderHeader_SalesOrderNumber', N'INDEX'
        exec sp_rename N'Sales.SalesOrderHeader_new.IX_SalesOrderHeader_new_CustomerID', N'IX_SalesOrderHeader_CustomerID', N'INDEX'
        exec sp_rename N'Sales.SalesOrderHeader_new.IX_SalesOrderHeader_new_SalesPersonID', N'IX_SalesOrderHeader_SalesPersonID', N'INDEX'
 
        -- rename the new checks and defaults
        exec sp_rename N'Sales.CK_SalesOrderHeader_new_Status', 'CK_SalesOrderHeader_Status', N'OBJECT';
        exec sp_rename N'Sales.CK_SalesOrderHeader_new_DueDate', 'CK_SalesOrderHeader_DueDate', N'OBJECT';
        exec sp_rename N'Sales.CK_SalesOrderHeader_new_ShipDate', 'CK_SalesOrderHeader_ShipDate', N'OBJECT';
        exec sp_rename N'Sales.CK_SalesOrderHeader_new_SubTotal', 'CK_SalesOrderHeader_SubTotal', N'OBJECT';
        exec sp_rename N'Sales.CK_SalesOrderHeader_new_TaxAmt', 'CK_SalesOrderHeader_TaxAmt', N'OBJECT';
        exec sp_rename N'Sales.CK_SalesOrderHeader_new_Freight', 'CK_SalesOrderHeader_Freight', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_new_RevisionNumber', 'DF_SalesOrderHeader_RevisionNumber', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_new_OrderDate', 'DF_SalesOrderHeader_OrderDate', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_new_Status', 'DF_SalesOrderHeader_Status', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_new_OnlineOrderFlag', 'DF_SalesOrderHeader_OnlineOrderFlag', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_new_SubTotal', 'DF_SalesOrderHeader_SubTotal', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_new_TaxAmt', 'DF_SalesOrderHeader_TaxAmt', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_new_Freight', 'DF_SalesOrderHeader_Freight', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_new_rowguid', 'DF_SalesOrderHeader_rowguid', N'OBJECT';
        exec sp_rename N'Sales.DF_SalesOrderHeader_new_ModifiedDate', 'DF_SalesOrderHeader_ModifiedDate', N'OBJECT';
 
        -- rename "new" table
        exec sp_rename N'Sales.SalesOrderHeader_new', N'SalesOrderHeader', N'OBJECT';
 
        --rename fks pointing out
        exec sp_rename N'Sales.FK_SalesOrderHeader_new_Address_BillToAddressID', N'FK_SalesOrderHeader_Address_BillToAddressID', N'OBJECT'
        exec sp_rename N'Sales.FK_SalesOrderHeader_new_Address_ShipToAddressID', N'FK_SalesOrderHeader_Address_ShipToAddressID', N'OBJECT' 
        exec sp_rename N'Sales.FK_SalesOrderHeader_new_CreditCard_CreditCardID', N'FK_SalesOrderHeader_CreditCard_CreditCardID', N'OBJECT'
        exec sp_rename N'Sales.FK_SalesOrderHeader_new_CurrencyRate_CurrencyRateID', N'FK_SalesOrderHeader_CurrencyRate_CurrencyRateID', N'OBJECT'
        exec sp_rename N'Sales.FK_SalesOrderHeader_new_Customer_CustomerID', N'FK_SalesOrderHeader_Customer_CustomerID', N'OBJECT'
        exec sp_rename N'Sales.FK_SalesOrderHeader_new_SalesPerson_SalesPersonID', N'FK_SalesOrderHeader_SalesPerson_SalesPersonID', N'OBJECT'
        exec sp_rename N'Sales.FK_SalesOrderHeader_new_ShipMethod_ShipMethodID', N'FK_SalesOrderHeader_ShipMethod_ShipMethodID', N'OBJECT'
        exec sp_rename N'Sales.FK_SalesOrderHeader_new_SalesTerritory_TerritoryID', N'FK_SalesOrderHeader_SalesTerritory_TerritoryID', N'OBJECT'
 
        -- re-add fks pointing in (without checks for now)
        alter table Sales.SalesOrderDetail with nocheck add
        constraint FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID foreign key (SalesOrderId)
            references Sales.SalesOrderHeader(SalesOrderId) 
 
        alter table Sales.SalesOrderHeaderSalesReason with nocheck add
        constraint FK_SalesOrderHeaderSalesReason_SalesOrderHeader_SalesOrderID foreign key (SalesOrderId)
            references Sales.SalesOrderHeader(SalesOrderId)
 
        -- re-add trigger (use sp_executesql in order to keep in transaction)
        exec sp_executesql @triggersql;
    end try
 
    begin catch
        declare @error nvarchar(max) = ERROR_MESSAGE();
        select 'An error was encountered... Rolling back' as [message],
            @error as errorMessage
        raiserror (@error, 16, 1)
        rollback
    end catch
END
if @@TRANCOUNT > 0
    commit
GO

April 17, 2012

Modifying Tables Online – Part 2: Implementation Example

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , — Michael J. Swart @ 12:00 pm

SERIES: Modifying Tables Online

In the next few days I want to describe a way to modify the definition of very large tables while keeping the table available for other queries. Some table changes are already online operations. But offline changes need extra work to keep the table available. I explore one way to do that in this series.

This article used to be a single blog post, but I broke it into four parts because it seemed too long. Read Part 1, and understand the strategy I’m using. Treat Parts 2-4 as appendixes. When you want to use this plan (or one like it), come back and use them as examples. You won’t be able to make use of the examples by cutting and pasting directly (unless you happen to be running an bike equipment store called Adventureworks), but modifications shouldn’t be too hard.

So this is me putting the strategy into practice. I’ve decided to migrate a table from everyone’s favorite sample datatabase Adventureworks (2012 edition). And the change that I’m implementing is the addition of a rowversion column which is considered an offline operation:

-- adding a rowversion column offline
ALTER TABLE Sales.SalesOrderHeader
ADD [rowversion] rowversion;

The table I picked is Sales.SalesOrderHeader. I picked this table because it’s got foreign keys (columns pointing to records in other tables). And other tables have foreign keys pointing to SalesOrderHeader too. I also noticed that this table has a large variety of objects and settings which are dependent on it. Besides the foreign keys, other dependencies include

  • triggers
  • indexes (some UNIQUE and some not, some nonclustered and one not)
  • an identity column
  • default constraints
  • check constraints
  • computed columns

These make Sales.SalesOrderHeader interesting. So here’s the script (an implementation of the strategy already mentioned). You might be surprised at its size, it’s directly related to how interesting the table is. I don’t use any 2012-only features so you’ll be able to use these examples on SQL Server 2008 or SQL Server 2008 R2.

Create Staging Table

use AdventureWorks2012
 
-- new table
CREATE TABLE Sales.SalesOrderHeader_new(
	SalesOrderID int IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
	RevisionNumber tinyint NOT NULL,
	OrderDate datetime NOT NULL,
	DueDate datetime NOT NULL,
	ShipDate datetime NULL,
	[Status] tinyint NOT NULL,
	OnlineOrderFlag dbo.Flag NOT NULL,
	SalesOrderNumber  AS (isnull(N'SO'+CONVERT(nvarchar(23),SalesOrderID),N'*** ERROR ***')),
	PurchaseOrderNumber dbo.OrderNumber NULL,
	AccountNumber dbo.AccountNumber NULL,
	CustomerID int NOT NULL,
	SalesPersonID int NULL,
	TerritoryID int NULL,
	BillToAddressID int NOT NULL,
	ShipToAddressID int NOT NULL,
	ShipMethodID int NOT NULL,
	CreditCardID int NULL,
	CreditCardApprovalCode varchar(15) NULL,
	CurrencyRateID int NULL,
	SubTotal money NOT NULL,
	TaxAmt money NOT NULL,
	Freight money NOT NULL,
	TotalDue  AS (isnull((SubTotal+TaxAmt)+Freight,(0))),
	Comment nvarchar(128) NULL,
	rowguid uniqueidentifier ROWGUIDCOL  NOT NULL,
	ModifiedDate datetime NOT NULL,
	[Rowversion] rowversion NOT NULL, -- This is the new column!!
	constraint PK_SalesOrderHeader_new_SalesOrderID
		primary key clustered (SalesOrderID ASC),
	constraint FK_SalesOrderHeader_new_Address_BillToAddressID
		foreign key(BillToAddressID) references Person.[Address] (AddressID),
	constraint FK_SalesOrderHeader_new_Address_ShipToAddressID
		foreign key(ShipToAddressID) references Person.[Address] (AddressID),
	constraint FK_SalesOrderHeader_new_CreditCard_CreditCardID
		foreign key(CreditCardID) references Sales.CreditCard (CreditCardID),
	constraint FK_SalesOrderHeader_new_CurrencyRate_CurrencyRateID
		foreign key(CurrencyRateID) references Sales.CurrencyRate (CurrencyRateID),
	constraint FK_SalesOrderHeader_new_Customer_CustomerID
		foreign key(CustomerID) references Sales.Customer (CustomerID),
	constraint FK_SalesOrderHeader_new_SalesPerson_SalesPersonID
		foreign key(SalesPersonID) references Sales.SalesPerson (BusinessEntityID),
	constraint FK_SalesOrderHeader_new_SalesTerritory_TerritoryID
		foreign key(TerritoryID) references Sales.SalesTerritory (TerritoryID),
	constraint FK_SalesOrderHeader_new_ShipMethod_ShipMethodID
		foreign key(ShipMethodID) references Purchasing.ShipMethod (ShipMethodID),
	constraint CK_SalesOrderHeader_new_DueDate
		check (DueDate>=OrderDate),
	constraint CK_SalesOrderHeader_new_Freight
		check (Freight>=0.00),
	constraint CK_SalesOrderHeader_new_ShipDate
		check (ShipDate>=OrderDate or ShipDate is null),
	constraint CK_SalesOrderHeader_new_Status
		check ([Status]>=0 and [Status]=0.00),
	constraint CK_SalesOrderHeader_new_TaxAmt
		check (TaxAmt>=0.00)
)
 
-- add default constraints
alter table Sales.SalesOrderHeader_new add
constraint DF_SalesOrderHeader_new_RevisionNumber
	default 0 for RevisionNumber,
constraint DF_SalesOrderHeader_new_OrderDate
	default getdate() for OrderDate,
constraint DF_SalesOrderHeader_new_Status
	default 1 for [Status],
constraint DF_SalesOrderHeader_new_OnlineOrderFlag
	default 1 for OnlineOrderFlag,
constraint DF_SalesOrderHeader_new_SubTotal
	default 0.00 for SubTotal,
constraint DF_SalesOrderHeader_new_TaxAmt
	default 0.00 for TaxAmt,
constraint DF_SalesOrderHeader_new_Freight
	default 0.00 for Freight,
constraint DF_SalesOrderHeader_new_rowguid
	default newid() for rowguid,
constraint DF_SalesOrderHeader_new_ModifiedDate
	default getdate() for ModifiedDate
 
	-- two nc ak
create unique nonclustered index AK_SalesOrderHeader_new_rowguid
	on Sales.SalesOrderHeader_new(rowguid ASC)
 
create unique nonclustered index AK_SalesOrderHeader_new_SalesOrderNumber
	on Sales.SalesOrderHeader_new(SalesOrderNumber)
 
	-- two nc ix
create nonclustered index IX_SalesOrderHeader_new_CustomerID
	on Sales.SalesOrderHeader_new(CustomerID)
 
create nonclustered index IX_SalesOrderHeader_new_SalesPersonID
	on Sales.SalesOrderHeader_new(SalesPersonID)
GO

Add Triggers

/*
create triggers to maintain new table
*/
 
use AdventureWorks2012;
go
 
create trigger t_i_SalesOrderHeader
on Sales.SalesOrderHeader
after insert
as
	set identity_insert Sales.SalesOrderHeader_new on;
 
	insert SalesOrderHeader_new(SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate)
	select SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate
	from inserted
 
	set identity_insert Sales.SalesOrderHeader_new off;
go
 
create trigger t_u_SalesOrderHeader
on Sales.SalesOrderHeader
after update
as
	-- assuming pk values are never altered.
	update Sales.SalesOrderHeader_new
	set
		RevisionNumber = i.RevisionNumber,
		OrderDate  = i.OrderDate,
		DueDate  = i.DueDate,
		ShipDate  = i.ShipDate,
		[Status]  = i.[Status],
		OnlineOrderFlag  = i.OnlineOrderFlag,
		PurchaseOrderNumber  = i.PurchaseOrderNumber,
		AccountNumber  = i.AccountNumber,
		CustomerID  = i.CustomerID,
		SalesPersonID  = i.SalesPersonID,
		TerritoryID  = i.TerritoryID,
		BillToAddressID  = i.BillToAddressID,
		ShipToAddressID  = i.ShipToAddressID,
		ShipMethodID  = i.ShipMethodID,
		CreditCardID  = i.CreditCardID,
		CreditCardApprovalCode = i.CreditCardApprovalCode,
		CurrencyRateID  = i.CurrencyRateID,
		SubTotal  = i.SubTotal,
		TaxAmt  = i.TaxAmt,
		Freight  = i.Freight,
		Comment  = i.Comment,
		rowguid  = i.rowguid,
		ModifiedDate = i.ModifiedDate
	from Sales.SalesOrderHeader_new s
	join inserted i
		on s.SalesOrderID = i.SalesOrderID;
go
 
create trigger t_d_SalesOrderHeader
on Sales.SalesOrderHeader
after delete
as
	delete Sales.SalesOrderHeader_new
	from Sales.SalesOrderHeader_new s
	join deleted d
		on d.SalesOrderID = s.SalesOrderID ;
go

Move the data over in batches

/* 
populate new table in batches
*/
set nocount on;
set transaction isolation level serializable;
 
declare @i int = 0;
declare @new_i int = 0;
declare @rowcount int = -1;
declare @minBatchSize int = 5000
 
set identity_insert Sales.SalesOrderHeader_new on;
 
while @rowcount <> 0
begin
	select top (@minBatchSize) @new_i = SalesOrderID
	from Sales.SalesOrderHeader
	where SalesOrderID > @i
	order by SalesOrderID asc;
 
	merge Sales.SalesOrderHeader_new as target
	using (
			select SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate
			from Sales.SalesOrderHeader
			where SalesOrderID > @i
			and SalesOrderID <= @new_i
		) as source (SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate)
	on target.SalesOrderID = source.SalesOrderID
	when not matched then
		insert (SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, [Status], OnlineOrderFlag, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, Comment, rowguid, ModifiedDate)
		values (source.SalesOrderID, source.RevisionNumber, source.OrderDate, source.DueDate, source.ShipDate, source.[Status], source.OnlineOrderFlag, source.PurchaseOrderNumber, source.AccountNumber, source.CustomerID, source.SalesPersonID, source.TerritoryID, source.BillToAddressID, source.ShipToAddressID, source.ShipMethodID, source.CreditCardID, source.CreditCardApprovalCode, source.CurrencyRateID, source.SubTotal, source.TaxAmt, source.Freight, source.Comment, source.rowguid, source.ModifiedDate);
 
	set @rowcount = @@ROWCOUNT
 
	set @i = @new_i
 
end
 
set identity_insert Sales.SalesOrderHeader_new off;

Do The Switch

-- rename pks and indexes from original table
exec sp_rename N'Sales.PK_SalesOrderHeader_SalesOrderID', N'PK_SalesOrderHeader_old_SalesOrderID', N'OBJECT';
exec sp_rename N'Sales.SalesOrderHeader.AK_SalesOrderHeader_rowguid', N'AK_SalesOrderHeader_old_rowguid', N'INDEX'
exec sp_rename N'Sales.SalesOrderHeader.AK_SalesOrderHeader_SalesOrderNumber', N'AK_SalesOrderHeader_old_SalesOrderNumber', N'INDEX'
exec sp_rename N'Sales.SalesOrderHeader.IX_SalesOrderHeader_CustomerID', N'IX_SalesOrderHeader_old_CustomerID', N'INDEX'
exec sp_rename N'Sales.SalesOrderHeader.IX_SalesOrderHeader_SalesPersonID', N'IX_SalesOrderHeader_old_SalesPersonID', N'INDEX'
 
-- rename checks and defaults
exec sp_rename N'Sales.CK_SalesOrderHeader_Status', 'CK_SalesOrderHeader_old_Status', N'OBJECT';
exec sp_rename N'Sales.CK_SalesOrderHeader_DueDate', 'CK_SalesOrderHeader_old_DueDate', N'OBJECT';
exec sp_rename N'Sales.CK_SalesOrderHeader_ShipDate', 'CK_SalesOrderHeader_old_ShipDate', N'OBJECT';
exec sp_rename N'Sales.CK_SalesOrderHeader_SubTotal', 'CK_SalesOrderHeader_old_SubTotal', N'OBJECT';
exec sp_rename N'Sales.CK_SalesOrderHeader_TaxAmt', 'CK_SalesOrderHeader_old_TaxAmt', N'OBJECT';
exec sp_rename N'Sales.CK_SalesOrderHeader_Freight', 'CK_SalesOrderHeader_old_Freight', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_RevisionNumber', 'DF_SalesOrderHeader_old_RevisionNumber', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_OrderDate', 'DF_SalesOrderHeader_old_OrderDate', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_Status', 'DF_SalesOrderHeader_old_Status', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_OnlineOrderFlag', 'DF_SalesOrderHeader_old_OnlineOrderFlag', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_SubTotal', 'DF_SalesOrderHeader_old_SubTotal', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_TaxAmt', 'DF_SalesOrderHeader_old_TaxAmt', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_Freight', 'DF_SalesOrderHeader_old_Freight', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_rowguid', 'DF_SalesOrderHeader_old_rowguid', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_ModifiedDate', 'DF_SalesOrderHeader_old_ModifiedDate', N'OBJECT';
 
-- drop fks from original table
alter table Sales.SalesOrderHeader
	drop constraint
	FK_SalesOrderHeader_Address_BillToAddressID,
	FK_SalesOrderHeader_Address_ShipToAddressID,
	FK_SalesOrderHeader_CreditCard_CreditCardID,
	FK_SalesOrderHeader_CurrencyRate_CurrencyRateID,
	FK_SalesOrderHeader_Customer_CustomerID,
	FK_SalesOrderHeader_SalesPerson_SalesPersonID,
	FK_SalesOrderHeader_ShipMethod_ShipMethodID,
	FK_SalesOrderHeader_SalesTerritory_TerritoryID
 
-- drop fks to original table
alter table Sales.SalesOrderDetail
	drop constraint FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID;
 
alter table Sales.SalesOrderHeaderSalesReason
	drop constraint FK_SalesOrderHeaderSalesReason_SalesOrderHeader_SalesOrderID;
 
-- drop triggers on original table
drop trigger Sales.t_i_SalesOrderHeader;
drop trigger Sales.t_u_SalesOrderHeader;
drop trigger Sales.t_d_SalesOrderHeader;
 
-- drop uSalesOrderHeader (but remember its definition)
declare @triggersql nvarchar(max);
select @triggersql = OBJECT_DEFINITION(object_id) from sys.triggers where name = 'uSalesOrderHeader';
drop trigger Sales.uSalesOrderHeader; 
 
-- rename original table to "old"
exec sp_rename N'Sales.SalesOrderHeader', N'SalesOrderHeader_old', N'OBJECT';
 
--rename the new indexes
exec sp_rename N'Sales.PK_SalesOrderHeader_new_SalesOrderID', N'PK_SalesOrderHeader_SalesOrderID', N'OBJECT';
exec sp_rename N'Sales.SalesOrderHeader_new.AK_SalesOrderHeader_new_rowguid', N'AK_SalesOrderHeader_rowguid', N'INDEX'
exec sp_rename N'Sales.SalesOrderHeader_new.AK_SalesOrderHeader_new_SalesOrderNumber', N'AK_SalesOrderHeader_SalesOrderNumber', N'INDEX'
exec sp_rename N'Sales.SalesOrderHeader_new.IX_SalesOrderHeader_new_CustomerID', N'IX_SalesOrderHeader_CustomerID', N'INDEX'
exec sp_rename N'Sales.SalesOrderHeader_new.IX_SalesOrderHeader_new_SalesPersonID', N'IX_SalesOrderHeader_SalesPersonID', N'INDEX'
 
-- rename the new checks and defaults
exec sp_rename N'Sales.CK_SalesOrderHeader_new_Status', 'CK_SalesOrderHeader_Status', N'OBJECT';
exec sp_rename N'Sales.CK_SalesOrderHeader_new_DueDate', 'CK_SalesOrderHeader_DueDate', N'OBJECT';
exec sp_rename N'Sales.CK_SalesOrderHeader_new_ShipDate', 'CK_SalesOrderHeader_ShipDate', N'OBJECT';
exec sp_rename N'Sales.CK_SalesOrderHeader_new_SubTotal', 'CK_SalesOrderHeader_SubTotal', N'OBJECT';
exec sp_rename N'Sales.CK_SalesOrderHeader_new_TaxAmt', 'CK_SalesOrderHeader_TaxAmt', N'OBJECT';
exec sp_rename N'Sales.CK_SalesOrderHeader_new_Freight', 'CK_SalesOrderHeader_Freight', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_new_RevisionNumber', 'DF_SalesOrderHeader_RevisionNumber', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_new_OrderDate', 'DF_SalesOrderHeader_OrderDate', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_new_Status', 'DF_SalesOrderHeader_Status', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_new_OnlineOrderFlag', 'DF_SalesOrderHeader_OnlineOrderFlag', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_new_SubTotal', 'DF_SalesOrderHeader_SubTotal', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_new_TaxAmt', 'DF_SalesOrderHeader_TaxAmt', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_new_Freight', 'DF_SalesOrderHeader_Freight', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_new_rowguid', 'DF_SalesOrderHeader_rowguid', N'OBJECT';
exec sp_rename N'Sales.DF_SalesOrderHeader_new_ModifiedDate', 'DF_SalesOrderHeader_ModifiedDate', N'OBJECT';
 
-- rename "new" table
exec sp_rename N'Sales.SalesOrderHeader_new', N'SalesOrderHeader', N'OBJECT';
 
--rename fks pointing out
exec sp_rename N'Sales.FK_SalesOrderHeader_new_Address_BillToAddressID', N'FK_SalesOrderHeader_Address_BillToAddressID', N'OBJECT'
exec sp_rename N'Sales.FK_SalesOrderHeader_new_Address_ShipToAddressID', N'FK_SalesOrderHeader_Address_ShipToAddressID', N'OBJECT'
exec sp_rename N'Sales.FK_SalesOrderHeader_new_CreditCard_CreditCardID', N'FK_SalesOrderHeader_CreditCard_CreditCardID', N'OBJECT'
exec sp_rename N'Sales.FK_SalesOrderHeader_new_CurrencyRate_CurrencyRateID', N'FK_SalesOrderHeader_CurrencyRate_CurrencyRateID', N'OBJECT'
exec sp_rename N'Sales.FK_SalesOrderHeader_new_Customer_CustomerID', N'FK_SalesOrderHeader_Customer_CustomerID', N'OBJECT'
exec sp_rename N'Sales.FK_SalesOrderHeader_new_SalesPerson_SalesPersonID', N'FK_SalesOrderHeader_SalesPerson_SalesPersonID', N'OBJECT'
exec sp_rename N'Sales.FK_SalesOrderHeader_new_ShipMethod_ShipMethodID', N'FK_SalesOrderHeader_ShipMethod_ShipMethodID', N'OBJECT'
exec sp_rename N'Sales.FK_SalesOrderHeader_new_SalesTerritory_TerritoryID', N'FK_SalesOrderHeader_SalesTerritory_TerritoryID', N'OBJECT'
 
-- re-add fks pointing in (without checks for now)
alter table Sales.SalesOrderDetail with nocheck add
constraint FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID foreign key (SalesOrderId)
	references Sales.SalesOrderHeader(SalesOrderId) 
 
alter table Sales.SalesOrderHeaderSalesReason with nocheck add
constraint FK_SalesOrderHeaderSalesReason_SalesOrderHeader_SalesOrderID foreign key (SalesOrderId)
	references Sales.SalesOrderHeader(SalesOrderId)
 
-- re-add trigger
exec sp_executesql @triggersql;

April 16, 2012

Modifying Tables Online – Part 1: Migration Strategy

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , — Michael J. Swart @ 12:00 pm

SERIES: Modifying Tables Online

In the next few days I want to describe a way to modify the definition of very large tables while keeping the table available for other queries. Some table changes are already online operations. But offline changes need extra work to keep the table available. I explore one way to do that in this series.

This article used to be a single blog post, but I broke it into four parts because it seemed too long. Read Part 1, and understand the strategy I’m using. Treat Parts 2-4 as appendixes. When you want to use this plan (or one like it), come back and use them as examples. You won’t be able to make use of the examples by cutting and pasting directly (unless you happen to be running an bike equipment store called Adventureworks), but modifications shouldn’t be too hard.

Database Schema Changes

Database schema changes are inevitable. Some of these changes can be performed quickly (like altering ntext columns to nvarchar columns). Changes like these don’t have to process every single row in the table and we say that these quick changes are online.
Of course that means there are also such things as offline changes. These are changes that need to lock the table and process each row (like adding a persisted computed column to a table). Offline changes are sometimes okay when:

  • You have scheduled maintenance window to do work
  • The table isn’t that large or
  • the estimated amount of downtime is tolerable

But sometimes the estimated downtime is not tolerable and you want your changes to be online. Microsoft has improved and continue to improve SQL Server’s online features. For example:

There are a few operations that are still not online. Here’s an example of one of them:

ALTER TABLE [Tablename]
ADD Rowversion [Rowversion] NOT NULL

The downtime for this table change can last seconds or minutes depending on the size of the table. If your business can’t tolerate that downtime, what options are there? I’m going to explain one strategy which helps me with these following goals:

  • I want to add a rowversion column to a table. (But the strategy applies to many other offline changes).
  • The straightforward alter statement would lock the table for longer than we can tolerate
  • My change can place an exclusive lock on the table, but only for a short amount of time. This means that the change should work concurrently with quick transactions, but not necessarily with large bulk transactions.
  • It would be an advantage if this were a database-only solution because while I might understand the applications and processes that query my database, I don’t necessarily have as much control over when or how these queries are sent.

So here’s something that might work:

The Strategy

The strategy is fairly straightforward:

A) I create a staging table which will store the same data as the original table but with the new rowversion column. The staging table is empty at first:

B) Then I add triggers to the original table to keep the staging table up to date. Any changes in the original table will also be applied to matching data in the staging table.

C) Fill the staging table a little at a time. This processing may take time, but the original table is still available to others.

D) Finally remove triggers and rename the tables:

Some Notes

Dependencies I didn’t mention it yet, but we also have to worry about table dependencies. Tables can have a lot of dependencies (triggers, constraints, indexes etc…). You can create dependencies for the staging table, and rename them as well during the switch.

Step D, the Switch: The last step is the tricky one. It’s the one that takes a schema modification (Sch-M) lock on the tables. A Sch-M lock on the table is incompatible with every other kind of lock. So that means that no in-flight transactions are allowed to use the table during the switch. SQL Server handles it pretty well, but if there are long running queries on this table, there will be blocking.

What’s next?

  • In part 2, I implement this strategy for Adventureworks’ Sales.SalesOrderHeader table which seems to demonstrate every kind of table dependency SQL Server allows.
  • In part 3, I throw in error handling code to make the script more robust. So that if things go wrong, the application doesn’t notice and I don’t lose data.
  • In part 4, I explain the testing I do. It gives me more confidence that I don’t miss an update or a delete somehow.

Thanks

I posted a question on dba.stackexchange.com to look for some strategies. The answer I got there (thanks Brent!) is slightly different than the solution I’m describing here. Brent described a solution which used a view that UNION’s the staging table and the original table. It uses less space than the solution I describe, but it’s got some extra gotchas.

Some of this work was done in the course of my regular job. So thanks to Desire2Learn for letting me share some of the lessons I learned there. By the way, we’re hiring.

March 28, 2012

Sending Query Results to Others

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

I think most of you have all been here before. You want to know the results of a query, but you have to get it from a colleague through email or some other proxy:

Jerry Maguire is asked to "Show Me The Data"

So how do you usually do it? How do you ask someone to send you the results of a query? There’s a million ways to answer that question and I’ve discovered that none of them are perfect. Depending on the situation, different ways are better than others. So let’s pick a situation and explore the best way to ask for those query results.

The Situation

The facts were these…

  • Goal: I want the results of the following query. It retrieves the top 20 cached queries and plans based on total consumed cpu.
  • I have to rely on my friend to run the query for me.
  • My friend is not a database administrator but is comfortable around SQL Server Management Studio
  • I want to minimize or simplify instructions.
  • I don’t want any truncated results.
  • I want to assure my friend that these queries are safe.
  • For this situation, I’m not worried how my friend gets the data to me (email, ftp, etc…). I only care that my friend is able to persist the data to some file.

And to complete the situation, here’s that query I’m curious about…

-- TOP 20 queries (by CPU)
SELECT TOP (20)
    [TotalCPU] = total_worker_time,
    [TotalElapsedTime] = total_elapsed_time,
    [ExecutionCount] = execution_count,
    [AverageCPUInMicroseconds] = cast(total_worker_time / (execution_count + 0.0) as money),
    [DBName] = DB_NAME(ST.dbid),
    [ObjectName] = OBJECT_NAME(ST.objectid, ST.dbid),
    [QueryText] = (SELECT [processing-instruction(q)] = CASE
            WHEN [sql_handle] IS NULL THEN ' '
            ELSE (SUBSTRING(ST.TEXT,(QS.statement_start_offset + 2) / 2,
                (CASE
                        WHEN QS.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX),ST.text)) * 2
                        ELSE QS.statement_end_offset
                        END - QS.statement_start_offset) / 2))
            END
			FOR XML PATH(''), type),
    [QueryPlan] = qp.query_plan
FROM sys.dm_exec_query_stats QS
CROSS APPLY sys.dm_exec_sql_text([sql_handle]) ST
CROSS APPLY sys.dm_exec_query_plan ([plan_handle]) QP
ORDER BY total_worker_time DESC

Notice that the results of this query contains xml, and that xml might contain commas, spaces, tabs, newlines and all the typical delimiters. That makes my job trickier. So now I’m going to list a number of methods and their pros and cons. I can think of a number of methods off the top of my head.

Method 1: Cut and Paste into Excel

This is most people’s usual method and it works in 90% of all cases because Excel is so versatile. However in this particular case, the whitespace in the query plans and the sql text mess up some of the rows and formatting, one field in each row. I get something like this, ugh:

A jumbled Excel spreadsheet

But there’s a better way using Excel. It’s not too hard to use the Data Connection Wizard. So it’s not too much work to guide my friend through it either. If I can, then this is a really useful nice way to pass around data. That wizard is accessed like this:

Using Excel to query the data

Method 2: Store Results As Tables in a Temporary DB and Back It Up

That can be done with a single T-SQL script which my friend should be able to run no problem. The script looks something like this:

create database myresults
go
-- TOP 20 queries (by CPU)
SELECT TOP (20)
    [TotalCPU] = total_worker_time,
    [TotalElapsedTime] = total_elapsed_time,
    [ExecutionCount] = execution_count,
    [AverageCPUInMicroseconds] = cast(total_worker_time / (execution_count + 0.0) as money),
    [DBName] = DB_NAME(ST.dbid),
    [ObjectName] = OBJECT_NAME(ST.objectid, ST.dbid),
    [QueryText] = (SELECT [processing-instruction(q)] = CASE
            WHEN [sql_handle] IS NULL THEN ' '
            ELSE (SUBSTRING(ST.TEXT,(QS.statement_start_offset + 2) / 2,
                (CASE
                        WHEN QS.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX),ST.text)) * 2
                        ELSE QS.statement_end_offset
                        END - QS.statement_start_offset) / 2))
            END
			FOR XML PATH(''), type),
    [QueryPlan] = qp.query_plan
INTO myresults.dbo.myresultset
FROM sys.dm_exec_query_stats QS
CROSS APPLY sys.dm_exec_sql_text([sql_handle]) ST
CROSS APPLY sys.dm_exec_query_plan ([plan_handle]) QP
ORDER BY total_worker_time DESC
go
 
backup database myresults to disk = 'C:\Users\Michael Swart\Desktop\Output.bak'
go
 
drop database myresults
go

The best part of this is that the data is backed up perfectly, the data captures all the data with no truncation. Xml fields are stored as xml fields and once restored, it’s easy to access the resultset exactly as it was on the target computer.

The bad news is that although it should be fairly safe, it’s not as easy to assure my friend that the query above changes nothing. It does! It creates a whole database and backs it up. It then drops that database.

Hang on a second, if my friend can use this method, that means my friend (the non-dba) is authorized to drop databases. Yikes! What’s up with that?

Method 3: Using the Export Data Wizard

It seems like SQL Server’s “Export Data” wizard was just built for this job. That can be accessed here:

SSMS's Export Data...

But I don’t recommend it, not one bit. It launches a wizard which takes me through a set of choices, and no matter what I choose, it always seems to choke on the xml column.

It’s clear that this Export Data Wizard is using SQL Server Integration Services (SSIS) under the covers. How do I know it’s SSIS under the covers? Well if the clumsy data type handling weren’t a giveaway (zing!), the wizard can save the job I specified as an SSIS package.

The SSIS shows through too much to recommend to my friend, who has no dreams of becoming a B.I. Maestro.

So it seems like this method is only appropriate for those situations when cutting and pasting to Excel would be just as appropriate. And it’s kind of hard to write instructions simpler than “Cut and paste into Excel”.

Method 4: Use Management Studio’s Results-To-File

Most people use the query window’s default setting, Results-To-Grid. Sometimes there’s a use for Results-To-Text (like when using sp_helptext for example). But what about Results-To-File? I can set that using the Query window’s context menu here:

The context menu used to access "results to file"

I can also enable results to file with Ctrl+Shift+F. I had to look that up though. There’s a reason I haven’t learned that shortcut by heart. It’s because results-to-file doesn’t work very well. As far as I can tell, it’s like specifying results-to-text and then pasting that into a text file.

The biggest problem here is that depending on the results, the data gets truncated easily and the newlines and tabs in the data get mixed up with the rest of the formatting. By the way, I’m also out of luck (and for the same reason) if I use SSMS to save grid results.

Method 5: Use BCP, along with SQLCMD mode

If I can instruct my friend to use a query window in SQLCMD mode, then I can write a script which will get a bcp script to export data. Enable SQLCMD mode by choosing it from the Query menu in SSMS:

Selecting SQLCMD Mode from the Query Menu

And the script I would have my friend run looks like this:

!!bcp "Select 'query omitted' as [BigLongQuery]" queryout "c:\temp\desktop\output5.dat" -SSWARTLAPTOP -T -n

which I could then get back on my side using bcp or using OPENROWSET.

This method is okay. It’s kind of awkward (and any instructions to my friend will be equally as awkward) and it always takes some work to get the syntax just right. But BCP is kind of like Rob Lowe, he hasn’t changed in years and he’s really not that bad.

Method 6: Output the Whole Thing Using FOR XML PATH

Hey! Now I’m getting somewhere! XML was built for this kind of thing. All I do is take the query, and append FOR XML PATH to it and send the query to my friend saying “Run this and paste the results into notepad.” Here’s the whole query:

SELECT TOP (20)
    [TotalCPU] = total_worker_time,
    [TotalElapsedTime] = total_elapsed_time,
    [ExecutionCount] = execution_count,
    [AverageCPUInMicroseconds] = cast(total_worker_time / (execution_count + 0.0) as money),
    [DBName] = DB_NAME(ST.dbid),
    [ObjectName] = OBJECT_NAME(ST.objectid, ST.dbid),
    [QueryText] = (SELECT [processing-instruction(q)] = CASE
            WHEN [sql_handle] IS NULL THEN ' '
            ELSE (SUBSTRING(ST.TEXT,(QS.statement_start_offset + 2) / 2,
                (CASE
                        WHEN QS.statement_end_offset = -1 THEN LEN(CONVERT(NVARCHAR(MAX),ST.text)) * 2
                        ELSE QS.statement_end_offset
                        END - QS.statement_start_offset) / 2))
            END
			FOR XML PATH(''), type),
    [QueryPlan] = qp.query_plan
FROM sys.dm_exec_query_stats QS
CROSS APPLY sys.dm_exec_sql_text([sql_handle]) ST
CROSS APPLY sys.dm_exec_query_plan ([plan_handle]) QP
ORDER BY total_worker_time DESC
FOR XML PATH

And the results that are sent back are immediately accessible without any processing.

I think this method works best for this particular situation. It hits my goals perfectly. Now if you dear reader want to use this method in your situation, there’s just some extra caveats:

  • SSMS restricts XML data results to 2 megabytes, although that’s configurable.
  • If you query binary fields (say query plan handles for example), then you might be surprised if the results come back in base64 encoding rather than hex which is a bit of a pain.

Other methods

Powershell I know you powershell pros are eager to give a one line script that does everything here. So lets hear it the comments! I know that powershell treats sets of objects (rather than unix’s cshell which deals with text). So I’m curious how well it serializes objects to a file.

SQL Server Integration Services I haven’t forgotten about SSIS either, it’s the tool that was built for moving data from here to there in some format (any here, any there, any format). But I just couldn’t get the instructions simple enough for this scenario.

March 7, 2012

SQL Server 2012 Launches Today

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , — Michael J. Swart @ 1:08 am

Question: What just arrived today, has been months in the making and makes me super-excited today?
Answer: It’s my new nephew! I’m an uncle again as of early early this morning!
Question: Okay, what else?
Answer: What? That’s not enough? okay… (Original article follows)

Today is March 7th, 2012 and SQL Server 2012 launches today. Woohoo!

Here’s the scoop:

  • We’ve had the Community Technology Previews (CTPs) to mess around with for a while now.
  • We’ve had a Release Candidate (RC0) for a little bit.
  • Today the product launches.
  • April 1st, (no kidding) the product is released to general availability.
  • But don’t wait until then. Today you can download the RTM evaluation copy.

Also

  • SQL Server Data Tools is now RTW! More on that below.

The Virtual Launch

... it's just an illustration of one

Microsoft is marking the event with a virtual launch here. An online event that lets you watch them demo what they’ve been working on since the last release (And unless you’re a BI professional, that means since 2008)

Personally What I’m Most Excited About: SSDT

The code I work with gets deployed to a large number of systems which run on multiple versions of SQL Server. In order to avoid maintaining multiple code-lines, I can’t make use of the new features until we de-support all the older versions.

For example, I can’t use the IIF function (new in 2012) if I’m worried that the code could be deployed to a 2008 instance somewhere. So there’s going to be some waiting for me and probably some of you …

Not so fast, there are still a few things we can use today. And I want to focus on one of them. It’s called SQL Server Data Tools (SSDT) and it was Released To Web (RTW) today.

What’s SSDT do? It’s a free development environment. It lets you develop SQL for any version of SQL Server (including SQL Azure) whether you’re connected or not. It supports Intellisense (and as far as I can tell, even against 2005). It does schema comparisons and data comparisons.

The clincher for me is that the table designer is the first one I’ve ever preferred over a plain old text editor.  It’s that slick. Check it out.

 

February 15, 2012

Careful Hashing

Filed under: SQLServerPedia Syndication,Technical Articles — Tags: , , , , , — Michael J. Swart @ 12:00 pm

Cryptographic hashing functions like MD5 or SHA-1 are examples of one-way functions that take any data you give it and return a fixed set of bytes (for example, 16 bytes for MD5 and 20 bytes for SHA-1).

I recently had trouble coming up with the same hash value for what I thought were the exact same inputs. Compare the following two examples:

// C# example
byte[] byteArray = Encoding.Unicode.GetBytes("I'm a lumberjack and I'm okay.");
 
SHA1 sha = new SHA1CryptoServiceProvider();
byte[] hashedPasswordBytes = sha.ComputeHash(byteArray);
 
Console.WriteLine(BitConverter.ToString(hashedPasswordBytes));
//30-F1-CE-3E-7E-1F-17-27-A7-B5-97-39-44-D0-FE-EA-DA-69-53-2B

That’s different than this:

-- SQL example
select HASHBYTES('SHA1', 'I''m a lumberjack and I''m okay.')
--0x4D6936CEDD0DE794AC86F9A4099DCBB4EFED8E1F

So what gives? Before reading any further, can you spot the problem? In both examples above, two things are happening. First a string of characters is being converted into a string of bytes and then that string of bytes gets hashed. I’ve learned that it’s not the hash function that’s the problem, it’s converting the character string into binary that is inconsistent.

String Conversions

Maybe that’s not surprising, but still, it’s one more thing to be careful about: converting strings to bytes. SQL Server’s wide characters (NCHARs and NVARCHARs) are equivalent to .Net’s Encoding.Unicode and SQL Server’s single non-unicode characters (CHAR and VARCHAR) (single-byte characters) are equivalent to .net’s Encoding.ASCII.

So for completeness sake, these should be equivalent:

-- wide characters (remember unicode string literals are prefixed with 'N')
select HASHBYTES('SHA1', N'I''m a lumberjack and I''m okay.')
--0x30F1CE3E7E1F1727A7B5973944D0FEEADA69532B
 
-- non-unicode characters:
select HASHBYTES('SHA1', 'I''m a lumberjack and I''m okay.')
--0x4D6936CEDD0DE794AC86F9A4099DCBB4EFED8E1F

And

// Unicode
byte[] byteArray = Encoding.Unicode.GetBytes("I'm a lumberjack and I'm okay.");
SHA1 sha = new SHA1CryptoServiceProvider();
byte[] hashedPasswordBytes = sha.ComputeHash(byteArray);
Console.WriteLine(BitConverter.ToString(hashedPasswordBytes));
//30-F1-CE-3E-7E-1F-17-27-A7-B5-97-39-44-D0-FE-EA-DA-69-53-2B
 
//ASCII
byteArray = Encoding.ASCII.GetBytes("I'm a lumberjack and I'm okay.");
hashedPasswordBytes = sha.ComputeHash(byteArray);
Console.WriteLine(BitConverter.ToString(hashedPasswordBytes));
//4D-69-36-CE-DD-0D-E7-94-AC-86-F9-A4-09-9D-CB-B4-EF-ED-8E-1F

This means that if you’re going to hash strings in SQL Server and compare them to strings that are hashed in C#, you want to use

  • Encoding.Unicode for NCHARS and NVARCHARS
  • Encoding.ASCII for CHARS and VARCHARS

This also means that for this use case, I wouldn’t feel safe using other encodings (such as Encoding.UTF8) because SQL Server can’t duplicate that string to binary conversion.

Caveats

  • Don’t trust me and test it out for yourself!
  • Collations matter a little (sort order is not relevant here). The code page for your SQL Server collation should match the code page of the .net encoding when dealing with single byte characters. Unicode strings should not care about collation.
  • If you do find any counter-examples for anything I’ve written here, let me know in the comments.

February 9, 2012

Microsoft Model Databases: Some Unconventional Alternatives?

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:53 pm

It seems like Microsoft has always provided example databases in order to illustrate database concepts.

Pubs: Back when SQL Server 2000 was new, Microsoft provided this sample database and maybe you remember it. Pubs was a really simple database that was designed to represent data that a fictional book publishing company might use. The database had a name (pubs) but I don’t think anyone ever gave the company a name. From here we’re told that it was “used to demonstrate many of the options available” for SQL Server.

Northwind also was available for SQL Server 2000. It was a database that was used for a fictitious company “Northwind Traders”. The company “imports and exports specialty foods from around the world.” It was better than pubs, but that wasn’t saying much. Pubs set a pretty low bar.

AdventureWorks: The Adventureworks database came along with SQL Server 2005 and we were introduced to the bicycle manufacturer Adventure Works Cycle. The database was pretty comprehensive and it touched on a lot more features of SQL Server. I’ve seen countless performance demos that query Sales.SalesOrderDetail. There’s even an Adventureworks Data Warehouse, an Adventureworks OLAP cube and even a light version: AdventureworksLT.

Contoso And lately there’s a new sample database Contoso. It’s another fictional retail company (Contoso Ltd. This time they’re selling electronics and appliances). This database is used for Business Intelligence demos. With Contoso, Microsoft has developed and made a data warehouse and an SSAS cube available.

Presumably, each of these databases is an example of Microsoft putting their best foot forward. Or at least it’s a database which enables them to show off what their database software can do.

Beyond Demos

So those databases are good for demos.

But if we’re looking for other Microsoft-designed databases, we don’t have to stop there. There’s other ones that we can look at. I’m talking about the databases that Microsoft ships along with their products. When looking at these, remember:

  • There are some good examples, …
    We can look at those databases and see how they’re constructed. If you see some interesting db designs, you can use that.
  • … and there are some not so good examples …
    These databases weren’t built as database role models. So there’s bound to be a lot of database design choices that are not ideal. I know that Microsoft has huge pressure to maintain backwards compatibility.
  • … and that’s okay.
    It’s actually a fun exercise to pick out the good designs from the bad. It’s like a smorgasbord, you take the lessons you like and leave the rest.

Some examples of what I’m talking about.

ReportServer

Install and configure an instance of SQL Server Reporting Services. Take a look at the ReportServer. It’s a pretty nice database schema. One thing that was interesting was the choice of identity column data types:

  • Catalog seems to be one of the main tables here and they’re using uniqueidentifiers (GUIDs) for identity columns. Normally people scoff at that, but I think it makes sense in this case. Catalog is the table that tracks reports, datasources and folder structures. Even at it’s largest, it can’t be that big (I just checked my own installation and the clustered index fits on 1 extent). That means that all the worries about GUIDs for primary keys are mitigated and the choice of using a globally unique identifier might make sense here.
  • ExecutionLogStorage Contrast that with ReportServer’s largest table, ExecutionLogStorage. Depending on how busy your Reporting Services installation is, this table can grow very large with one row for every report view. When I look at the choice for their primary key, I see that it’s clustered index on a single BIGINT column defined as it’s IDENTITY column. Well done.

Any Database’s Metadata

  • I like the object oriented influence here:
    • The relationship between sys.objects –> sys.tables and between sys.objects –> sys.procedures is the subtype relationship class –> subclass. And that shows in the schema, for example, there’s one sys.object row for every sys.table row.
    • The relationship between sys.foreign_keys –> and sys.foreign_key_columns is the object composition relationship. And that also shows in the schema too. The “primary key” for the table foreign_key_columns is (constraint_object_id, constraint_column_id) and it contains the primary key of its owner.
  • Another good example is the sys.index_columns table. Look at the schema diagram below. It’s a perfect instance of my favourite DB Design example

Any Sharepoint database

  • [Sigh] These are mostly counter-examples here, I took a brief glance at the schema once and decided quickly that there are no good lessons. Can anyone tell me if that’s changed?

February 2, 2012

Removing Duplicate Dimension Rows in SSIS

Something New For Me


I’m relatively new to SQL Server Integration Services (SSIS). And so I want to explain how I solved a problem and invite Business Intelligence (BI) experts (and wannabe experts) to comment or point out things that I missed.

A Short Description of the Problem

So here’s a short description of the problem I’m trying to solve. It involves populating a dimension table. I want a data flow transformation which removes duplicate key values. In other words, every time I come across a new key value for a dimension, I want to pass through that entire row (not just the key values). This is kind of like a cross between T-SQL’s DISTINCT and FIRST_VALUE.

I found out recently that people have treated this topic before:

But because of my bad luck, I only found these posts after I muddled through a solution. But I was surprised to find out my solution had some interesting differences.

The Setup

First I’ll explain my situation. Say I’ve got a flat file containing data about Prime Ministers:

The source data

My main goal is to use this data to populate a data warehouse. That data warehouse has a schema like this:

The Data Warehouse Schema

This example is just for illustration. In real life, the number of fact rows is very large, and the number of dimension rows is relatively small. The time dimension will be pre-populated, but I want to load the other dimensions using values from the flat file. So that means I plan to process the data twice: The first time will be to load these dimensions, and the second time will be to load the fact table. When I process the data for dimensions, I :

  • Use a lookup transform to find whether that dimension already exists in the warehouse
  • Use the No Match Output to identify dimension rows I haven’t seen before. Those rows will be used to load the dimension. (i.e. I’m treating the dimension like a Never-Changing-Dimension rather than a Slowly Changing Dimension.)

The problem is that these unmatched rows can be duplicated and I don’t want those duplicate dimensions rows getting into the dimension table.

Things I considered:

  • I considered using a staging table on the DB side. But that required too much space, the amount of space needed was comparable to the fact table, not the dimension table.
  • I also considered inserting into dimensions a row-at-a-time using the MERGE statement on the database side. But I didn’t consider that for long. The performance with that strategy was so unacceptable.
  • I could have used a sort/aggregate, but that was a blocking transformation and I really didn’t need the data sorted.

What I ended up with

I created a script transform which seems to do the trick for what I want. My Data Flow task looks like this:

In those script transforms you see up there, I told SSIS that each script will be used to filter rows out or let them pass through. I did that by setting ExclusionGroup to 1 (by default, it’s normally 0). Here’s where that’s done:

Then in the script, I created a list of seen dimensions. And I let rows pass through if they haven’t yet been seen. The script simply looks like this:

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
 
    if (Row.Party_IsNull)
        return;
 
    if (!seenParty.Contains(Row.Party))
    {
        seenParty.Add(Row.Party);
        Row.DirectRowToOutput0();
    }
 
}

Logically it behaves exactly the same as the sort transformation (as explained in “Dead Easy Duplicate Key Removal” by Todd McDermid). Except for these things:

  • It seems to perform fast
  • It’s not a blocking transformation
  • But you don’t get all that for nothing. What we’re not getting with this method is that the rows aren’t sorted. In most cases, that doesn’t matter.
  • It takes some memory, so you have to be aware of that. Ask yourself, how well can all unseen dimension keys fit in memory?

Some questions for you BI guys

And here’s where I’d like you to chime in.

  1. What’s the most common approach to this problem?
  2. I get the feeling that I could use the Cache Transform for this technique. But I haven’t tried it.
  3. Are there any new features in SQL 2012 that help?
  4. Am I re-inventing the wheel here?
  5. Am I missing anything else that should be obvious?
Older Posts »

Powered by WordPress