SERIES: Modifying Tables Online
Over the next few days I want to describe a way to modify the definition of very large tables while keeping them available for other queries. Some table changes are already online operations, but offline changes need extra work to keep the table available. I explore one way to do that in this series.
This article used to be a single blog post, but I broke it into four parts because it seemed too long. Read Part 1 to understand the strategy I’m using, and treat Parts 2-4 as appendixes. When you want to use this plan (or one like it), come back and use them as examples. You won’t be able to make use of the examples by cutting and pasting directly (unless you happen to be running a bike equipment store called Adventureworks), but modifications shouldn’t be too hard.
This is part 4 of the series (you can consider this Appendix 3; the main article is found here).
So my testing is also pretty straightforward. I:
- Back up the Adventureworks2012 database on my dev machine
- Start sending a whack of queries to the database to simulate concurrent activity
- Make Adventureworks2012.Sales.SalesOrderHeader larger by adding lots of rows
- Migrate the database using the script from part 3
- Look for any errors, fix them, restore the db and repeat the process if necessary
- Otherwise … profit?
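Steps 1 and 5 are ordinary backup and restore operations. Here’s a minimal sketch in T-SQL; the backup path is just an example, so adjust it for your environment:

```sql
-- Step 1: take a backup before the test run
backup database Adventureworks2012
    to disk = N'C:\Backups\Adventureworks2012_pretest.bak'
    with init;

-- Step 5 (when needed): kick everyone out, put everything back, try again
use master;
alter database Adventureworks2012 set single_user with rollback immediate;
restore database Adventureworks2012
    from disk = N'C:\Backups\Adventureworks2012_pretest.bak'
    with replace;
alter database Adventureworks2012 set multi_user;
```

The SINGLE_USER/MULTI_USER dance is only needed because the activity generator below holds many open connections that would otherwise block the restore.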
You should be able to handle each of these steps, but I want to explain exactly how I personally handle number 2: I use an application that sends queries to the database many times. Sounds easy, right? The trick is that I want to call the stored procedure many times at once.
Activity Generator
This is a C# program which I compile into a .NET application (I can’t wait for .NET 4.5’s async and await):
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.SqlClient;
using System.Threading;

namespace ConsoleApplication1
{
    class Program
    {
        static int counter = 0;

        static void Main( string[] args )
        {
            SqlConnectionStringBuilder cs = new SqlConnectionStringBuilder();
            cs.DataSource = @".";
            cs.InitialCatalog = "Adventureworks2012";
            cs.IntegratedSecurity = true;
            cs.AsynchronousProcessing = true;
            string connectionString = cs.ToString();

            string sql = @"s_DoSomething";

            // Fire off the procedure calls asynchronously so many run at once
            for( int i = 0; i < 100000; i++ )
            {
                SqlConnection conn = new SqlConnection( connectionString );
                conn.Open();
                SqlCommand cmd = new SqlCommand( sql, conn );
                cmd.CommandType = System.Data.CommandType.StoredProcedure;
                cmd.BeginExecuteNonQuery( new AsyncCallback( EndExecution ), cmd );
            }

            Console.WriteLine( string.Format( @"Error count: {0}", counter ) );
            Console.ReadLine();
        }

        static void EndExecution( IAsyncResult c )
        {
            SqlCommand endCmd = ( c.AsyncState as SqlCommand );
            try
            {
                endCmd.EndExecuteNonQuery( c );
            }
            catch( Exception ex )
            {
                // Interlocked rather than counter++ so that increments from
                // concurrent callbacks aren't lost
                Interlocked.Increment( ref counter );
                Console.WriteLine( ex.Message );
            }
            finally
            {
                endCmd.Connection.Close();
            }
        }
    }
}
```
You might have noticed that this program is extremely boring. It just calls one stored procedure, s_DoSomething, over and over as fast as it can. The advantage is that I’m free to modify s_DoSomething (which is, for me, a lot easier than modifying this application).
So what does this procedure do? I’m glad you asked:
s_DoSomething
This procedure executes one task, chosen at random, from a set of typical OLTP tasks that use the table Sales.SalesOrderHeader. Here’s the one I used:
```sql
create procedure s_DoSomething
as
begin
    declare @choice int = 1 + RAND() * 6;
    declare @SalesOrderId int;

    if (@choice = 1)
        insert Sales.SalesOrderHeader
            (DueDate, OrderDate, CustomerID, BillToAddressID, ShipToAddressID, ShipMethodID)
        select top 1000
            DueDate, DATEADD(day, -1, DueDate), CustomerID,
            BillToAddressID, ShipToAddressID, ShipMethodID
        from Sales.SalesOrderHeader
    else if (@choice = 2)
        delete top (100) Sales.SalesOrderHeader
    else if (@choice = 3)
    begin
        declare @personid int;
        select top 1 @personid = BusinessEntityID
        from Sales.SalesPerson
        order by newid();

        select *
        from Sales.SalesOrderHeader
        where SalesPersonID = @personid;
    end
    else if (@choice in (4, 5, 6))
    begin
        select top 1 @SalesOrderId = SalesOrderId
        from Sales.SalesOrderHeader with (nolock);

        select top 100 *
        from Sales.SalesOrderHeader
        where SalesOrderId > @SalesOrderId;
    end
end
```
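Incidentally, step 3 of the test plan (making the table larger) can reuse the same self-insert trick as the @choice = 1 branch. Here’s a hedged sketch; the loop and batch counts are arbitrary, so tune them to the table size you want to test against:

```sql
-- Grow Sales.SalesOrderHeader by repeatedly re-inserting its own rows.
-- 100 iterations of 10,000 rows is just an example; adjust to taste.
declare @i int = 0;
while (@i < 100)
begin
    insert Sales.SalesOrderHeader
        (DueDate, OrderDate, CustomerID, BillToAddressID, ShipToAddressID, ShipMethodID)
    select top 10000
        DueDate, OrderDate, CustomerID,
        BillToAddressID, ShipToAddressID, ShipMethodID
    from Sales.SalesOrderHeader;
    set @i += 1;
end
```

Keeping each batch small-ish avoids one giant transaction bloating the log while the table grows.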
Hi Mike,
I have a question in regards to the data size of the tables. How big is your table under consideration? I have a db which is about 250 GB, in which my table is 20 GB in size with 2 NC indexes of 10 GB each. My thought process in using this approach was that I have to keep double this space (one for the original, one for staging) to make this approach happen. Am I correct or am I missing something?
Comment by Lakshmi — June 21, 2016 @ 3:27 pm
Hi Lakshmi,
That’s exactly the ballpark of sizes that I’d be considering. You’re not missing anything: while the migration runs, the original and the staging copy exist side by side, so in your case that’s roughly double the 40 GB of table plus indexes, or about 80 GB.
I wish you good luck; it’s a difficult script to write. Remember to look at part 5 of this series, which includes a strategy that may simplify migration tasks like this.
Michael
Comment by Michael J. Swart — June 21, 2016 @ 7:47 pm
Thank you Mike for replying quickly. Yes, I looked at the schema swap and indeed it is difficult to do this, but I will slowly get there, I hope.
Comment by Lakshmi — June 23, 2016 @ 10:11 am