Michael J. Swart

January 23, 2014

Generating Concurrent Activity

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:21 am
Testing Concurrency

I’ve discovered that DBAs use different methods to accomplish what amounts to the same thing: Generating concurrent activity on a server. I want to explore a number of methods I’ve seen. For each method, I want to call one particular procedure, many times at once, and often… Oh, and for free.

Why are we doing this?

  • For Testing: The whole point is to generate concurrent activity on the server. Testing with this kind of activity is one of the only ways to avoid resource contention issues.
  • For Demos: Concurrency issues are so common and so varied, that it’s not surprising how often we see demos that depend on this kind of generated database activity.
  • For Troubleshooting: This is my favorite reason. A concurrency issue can’t be fixed well unless it can be reproduced reliably on some test environment. That’s why the methods below have a spot on my troubleshooting toolbelt.

The whole idea is to get many workers active on SQL Server at once.

The exocomps were smart enough to exit on their own

For each method below, look out for the place where I specify these “parameters”:

  • Number of virtual workers (50 in my example)
  • Number of iterations (for instance 10,000)
  • Connection string
  • Procedure to execute (s_DoSomething in my case)

One other thing to watch for is the overhead that the tool puts on the machine. Ideally, the method is lightweight enough to run from the same test environment that the SQL Server sits on. That means it’s best to handle the iterations on SQL Server itself, which cuts down on the overhead of opening connections. So in most cases, instead of

exec dbo.s_DoSomething

I have

declare @i int = 0; while (@i < 10000) begin exec dbo.s_DoSomething; set @i+= 1; end
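
By the way, every method here assumes s_DoSomething already exists. I show some typical definitions next week, but if you want to follow along right away, any cheap procedure will do. Here’s a minimal placeholder (just a stand-in, not the definition I actually use):

use tempdb
go
 
create procedure dbo.s_DoSomething
as
    set nocount on;
    select 1 as JustSomething;
go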

Notepad + DOS Method

I adapted this method from a clever trick I saw once: Paul Randal giving a demo on tempdb contention. You can find a video of that demo by visiting this newsletter.
It’s a simple idea. You have two batch files; the first is called Run.bat:

echo off
sqlcmd -S MYSERVER\MYINSTANCE -E -Q "set nocount on; declare @i int = 0; while (@i < 10000) begin exec tempdb.dbo.s_DoSomething; set @i+= 1; end" > NUL
exit

and the second is called Run50.bat:

start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat
start run.bat

Click Run50.bat and boom: instant parallel activity. Now it is a little sloppy – it opens up 50 command windows – but it does the trick pretty nicely.

Using PowerShell

We should be able to do this in one line, right? Yep. Adapted from a post by Kendra Little, I came up with this:

foreach($worker in 1..50) `
{ `
  Start-Job `
    -ScriptBlock `
      { `
        Import-Module sqlps; `
        Invoke-Sqlcmd `
          -Query "set nocount on; declare @i int = 0; while (@i < 10000) begin exec tempdb.dbo.s_DoSomething; set @i+= 1; end" `
          -ServerInstance "MYSERVER\MYINSTANCE" `
          | Out-Null `
      } `
}

Except that I don’t think I got it quite right. Whatever I changed makes this method unsuitable.

The script schedules a job 50 times, but it takes about a minute just to schedule them all. Once they’re scheduled, the jobs take time to start, and not enough of them run in parallel to generate the needed concurrent activity, so I give this method a “skip”. If you’re really comfortable with PowerShell, maybe you can get this to work faster. If you can, let me know.

C#

This is my preferred method. It started out as the program I used to test upsert concurrency at http://michaeljswart.com/go/upsert, but a friend at work showed me that .NET 4.5 has some nifty new asynchronous methods that make it look nicer, perform faster and weigh less.

using System;
using System.Linq;
using System.Data.SqlClient;
using System.Threading.Tasks;
 
namespace AConsoleApplication {
    class Program {
 
        static void Main( string[] args ) {
 
            var cs = new SqlConnectionStringBuilder();
            cs.DataSource = @"MYSERVER\MYINSTANCE";
            cs.InitialCatalog = "tempdb";
            cs.IntegratedSecurity = true;
            cs.AsynchronousProcessing = true;
 
            string connectionString = cs.ToString();
 
            Task[] tasks = Enumerable.Range( 0, 50 )
                .Select( i => RunCommand( connectionString ) )
                .ToArray();
 
            Task.WaitAll( tasks );
        }
 
        private static async Task RunCommand(
                string connectionString
            ) {
 
            string sql = @"s_DoSomething";
 
            SqlConnection conn = new SqlConnection( connectionString );
            conn.Open();
 
            SqlCommand cmd = new SqlCommand( sql, conn );
            cmd.CommandType = System.Data.CommandType.StoredProcedure;
 
            for( int i = 0; i < 10000; i++ ) {
                try {
                    await cmd.ExecuteNonQueryAsync();    
                } catch( Exception ex ) {
                    Console.WriteLine( ex.Message );
                }
            }
        }
    }
}

HammerDB

HammerDB (originally HammerOra) is a free utility that allows users to run benchmarks against various environments. Although it was originally built for running benchmarks on Oracle, the utility now works on Windows and for SQL Server (hence the name change). I was first introduced to the utility via Kendra Little (again):

Follow these links to learn how to use the tool for its typical function, running benchmarks. Then, once you know how to do that, it’s a quick step to repurpose the tool for your own concurrent activity. For example, replace HammerDB’s generated script with this one:

#!/usr/local/bin/tclsh8.6
package require tclodbc 2.5.1
database connect odbc "DRIVER=SQL Server Native Client 11.0;SERVER=MYSERVER\\MYINSTANCE;PORT=1433;TRUSTED_CONNECTION=YES";
odbc "set nocount on; declare @i int = 0; while (@i < 10000) begin exec tempdb.dbo.s_DoSomething; set @i+= 1; end"
odbc disconnect

It’s steady and lightweight and works really well.

SQLQueryStress

SQLQueryStress is a tool written by Adam Machanic that you can download and install for free.

SQLQueryStress

It’s fairly idiot proof and avoids a lot of the complexity of the other methods. For my own purposes, I want the utility to be lightweight. So I …

  • Remember to turn off “Collect I/O Statistics”
  • Remember to turn off “Collect Time Statistics”
  • Set “Number of iterations” to one.
  • Iterate in the query window, i.e. modify the query to call s_DoSomething 10,000 times (the loop is shown below).
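
The query window then gets the same loop that the other methods use, something like this:

set nocount on;
declare @i int = 0;
while (@i < 10000)
begin
    exec dbo.s_DoSomething;
    set @i += 1;
end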

My Ranking

Concurrent Activity Method | Stars | Notes
C# console app | ★★★★½ | It performs fastest and uses the least resources. Also, because I’m biased, it gets a small bump by not suffering from NIH issues.
HammerDB | ★★★ | Very fast (2nd fastest), but the interface is clumsy and modifications need tclsh experience <yuck>. It’s best to use this for its intended purpose, as a benchmark tool.
Notepad and DOS | ★★★ | Quick and dirty and really fast. However, it’s still pretty clumsy. Interrupting a test part way through is difficult.
PowerShell | ★½ | Yuck. I couldn’t get two runs that looked the same and it was near impossible to separate PowerShell CPU and memory pressure from SQL Server pressure.
SQLQueryStress | ★★★★ | It does the job well. It was a little difficult to interrupt a test, but it takes care to stay lightweight.

Next Week

I’ll show some typical ways I define s_DoSomething.

January 6, 2014

SQL Simplicity Methods

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 9:00 am

Takeaway: Keep your database queries simple. Simple for you, simple for others and simple for SQL Server.

This isn’t a controversial topic; it’s widely accepted that simplicity is a virtue. With that message in mind, I hope to give some methods and some motivation.

*mwah*

Write Simple SQL For Yourself

Revisit old code from time to time and write code that you won’t mind revisiting.

All SQL developers are translators. We translate English descriptions of what we want into SQL. Simpler queries mean less mental energy is required when “translating” those requirements into SQL. It also goes the other way. If I write some SQL and then revisit it a year later, I have to translate backwards in order to understand the query’s “intent”.

What? You never revisit old code? Whether it’s code or blog articles, when I look back at what I wrote two short years ago, I’m often not happy with what I read. I sometimes cringe a bit at what I’ve written and will often update old blog posts that need it. SQL is no exception. But reviewing old writing is a useful exercise. Joan Didion, a favorite author of mine, points out “We are well advised to keep on nodding terms with the people we used to be.”

Write Simple SQL For Others

Try organizing complicated queries using CTEs. It helps others understand them.

Simpler SQL doesn’t need to be shorter SQL. Jeremiah Peschka pointed me at a blog post by Selena Deckelmann, “How I Write Queries Using PLSQL: CTEs”.
Selena describes (better than I can) how useful CTEs can be when developing SQL. The CTEs provide a way to construct “a set of understandable ‘paragraphs’ of SQL” which can be used to “explain the logic of the query with others”.

Now that is how you do self-documenting code with SQL. When written with CTEs, your SQL will be much clearer than when you use subqueries. But don’t take my word for it. Check out a brilliant example in Brad Schulz’s “Playing Poker With SQL”. In that post he develops a single query which reports the results of a 10-hand poker game. Here’s an abbreviated version of his final query:

with DeckOfCards as
(. . .)
,ShuffleAndDeal as
(. . .)
,HandEvaluation1 as
(. . .)
. . .
,HandEvaluation7 as
(. . .)
select PlayerID
      ,Hand=Card1+' '+Card2+' '+Card3+' '+Card4+' '+Card5
      ,HandDescript
from HandEvaluation7
pivot (max(CardName) for CardSeqName in (Card1,Card2,Card3,Card4,Card5)) P
order by PlayerRanking
/*
PlayerID  Hand            HandDescript
--------  --------------- ---------------
       3  2♠ 7♠ 8♠ 9♠ 10♠ Flush
       1  5♥ 5♣ 5♠ 6♣ K♣  Three of a Kind
       5  2♥ 2♦ 6♥ K♠ K♦  Two Pair
       4  6♠ 10♦ Q♠ Q♥ K♥ Two of a Kind
       6  4♦ 7♣ 7♦ 8♥ J♦  Two of a Kind
       2  2♣ 3♣ 3♠ 9♥ J♣  Two of a Kind
       9  5♦ 9♦ J♥ Q♦ A♦  Nothing
       7  3♥ 4♣ 6♦ 10♥ A♠ Nothing
       8  3♦ 4♥ 8♣ 9♣ A♣  Nothing
      10  4♠ 8♦ 10♣ J♠ Q♣ Nothing
*/

In his post, Brad develops his query incrementally using CTEs. It accomplishes something very complicated, but it seems simple. I use it as a model for how to organize complex queries (when I must).

Write Simple SQL For SQL Server

Shorter SQL performs better. Consider breaking larger queries up into smaller chunks.

Why?

  • Fewer joins mean the optimizer has fewer candidate plans to evaluate, and that means a better chance of superior (or even optimal) plans.
  • Larger query trees can mean less effective cardinality estimates. With less effective estimates, inferior plans can be chosen, resulting in poor performance behaviors like excessive CPU, excessive I/O or tempdb spills.

Paul White gives a better and more in-depth explanation in an answer to a SQLPerformance.com question.

Read his tips. They’re very well explained. After his explanation, he mentions a strategy to deal with complex queries. He writes “My usual approach is to break the query into manageable parts, storing reasonably-sized intermediate results in #temporary tables.” I want to show an example demonstrating how something like that could work.

I use a query I made up against AdventureWorks2012 which, for blogging purposes, we’ll call complicated:

USE Adventureworks2012
SELECT 
    p.BusinessEntityID
    ,p.FirstName
    ,p.LastName
    ,e.JobTitle  
    ,pp.PhoneNumber
    ,pnt.Name AS PhoneNumberType
    ,a.AddressLine1
    ,a.AddressLine2
    ,a.City
    ,sp.Name AS StateProvinceName 
    ,a.PostalCode
    ,cr.Name AS CountryRegionName 
    ,cust.Purchases
    ,sale.Sales as SalesCount
FROM Person.Person p
LEFT JOIN HumanResources.Employee e
    ON p.BusinessEntityID = e.BusinessEntityID 
LEFT JOIN Person.BusinessEntityAddress bea 
    JOIN Person.[Address] a 
        ON a.AddressID = bea.AddressID
    JOIN Person.StateProvince sp 
        ON sp.StateProvinceID = a.StateProvinceID
    JOIN Person.CountryRegion cr 
        ON cr.CountryRegionCode = sp.CountryRegionCode
    ON bea.BusinessEntityID = p.BusinessEntityID 
LEFT JOIN Person.PersonPhone pp
    JOIN Person.PhoneNumberType pnt
        ON pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID
    ON pp.BusinessEntityID = p.BusinessEntityID
LEFT JOIN 
    (
        SELECT COUNT(1), c.PersonID
        FROM Sales.SalesOrderHeader soh
        JOIN Sales.Customer c
            ON c.CustomerID = soh.CustomerID
        GROUP BY c.PersonID
    ) as cust(Purchases, PersonID)
    ON p.BusinessEntityID = cust.PersonID
LEFT JOIN 
    (
        SELECT COUNT(1), SalesPersonID
        FROM Sales.SalesOrderHeader
        GROUP BY SalesPersonID
    ) as sale(Sales, PersonId)
    ON p.BusinessEntityID = sale.PersonId
WHERE p.FirstName = 'Michael'

Most people’s intuition is that a single query is preferable. But just like Paul White, I have found that performance can sometimes be improved when the work is split into many queries. Here’s an example of what that might look like:

use AdventureWorks2012
 
CREATE TABLE #Results (
    BusinessEntityID int,
    FirstName nvarchar(50),
    LastName nvarchar(50),
    JobTitle nvarchar(50),
    PhoneNumber nvarchar(25),
    PhoneNumberType nvarchar(50),
    AddressLine1 nvarchar(60),
    AddressLine2 nvarchar(60),
    City nvarchar(30),
    StateProvinceName nvarchar(50),
    PostalCode nvarchar(15),
    CountryRegionName nvarchar(50),
    Purchases int,
    SalesCount int 
);
 
INSERT #Results (BusinessEntityID, FirstName, LastName)
SELECT BusinessEntityID, FirstName, LastName
FROM Person.Person
WHERE FirstName = 'Michael';
 
UPDATE #Results
SET JobTitle = e.JobTitle
FROM #Results r
JOIN HumanResources.Employee e
    on r.BusinessEntityID = e.BusinessEntityID;
 
UPDATE #Results
SET AddressLine1 = a.AddressLine1,
    AddressLine2 = a.AddressLine2,
    City = a.City,
    StateProvinceName = sp.Name,
    PostalCode = a.PostalCode,
    CountryRegionName = cr.Name
FROM #Results r
JOIN Person.BusinessEntityAddress bea 
    ON bea.BusinessEntityID = r.BusinessEntityID 
JOIN Person.[Address] a 
    ON a.AddressID = bea.AddressID
JOIN Person.StateProvince sp 
    ON sp.StateProvinceID = a.StateProvinceID
JOIN Person.CountryRegion cr 
    ON cr.CountryRegionCode = sp.CountryRegionCode;
 
UPDATE #Results
SET PhoneNumber = pp.PhoneNumber,
    PhoneNumberType = pnt.Name
FROM #Results r
JOIN Person.PersonPhone pp
    ON pp.BusinessEntityID = r.BusinessEntityID
JOIN Person.PhoneNumberType pnt
    ON pp.PhoneNumberTypeID = pnt.PhoneNumberTypeID;
 
WITH cust (Purchases, PersonID) AS
(
    SELECT COUNT(1), c.PersonID
    FROM Sales.SalesOrderHeader soh
    JOIN Sales.Customer c
        ON c.CustomerID = soh.CustomerID
    GROUP BY c.PersonID
)
UPDATE #Results
SET Purchases=cust.Purchases
FROM #Results r
JOIN cust
    on cust.PersonID = r.BusinessEntityID;
 
WITH sale (SalesCount, PersonId) AS
(
    SELECT COUNT(1), soh.SalesPersonID
    FROM Sales.SalesOrderHeader soh
    GROUP BY soh.SalesPersonID
)
UPDATE #Results
SET SalesCount=sale.SalesCount
FROM #Results r
JOIN sale
    ON sale.PersonId = r.BusinessEntityID;
 
SELECT * FROM #Results;

When is this technique appropriate?

I like to use this performance technique before I consider query hints (but after other, simpler improvements like indexing). Even so, this technique is not always appropriate. I’ve seen it work best on complicated queries (how do you know when they’re complicated?). And I’ve seen it work best against large datasets (processing millions of rows, for example). Complicated queries have a higher risk of generating poor query plans, but breaking these huge queries into smaller parts addresses that problem.

In my example, I’ve split the original query into seven. That was just for illustration. Maybe better is splitting your monster query into only three queries. Always test.
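
One simple way to test is to compare logical reads and timings for the two versions. Nothing fancy; something like this in Management Studio does the job (the comments mark where the two versions go):

set statistics io on;
set statistics time on;
 
-- run the original single query here,
-- then run the temp table version,
-- and compare logical reads and CPU/elapsed times in the Messages tab
-- (run each a few times so the buffer cache is warm)
 
set statistics io off;
set statistics time off;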

Empirical evidence tells me that simpler SQL performs better. I’ve split up complicated queries this way and they often become much easier to maintain; but almost as often, I’ve found I didn’t need to split them at all!

Really?

I’ve added a few comments below that qualify some of the things I’ve said here.

December 4, 2013

Overly Complex Views, Procedures And Functions

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 4:03 pm

Takeaway: I define a metric which indicates a code-smell for overly complex views, procedures and functions: “FROM” count.

In the programming world, developers have defined metrics for how complex a piece of code is. For example, Visual Studio defines several metrics that are meant to give developers some idea about how complex their code is getting. These measures won’t be perfect because code complexity is not something that can be measured directly. But many think that these measures indicate complexity often enough to be useful.

Code Smells

That’s what a code smell is meant for. My own definition is that a code smell is an anti-rule-of-thumb <cough>cursors!</cough>. It doesn’t necessarily indicate something’s rotten, but like a bad smell, it’s worth checking into.

Measuring Complexity in SQL Modules

And here’s my idea about measuring complexity for SQL. It’s based on the premise that complex procedures or views will have many queries/subqueries and that most queries have FROM clauses. So what are my procedures/views/functions that may be too complex? Let’s find out:

DECLARE @Odor NVARCHAR(30) = N'FROM';
 
with
   L0 as (select 1 as C union all select 1)       --2 rows
  ,L1 as (select 1 as C from L0 as A, L0 as B)    --4 rows
  ,L2 as (select 1 as C from L1 as A, L1 as B)    --16 rows
  ,L3 as (select 1 as C from L2 as A, L2 as B)    --256 rows
  ,L4 as (select 1 as C from L3 as A, L3 as B)    --65536 rows
  ,Nums as (select row_number() over (order by (select 0)) as N from L4)
SELECT 
    OBJECT_SCHEMA_NAME(m.object_id) as SchemaName,
    OBJECT_NAME(m.object_id) as ObjectName, 
    count(1) as OdorCount
FROM Nums
CROSS JOIN sys.sql_modules m
WHERE Nums.N < LEN(m.definition)
    AND SUBSTRING(m.definition, Nums.N, LEN(@Odor)) = @Odor
GROUP BY m.object_id
ORDER BY count(1) desc, object_name(m.object_id) asc

I’ve found the metric slightly better than the simpler “procedure size” metric:

SELECT 
    OBJECT_SCHEMA_NAME(m.object_id) as SchemaName,
    OBJECT_NAME(m.object_id) as ObjectName, 
    LEN(m.definition) as ModuleSize
FROM sys.sql_modules m
ORDER BY LEN(m.definition) desc, object_name(m.object_id) asc

Try it out on your own environments and let me know if it identifies the monster procedures you know are lurking in your database.

Thanks to Adam Machanic for the substring counting syntax.

Performance Improvement!

Update December 5, 2013: In the comments, George Mastros provided a simpler and faster version of this query which does the same thing:

DECLARE @Odor NVARCHAR(30) = N'FROM';
 
Select	OBJECT_SCHEMA_NAME(object_id) As SchemaName, 
        OBJECT_NAME(object_id) As ObjectName, 
        (DataLength(definition) - DataLength(Replace(definition, @Odor, ''))) / DataLength(@Odor) As OdorCount
From	sys.sql_modules
Order By OdorCount DESC;

This is much simpler and much faster. What’s extra interesting is that George’s query itself has an odor count of 2 while my original one had a count of 7. Thanks so much George!

October 30, 2013

“You’ve got access, but I’m not sure why”

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 9:00 am

Takeaway: Use “xp_logininfo” when curious about how and why someone has access to a database.

I had this I.M. conversation with a colleague at work yesterday.

Me: What was that server name again?
Stephen: It was SQL001
Me: Thanks I’m in.
Stephen: eyeballing security… I don’t see you on that list so if you do actually have access I’m confused

To be honest, I didn’t know why I had access either. Normally, to examine access, I look at security logins and look for users or groups that might match my own credentials. I do that here in SQL Server Management Studio’s Object Explorer:

Looking At Logins

The way I used to look at logins

In this case it’s not immediately obvious why I have access and what level of access I have.

Enter xp_logininfo

So I had this exchange with Allen Kinsel (@AllenKinsel) (via #sqlhelp on twitter):

Allen Kinsel saves the day

xp_logininfo works beautifully for what I wanted to do.

It’s a command that’s too obscure. I think xp_logininfo should be more widely known (hence this blog post). And when I run:

exec xp_logininfo 'MYDOMAIN\mswart', 'all'

I get:

account name | type | privilege | mapped login name | permission path
MYDOMAIN\mswart | user | user | MYDOMAIN\mswart | MYDOMAIN\SupportGroup
MYDOMAIN\mswart | user | admin | MYDOMAIN\mswart | BUILTIN\Administrators

This tells me exactly what kind of access I have and why. In this case, I have access for two reasons: I belong to the support group, and I’m a member of the “BUILTIN\Administrators” group. This is so much better than eyeballing a list of logins.
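
As a follow-up, the same procedure can report who else gets access through one of those groups. Using the (made-up) group name from my example:

exec xp_logininfo 'MYDOMAIN\SupportGroup', 'members'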

xp_logininfo: It’s worth remembering.

September 19, 2013

My Queries For Top 20 Resource Hogs

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

Takeaway: This post is for me. This post is a handy place for me to put some queries that I use often. If you find them useful too, that’s wonderful, bookmark this page with http://michaeljswart.com/go/Top20.

These queries will give the top 20 resource consumers for cached queries based on a few different metrics. I’ve posted queries like this before, and others have written many other versions of this query. All these queries are based on sys.dm_exec_query_stats.

But I’ve tweaked my own queries recently based on a couple things I’ve learned. So you could say that I gave my old queries a new coat of paint.

Here’s what I’ve added to these queries recently:

  • I added OPTION (RECOMPILE) to the query. It prevents these queries from showing up in their own results on quiet systems.
  • Joe Sack pointed out that query_hash can be used to aggregate queries that only differ by literal values. I’m doing that here. What’s extra awesome about this technique is that it can also be used to aggregate the same queries that were executed in different procedures or in different databases. This is critical if we want to measure the impact of a single query on a server regardless of where the query is called.
  • I’m ignoring (for now) those queries with query_hash equal to 0x0. Again, Joe Sack points out that these are cursor fetches.
  • You may notice that the database name is sometimes null. It’s null when the query is not part of a stored procedure. I suppose if I was keen, I could pick out an arbitrary database name from the accompanying query plan.
  • Update September 27, 2013: Actually, In the comment section of this post, Ian Stirk gave me a really nice way to retrieve the database name from the dmvs. I’ve updated the queries below accordingly.
  • Remember that the stats found in sys.dm_exec_query_stats are only as good as what’s in cache. The cache is not a permanent store. It changes and it gets reset every server restart.

(Download all queries)

Or check them out individually below:

Top 20 Executed Queries

These are queries that run often. Frequent queries can be vulnerable to concurrency problems.

;with frequent_queries as
(
    select top 20 
        query_hash, 
        sum(execution_count) executions
    from sys.dm_exec_query_stats 
    where query_hash <> 0x0
    group by query_hash
    order by sum(execution_count) desc
)
select @@servername as server_name,
    coalesce(db_name(st.dbid), db_name(cast(pa.value AS INT)), 'Resource') AS [DatabaseName],
    coalesce(object_name(ST.objectid, ST.dbid), '<none>') as [object_name],
    qs.query_hash,
    qs.execution_count,
    executions as total_executions_for_query,
    SUBSTRING(ST.TEXT,(QS.statement_start_offset + 2) / 2,
        (CASE 
            WHEN QS.statement_end_offset = -1  THEN LEN(CONVERT(NVARCHAR(MAX),ST.text)) * 2
            ELSE QS.statement_end_offset
            END - QS.statement_start_offset) / 2) as sql_text,
    qp.query_plan
from sys.dm_exec_query_stats qs
join frequent_queries fq
    on fq.query_hash = qs.query_hash
cross apply sys.dm_exec_sql_text(qs.sql_handle) st
cross apply sys.dm_exec_query_plan (qs.plan_handle) qp
outer apply sys.dm_exec_plan_attributes(qs.plan_handle) pa
where pa.attribute = 'dbid'
order by fq.executions desc,
    fq.query_hash,
    qs.execution_count desc
option (recompile)

Top 20 I/O Consumers

Specifically logical reads and writes. Still my favorite metric.

;with high_io_queries as
(
    select top 20 
        query_hash, 
        sum(total_logical_reads + total_logical_writes) io
    from sys.dm_exec_query_stats 
    where query_hash <> 0x0
    group by query_hash
    order by sum(total_logical_reads + total_logical_writes) desc
)
select @@servername as servername,
    coalesce(db_name(st.dbid), db_name(cast(pa.value AS INT)), 'Resource') AS [DatabaseName],
    coalesce(object_name(ST.objectid, ST.dbid), '<none>') as [object_name],
    qs.query_hash,
    qs.total_logical_reads + total_logical_writes as total_io,
    qs.execution_count,
    cast((total_logical_reads + total_logical_writes) / (execution_count + 0.0) as money) as average_io,
    io as total_io_for_query,
    SUBSTRING(ST.TEXT,(QS.statement_start_offset + 2) / 2,
        (CASE 
            WHEN QS.statement_end_offset = -1  THEN LEN(CONVERT(NVARCHAR(MAX),ST.text)) * 2
            ELSE QS.statement_end_offset
            END - QS.statement_start_offset) / 2) as sql_text,
    qp.query_plan
from sys.dm_exec_query_stats qs
join high_io_queries fq
    on fq.query_hash = qs.query_hash
cross apply sys.dm_exec_sql_text(qs.sql_handle) st
cross apply sys.dm_exec_query_plan (qs.plan_handle) qp
outer apply sys.dm_exec_plan_attributes(qs.plan_handle) pa
where pa.attribute = 'dbid'
order by fq.io desc,
    fq.query_hash,
    qs.total_logical_reads + total_logical_writes desc
option (recompile)

Top 20 CPU Consumers

Another popular metric for measuring work done.

;with high_cpu_queries as
(
    select top 20 
        query_hash, 
        sum(total_worker_time) cpuTime
    from sys.dm_exec_query_stats 
    where query_hash <> 0x0
    group by query_hash
    order by sum(total_worker_time) desc
)
select @@servername as server_name,
    coalesce(db_name(st.dbid), db_name(cast(pa.value AS INT)), 'Resource') AS [DatabaseName],
    coalesce(object_name(ST.objectid, ST.dbid), '<none>') as [object_name],
    qs.query_hash,
    qs.total_worker_time as cpu_time,
    qs.execution_count,
    cast(total_worker_time / (execution_count + 0.0) as money) as average_CPU_in_microseconds,
    cpuTime as total_cpu_for_query,
    SUBSTRING(ST.TEXT,(QS.statement_start_offset + 2) / 2,
        (CASE 
            WHEN QS.statement_end_offset = -1  THEN LEN(CONVERT(NVARCHAR(MAX),ST.text)) * 2
            ELSE QS.statement_end_offset
            END - QS.statement_start_offset) / 2) as sql_text,
    qp.query_plan
from sys.dm_exec_query_stats qs
join high_cpu_queries hcq
    on hcq.query_hash = qs.query_hash
cross apply sys.dm_exec_sql_text(qs.sql_handle) st
cross apply sys.dm_exec_query_plan (qs.plan_handle) qp
outer apply sys.dm_exec_plan_attributes(qs.plan_handle) pa
where pa.attribute = 'dbid'
order by hcq.cpuTime desc,
    hcq.query_hash,
    qs.total_worker_time desc
option (recompile)

Top 20 Queries By Elapsed Time

Results found here, but not in the results of the other queries, usually suffer from things like excessive blocking or ASYNC_NETWORK_IO.

;with long_queries as
(
    select top 20 
        query_hash, 
        sum(total_elapsed_time) elapsed_time
    from sys.dm_exec_query_stats 
    where query_hash <> 0x0
    group by query_hash
    order by sum(total_elapsed_time) desc
)
select @@servername as server_name,
    coalesce(db_name(st.dbid), db_name(cast(pa.value AS INT)), 'Resource') AS [DatabaseName],
    coalesce(object_name(ST.objectid, ST.dbid), '<none>') as [object_name],
    qs.query_hash,
    qs.total_elapsed_time,
    qs.execution_count,
    cast(total_elapsed_time / (execution_count + 0.0) as money) as average_duration_in_microseconds,
    elapsed_time as total_elapsed_time_for_query,
    SUBSTRING(ST.TEXT,(QS.statement_start_offset + 2) / 2,
        (CASE 
            WHEN QS.statement_end_offset = -1  THEN LEN(CONVERT(NVARCHAR(MAX),ST.text)) * 2
            ELSE QS.statement_end_offset
            END - QS.statement_start_offset) / 2) as sql_text,
    qp.query_plan
from sys.dm_exec_query_stats qs
join long_queries lq
    on lq.query_hash = qs.query_hash
cross apply sys.dm_exec_sql_text(qs.sql_handle) st
cross apply sys.dm_exec_query_plan (qs.plan_handle) qp
outer apply sys.dm_exec_plan_attributes(qs.plan_handle) pa
where pa.attribute = 'dbid'
order by lq.elapsed_time desc,
    lq.query_hash,
    qs.total_elapsed_time desc
option (recompile)

By the way, this post seems to be really popular. So I waive any copyright I have on these four queries. Copy them without attribution wherever you like. Profit if you can. Go nuts.

September 11, 2013

Three More Tricky Tempdb Lessons

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

Takeaway: Tempdb contention is a common DBA problem. I want to share three lesser known surprises about everyone’s favorite bottleneck.

If my professional life was made into a series, tempdb contention would have a recurring role as the antagonist. It provides challenges on a regular basis and I finally believe I understand its behavior well enough to have contention licked for good. But even recently, tempdb still holds some surprises for me.

You Can Still Get SGAM Contention After TF1118

You may know what trace flag TF1118 does. It affects the allocation of pages in all databases. But it’s most often used for its effect in tempdb. The idea is that TF1118 will cause SQL Server to store all table data in dedicated extents rather than mixed extents. Without mixed extents, the need to access SGAM pages is reduced and that leads to less contention as described in this KB article.

But the use of SGAM isn’t reduced to zero. What?! How can that be? Who’s still using mixed extents? It turns out that IAM pages are always stored on mixed extents. So every time an object is created (or destroyed), at least one latch is required on an SGAM page. On an extremely busy system, you can still see contention here.
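
If you suspect this is happening right now, one way to catch it in the act is to look for tasks waiting on latches for tempdb allocation pages. Something like this works; the resource description is in dbid:fileid:pageid form, and within each allocation interval page 1 is the PFS page while pages 2 and 3 are the GAM and SGAM pages:

select session_id,
    wait_type,
    wait_duration_ms,
    resource_description    -- dbid:fileid:pageid (dbid 2 is tempdb)
from sys.dm_os_waiting_tasks
where wait_type like 'PAGELATCH%'
    and resource_description like '2:%'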

Equally Sized Data Files Can Become Unbalanced After Autogrow

That’s not the surprising thing. The surprising thing is that it can impact the effectiveness of having multiple data files and it needs to be kept in mind. Once again from Microsoft:

If data files [have] unequal sizes, the proportional fill algorithm tries to use the largest file more for GAM allocations instead of spreading the allocations between all the files.

This makes multiple data files less effective than they could be at tackling contention. But don’t disable auto-grow; just size your tempdb data files large enough (tip: google “instant file initialization”).

Before we move on, if your tempdb data files have auto-grown, do you know what the initial sizes were?
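
The current sizes are easy to check, and as far as I can tell, sys.master_files keeps the size that tempdb will be recreated with at the next restart, so comparing the two views gives a decent hint about how much autogrow has happened:

select df.name,
    df.size * 8 / 1024 as current_size_mb,
    mf.size * 8 / 1024 as restart_size_mb
from tempdb.sys.database_files df
join sys.master_files mf
    on mf.database_id = 2
    and mf.file_id = df.file_id
where df.type_desc = 'ROWS'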

Tempdb Contention Can Be Sudden and Catastrophic

I hadn’t seen this before. Most contention I’m familiar with looks like a “sluggish” database. The response times on particular queries just aren’t what they ought to be – getting gradually worse until the impact is felt hard. But just recently I have seen tempdb contention show up all-of-a-sudden making the database nearly useless. Especially after excessive blocking kicks in. Here’s how that can happen:

  • Imagine a super-frequent procedure creates a temp table.
  • Several of these temp tables are cached. (There can be several execution contexts for the procedure and so several temp tables).
  • A table used in the procedure has enough modified rows to have its statistics auto-updated.
  • The procedure’s query plan is invalidated and is dropped from the cache.
  • The cached temp tables that were used for the procedure are marked to be destroyed
  • The procedure is recompiled and new temp tables are created
  • A system process destroys the marked temp tables that need to be cleaned up

With a very busy system, these things can happen in only a few seconds. In fact, those last two items describe actions where hundreds of threads can get into a fight over allocation pages.

The following query may give a list of query plans for frequently called procedures as described:

with frequentSprocs as 
(
    select top 10 count(1) as [memory objects], cp.plan_handle from sys.dm_exec_cached_plans cp
    cross apply sys.dm_exec_cached_plan_dependent_objects(cp.plan_handle) do
    join sys.dm_os_memory_objects mo
        on do.memory_object_address = mo.memory_object_address
    where objtype = 'Proc'
    group by cp.plan_handle
    order by 1 desc
)
select fs.*, qp.query_plan
from frequentSprocs fs
cross apply sys.dm_exec_query_plan(fs.plan_handle) qp
option (recompile)

What Next?

I’m lucky to work with software that runs on systems that are crazy powerful and crazy busy. It’s unlikely that you’ll run into tempdb problems in the exact same way that we did. So the typical advice for PFS/GAM/SGAM contention is still valid:

  • Enable trace flag TF 1118
  • Create multiple (equally sized) data files for tempdb (a setup sketch follows this list)
  • Reduce how frequently you use temp tables and table variables
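
For the first two items, the setup is only a couple of commands. The file name, path and sizes below are placeholders; size the files for your own workload:

-- turn the trace flag on now (add -T1118 as a startup parameter so it survives restarts)
dbcc traceon (1118, -1);
 
-- size the existing data file and add more, equally sized, files
alter database tempdb modify file (name = tempdev, size = 4GB, filegrowth = 512MB);
alter database tempdb add file (name = tempdev2, filename = 'T:\tempdb2.ndf', size = 4GB, filegrowth = 512MB);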


August 15, 2013

A Tuning Technique For Parallel Queries

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

Takeaway: I describe a technique I use to tune parallel queries. I tune the serial version of the query using MAXDOP 1. It’s a simpler query plan. Faster understanding leads to faster solutions.

In a parallel universe, I'm blogging about cartography and projection mapping.

How better to explain than by example!

The Example

For this example, I use AdventureWorks2012 and I beef up the sales order tables using Jonathan Kehayias’s script Enlarging the AdventureWorks Sample Databases. For my example, I want to know which salesperson sold the most red items in their last ten orders (looking at orders with at least one red item).

My first crack at the query gives me this:

SELECT P.FirstName, P.LastName, SUM(OrderQty) Items
FROM Person.Person P
CROSS APPLY (
    SELECT TOP 10 SUM(OrderQty) as OrderQty, H.SalesOrderID
    FROM Sales.SalesOrderDetailEnlarged D
    JOIN Sales.SalesOrderHeaderEnlarged H
        ON D.SalesOrderID = H.SalesOrderID
    JOIN Production.Product Pr
        ON D.ProductID = Pr.ProductID
    WHERE H.SalesPersonID = P.BusinessEntityID
        AND Pr.Color = 'red'
    GROUP BY H.SalesOrderID, H.ShipDate
    ORDER BY H.ShipDate DESC
) AS LastTenOrdersWithRedItems
GROUP BY P.BusinessEntityID, P.FirstName, P.LastName
ORDER BY SUM(OrderQty) DESC

Great! That query gives me the Adventureworks salespeople and the number of red items they’ve sold in their last ten orders. But the execution time was a bit slow and so I decide to tune the query. The query plan looks like this:
The query plan for the previous query

There are a couple of things that jump out at me right away: some of the thick lines tell me that there’s a lot of data getting processed. There’s also a warning that tells me that tempdb was used for a sorting operation. And in general, I notice that there are a lot of operators here, including parallelism operators.

So now I’m going to try to look at a serial query plan for the same query. I have to tell SQL Server that the maximum degree of parallelism it should consider is one thread. That’s done with the MAXDOP query hint like this:

SELECT P.FirstName, P.LastName, SUM(OrderQty) Items
FROM Person.Person P
CROSS APPLY (
    SELECT TOP 10 SUM(OrderQty) as OrderQty, H.SalesOrderID
    FROM Sales.SalesOrderDetailEnlarged D
    JOIN Sales.SalesOrderHeaderEnlarged H
        ON D.SalesOrderID = H.SalesOrderID
    JOIN Production.Product Pr
         ON D.ProductID = Pr.ProductID
    WHERE H.SalesPersonID = P.BusinessEntityID
        AND Pr.Color = 'red'
    GROUP BY H.SalesOrderID, H.ShipDate
    ORDER BY H.ShipDate DESC
) AS LastTenOrders
GROUP BY P.BusinessEntityID, P.FirstName, P.LastName
ORDER BY SUM(OrderQty) DESC
OPTION (MAXDOP 1) -- this line has been added

The hint is in there, but it’s not meant to be a permanent thing. I put it in there in order to look at the serial version of the query plan. Sure enough, it looks simpler and it’s easier to examine:

The query plan for the previous query

The information is still the same, but to me it’s a little clearer where the problem is. I see that SalesOrderHeader is scanned completely and then spooled (one iteration per person) and then sorted by date. I also see that there’s a lot of data coming from Person.Person.

So in this case, I may decide to add an index on SalesOrderHeader(SalesPersonId, ShipDate) like this:

CREATE INDEX IX_SalesOrderHeaderEnlarged_SalesPersonShipDate 
    ON Sales.SalesOrderHeaderEnlarged(SalesPersonId ASC, ShipDate DESC)

A Bonus Optimization
There’s another optimization I’d like to consider. Notice that all 19,972 rows in the Person.Person table are scanned. Well I can do better than that. I can join Person.Person to HumanResources.Employee because I know that all salespeople are employees. The join acts as a filter and it really cuts down on processing. The query now looks like:

SELECT P.FirstName, P.LastName, SUM(OrderQty) Items
FROM Person.Person P
JOIN HumanResources.Employee E
    ON P.BusinessEntityID = E.BusinessEntityID
CROSS APPLY (
    SELECT TOP 10 SUM(OrderQty) as OrderQty, H.SalesOrderID
    FROM Sales.SalesOrderDetailEnlarged D
    JOIN Sales.SalesOrderHeaderEnlarged H
        ON D.SalesOrderID = H.SalesOrderID
    JOIN Production.Product Pr
        ON D.ProductID = Pr.ProductID
    WHERE H.SalesPersonID = P.BusinessEntityID
        AND Pr.Color = 'red'
    GROUP BY H.SalesOrderID, H.ShipDate
    ORDER BY H.ShipDate DESC
) AS LastTenOrders
GROUP BY P.BusinessEntityID, P.FirstName, P.LastName
ORDER BY SUM(OrderQty) DESC

With the new index, this query performs about as well as we can hope. The new plan now looks like this:

The query plan for the previous query

Why this Works For Me

Adding MAXDOP 1 is a quick way to give me a serial version of the query plan, and the serial version of the plan is simpler. This trick saves me time, or it has often enough that it’s worthwhile to try.

Another reason this works for me is because the queries I write and maintain are often required to run sub-second. You might be in the same boat if you write your queries for high volume OLTP systems. This usually means that there’s a high standard for query performance. Acceptably tuned queries are fast enough that SQL Server will rarely decide to consider parallel query plans. And that means that it’s tempting to look for parallelism to find candidates for poorly tuned queries.

Okay, Here Come The Caveats

  • Reading the title, some will assume that I’m claiming a parallel query plan is a bad query plan. Absolutely not. Queries that are executed with parallel plans are queries that do a lot of work. That’s not bad in itself. Sometimes queries need to do a lot of work. The question that should be asked is whether these queries are doing too much work.
  • I’m not claiming MAXDOP 1 is a tuning technique in itself. It’s just a trick I use to look at a query from a different point of view. If your query is a thousand times faster after adding MAXDOP 1 and you don’t know why, then you’re not done yet.
  • Parallel execution plans are not dull or useless! They’re fascinating! I’ve learned that the mental effort spent following a Paul White post on parallelism always pays off.

July 3, 2013

It Took Me Ages To Figure This One Out: A Puzzle

Filed under: Miscelleaneous SQL,SQL Scripts,Technical Articles — Michael J. Swart @ 12:00 pm

Takeaway: I tackled a problem recently and I present a version of the problem here as a challenge. The challenge is to figure out why a query and a stored procedure return different results even though the procedure is defined using the same query.

If no one cracks this one, I’ll leave the answer in the comments in a couple days.

Too Easy

The Setup

Okay, I use AdventureWorks2012 here. I define a table data type IDList and a stored procedure s_GetCustomerProductPurchases:

use Adventureworks2012
go
 
create type IDList as TABLE(id int);
go
 
CREATE PROCEDURE dbo.s_GetCustomerProductPurchases 
	@CustomerIds IDList READONLY,
	@ProductIds IDList READONLY,
	@SearchString NVARCHAR(1000)
AS
SET NOCOUNT ON 
 
DECLARE @ResultList TABLE ( 
	ProductId INT, 
	ProductName Name 
	);
 
insert @ResultList (ProductId, ProductName)
select distinct TOP (2000) p.ProductID, p.Name
from sales.salesorderdetail sod
join sales.salesorderheader soh 
    on soh.SalesOrderID = sod.SalesOrderID
join Production.Product p
    on p.ProductID = sod.ProductID
join Sales.Customer c
    on c.CustomerID = soh.CustomerID
join @CustomerIds CIds
    on CIds.id = c.CustomerID
join @ProductIds PIds
	on PIds.id = p.ProductID
where p.Name like '%' + @SearchString + '%'
	and soh.[Status] = 5
 
IF	@@ROWCOUNT = 0 
	SELECT	0 ProductId, 'No Matching Products' ProductName
ELSE 
	SELECT	 ProductId, ProductName
	FROM	 @ResultList 
	ORDER BY ProductId

Executing the Query

Next, let’s run a query against AdventureWorks2012. We define some variables that are used as filters in a query. This query populates a temporary results table and then we select from it:

DECLARE 
	@ProductIds IDList,
	@CustomerIds IDList,
	@SearchString Name = 'Red';
insert @ProductIds (id) VALUES (707), (717), (727)
insert @CustomerIds (id) VALUES (11000), (11103), (11201)
 
SET NOCOUNT ON 
 
DECLARE @ResultList TABLE ( 
	ProductId INT, 
	ProductName Name 
	);
 
insert @ResultList (ProductId, ProductName)
select distinct TOP (2000) p.ProductId, p.Name
from sales.salesorderdetail sod
join sales.salesorderheader soh 
    on soh.SalesOrderID = sod.SalesOrderID
join Production.Product p
    on p.ProductID = sod.ProductID
join Sales.Customer c
    on c.CustomerID = soh.CustomerID
join @CustomerIds CIds
    on CIds.id = c.CustomerID
join @ProductIds PIds
	on PIds.id = p.ProductID
where p.Name like '%' + @SearchString + '%'
	and soh.[Status] = 5
 
IF	@@ROWCOUNT = 0 
	SELECT	0 ProductId, 'No Matching Products' ProductName
ELSE 
	SELECT	 ProductId, ProductName
	FROM	 @ResultList 
	ORDER BY ProductId

These are the results I get:

ProductId ProductName
707 Sport-100 Helmet, Red

Executing the Procedure

Now run the procedure which contains the exact same query:

DECLARE 
	@ProductIds IDList,
	@CustomerIds IDList,
	@SearchString Name = 'Red';
insert @ProductIds (id) VALUES (707), (717), (727)
insert @CustomerIds (id) VALUES (11000), (11103), (11201)
 
exec dbo.s_GetCustomerProductPurchases @ProductIds, @CustomerIds, @SearchString

The results we get now look like:

ProductId ProductName
0 No Matching Products

So Why Are These Results Different?

That’s what I want you to figure out. Try to crack this one before looking at the comments.

SQL Fiddle

Have you used SQL Fiddle yet? It’s a site that allows SQL developers to set up different scenarios for use in online discussions. So if you don’t have AdventureWorks2012 handy, I’ve set up two SQL Fiddles here and here. Your job then is to follow those two links and explain why the results are different.

April 17, 2013

The Sch-M lock is Evil

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

A necessary evil, but still evil. Why? Because it won’t share with Sch-S, and Sch-S is taken and held for absolutely everything (yes, even your NOLOCK queries). And that can lead to some interesting concurrency problems. Let me explain.

Sch-M

Sch-M is an abbreviation for Schema Modification. It refers to a kind of lock that is taken on a table/index or other object whenever you want to modify that object. SQL Server allows only one Sch-M lock on an object at a time. So if you want to modify a table, your process waits to take a Sch-M lock on that table. Once that lock is granted, the modification is completed and then the lock is released.

Sch-S

Sch-S is an abbreviation for Schema Stability. It is a kind of lock that is taken on an object when a process doesn’t want that object to change its definition. It makes sense. If I’m reading a set of rows through a table, I don’t want a column to disappear on me half way through. SQL Server allows many Sch-S locks on a table.

Sch-S vs. Sch-M

But Sch-S locks are incompatible with Sch-M locks. This means that when you want to modify a table, you’re not granted a Sch-M lock immediately. You have to wait for everyone using that table to finish using it. You’re essentially put on hold until the existing queries complete and their existing Sch-S locks are released. This also means that while you’re waiting, every query who wants to begin using that table is waiting in line behind you too. Basically “Everybody outta the pool while the lifeguards change shifts.” But that’s usually acceptable right? Database schema modifications are a big enough change to require a maintenance window.
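
When a schema change seems stuck, it’s easy to see this pile-up directly. Something like this lists who currently holds (or waits for) schema locks; OBJECT_NAME can resolve the entity ids:

select request_session_id,
    resource_type,
    resource_database_id,
    resource_associated_entity_id,   -- the object id when resource_type is 'OBJECT'
    request_mode,                    -- look for Sch-M and Sch-S here
    request_status                   -- GRANT vs. WAIT
from sys.dm_tran_locks
where request_mode in ('Sch-S', 'Sch-M')
order by request_status desc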

Index Rebuilds Are Table Modifications

It’s true, if you have the luxury of maintenance windows for your DB changes, you’ll be alright. But you also have to consider your database maintenance plans (automated or otherwise). Those plans can launch index rebuilds while the database is online. And all index rebuilds also count as table modifications and take Sch-M locks. An index rebuild has syntax like this:

ALTER INDEX [PK_MyTable] ON [MyTable] REBUILD WITH (ONLINE=ON)

Hopefully you’ve remembered that ONLINE=ON part. When you use that part, the index is rebuilt in the background and at the end of that processing time, a Sch-M lock is taken and released very quickly.

But maybe you’re not so lucky. Maybe you’re not running 2012 yet and have an index that includes blobs. Or maybe you’re running on Standard Edition. In those cases you won’t be able to use the ONLINE=ON feature. In that case, the Sch-M lock is taken by the rebuild process and it’s held the entire time that index is rebuilt. During the rebuild, that index is now truly offline. No access for you.
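
If you’re not sure which camp you’re in, the edition part at least is a one-line check (whether an index includes blobs still needs a look at the index definitions themselves):

select serverproperty('Edition') as edition,
    serverproperty('EngineEdition') as engine_edition   -- 3 means Enterprise (or Developer/Evaluation)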

You Can Get Stuck

Just like I did. A while ago, I was asked to help with this exact situation. An index rebuild had been running for hours. It was offline, and the Sch-M lock it held was preventing anybody from using or even looking at that table. I was stuck between a rock and a hard place. I had to choose between letting the index rebuild complete (which could take hours) or cancelling the job (whose rollback could take hours). There was nothing I could do to avoid additional hours of downtime. There was another bit of irony in my situation: we didn’t care about the existing data in that table. A truncate table or drop/recreate table would have suited us just fine.

… Like Really Stuck

It occurred to me to try something out. What if I created an identical empty table with a different name? We didn’t need any data in the locked table, so using a new table could work. And because the locked table is only accessed by stored procedures, I could modify those procedures to use the new table instead of the locked one.

Nope!

For some reason, the ALTER PROCEDURE requires a Sch-S lock on the old table, the table it no longer refers to. The sprocs can’t operate or be modified without a Sch-S lock on that locked table. This sketch illustrates my situation.

Make sure your index plans never attempt to rebuild big indexes offline.

Extras

Some relevant DBAReactions gifs:

April 3, 2013

Altering Text Columns: Only a Metadata Change?

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 8:00 am

Say you want to change the type of a text column using the ALTER TABLE … ALTER COLUMN syntax. It is valuable to know how much work SQL Server will have to do to fulfill your request. When your tables are large, it can mean the difference between a maintenance window that lasts five minutes, or one that lasts five hours or more.

I give a list of exactly when you’ll feel that pain and when you won’t.

A joke about char(max)

(BTW, CHAR(MAX) columns are impossible, you’ll get the gag if you figure out why)

When is the Whole Table Processed?

Here are conditions which require processing the entire table:

  • switching from unicode to non-unicode or vice versa.
  • changing a column from nullable to not nullable.
  • going from fixed length field to variable length field or vice versa.
  • decreasing the maximum length of a field.
  • increasing the maximum length of a fixed length field.
  • converting from limited length columns to unlimited or vice versa. (e.g. varchar(10) to varchar(max)).
  • collation modifications on non-unicode columns that change character set or code page. (See Collation Hell Part 3 by Dan Guzman)

On large tables, any of the above alterations will be a heavy hitter and will take time and fill the transaction log (except that shrinking a fixed length field seems to only require a scan).

What Changes are Metadata Only Changes?

That’s a lot of conditions! What alterations are left?

Not much:

  • Increasing the maximum length of a variable length column.
  • Changing the type from text to varchar(max).
  • Changing the type from ntext to nvarchar(max).
  • Any of the above while making a non-nullable field nullable.
  • Any of the above with a change in collation (with some big caveats, see Collation Hell Part 3 by Dan Guzman).

These changes are metadata only changes which means SQL Server doesn’t have to touch any of the actual data. So the size of the table will not impact the time it takes SQL Server to process the ALTER TABLE command. But see some notes about concurrency below.
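
For example, take a hypothetical table dbo.BigTable with a column Notes defined as varchar(100) not null (the names are made up for illustration). The first statement below is a metadata-only change; the second has to process every row:

-- metadata only: increase the max length of a variable length column (and allow nulls)
alter table dbo.BigTable alter column Notes varchar(400) null;
 
-- whole table processed: decrease the max length and make the column not nullable again
alter table dbo.BigTable alter column Notes varchar(100) not null;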

Some Notes

Some notes about the above:

  • I ignored and make no claims about migrations where text or ntext is the target column type because the exceptions are strange and that scenario seems to fall under “Why would you want to do that?”
  • The above applies to only versions I’ve tested. Specifically 2008, and 2012.
  • The metadata-only changes I described above are not entirely online. There are still concurrency concerns to watch out for. These ALTER statements still request schema modification (Sch-M) locks on the table, and once granted, only hold them briefly. But if you try to alter a column on a live environment and some long running query blocks your ALTER TABLE statement, then other queries that need access to the table will be blocked as well.
  • Terms I used
    • fixed length: char(x), nchar(x)
    • variable length: varchar(x), nvarchar(x)
    • unlimited length: varchar(max), nvarchar(max), text, ntext
    • unicode: nchar(x), nvarchar(x), nvarchar(max), ntext
    • non-unicode: char(x), varchar(x), varchar(max), text