Michael J. Swart

December 18, 2012

A Grade School Data Project

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication — Michael J. Swart @ 12:00 pm

This post is a story about my daughter’s school project with a plug for Wolfram Alpha and maybe Excel.

My daughter came to me the other day and asked me to help her with some homework… She said she needed data. She was learning about plotting graphs in math class and she was asked to bring in some data that she could plot. Learning, Math, Data. I felt pretty proud (Just as any parent might when kids take an interest in their profession.)

Personally, I thought this was great! My kid is asking me about data! At the very least, I could explain to her a little better what it is that I do at work. At best maybe she could “catch” what that I like about the topic.

The project itself was pretty straightforward. The students were asked to bring in a list of numbers that they could use in class to create a chart. The data could be about anything and there were extra marks for plotting two series instead of one.

What We Did

So I asked my daughter what she wanted to bring… it could be anything she was curious about. Eventually we:

  • We started at Wolfram Alpha
  • And we picked the local temperature for 2012.
  • For the extra series, I picked temperature for a location that I thought would contrast well with the local temperature. That location was Sydney Australia.
  • This led me to search Wolfram Alpha for 2012 temperature for Kitchener, Canada and Sydney, Australia
  • But we needed the data points, this led me to use the most beautiful and useful feature that Wolfram Alpha provides, Data Download:
  • A little hidden, but super-useful

    Pretty much the only feature analysts need

  • It was the work of a minute (using Excel) to summarize the average monthly temperatures for both cities. I got the data into the right shape, created a pivot table, and adjusted some of the filters.

After everything was done, the chart itself was interesting. As I expected, the average temperature (here) reaches its maximum in July and we saw the opposite trend in Australia. One surprising thing was that the temperatures varies very little in Sydney. It looks like 10°C (50 °F) is a very cold day in Australia. So the numbers confirmed what we guessed and showed us something new… Not bad.

Keeping away from SQL Server (For Now)

Not once did I even mention or use SQL Server or tables or databases. And I didn’t dwell on the Excel features. I just explained what we were looking at. In this case, I thought that it was better to show what the data tools could do before showing how they do it.

Missed Opportunitiy?

But I feel like I must have missed out on something here. Don’t you think so? Tracking temperature is kind of boring…  I felt like a carpenter who was trying to show off what could be done with a router but ending up with a birdhouse. What kind of data would you have picked? I tried to think of something that’s actually interesting or useful but came up empty and went with average temperature. If you have any better ideas, let me know.

December 5, 2012

Well That Wasn’t Obvious

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 12:00 pm

I’ve discovered that non-intuitive lessons stick in the brain the best. I share three examples that I either learned recently (or at least re-learned recently).

Uniqueifiers on Indexes That I Thought Shouldn’t Need Them

Do you remember what a Uniquifier is? SQL Server uses these hidden values on rows to keep non-clustered indexes in sync with their non-unique clustered indexes. Think of this questionable schema:

create table Actors
(
	ActorId int identity not null,
	LastName nvarchar(100) not null,
	FirstName nvarchar(100) not null,
	AgentId int null,
	primary key nonclustered (ActorId)
)
 
create clustered index IX_Actors
	on Actors(LastName, FirstName)

Then add some data:

insert Actors (LastName, FirstName, AgentId)
values ('Douglas', 'Michael', 3),
       ('Douglas', 'Michael', 4),
       ('Keaton', 'Michael', 5),
       ('Douglas', 'Kirk', 4)

I can demonstrate the uniquifier value here because of the non-unique clustered index:

select sys.fn_PhysLocFormatter (%%physloc%%) as [physical rid], *
from Actors -- in my case, this shows page 231
 
DBCC TRACEON (3604);
DECLARE @dbid int = DB_ID(); 
DBCC PAGE(@dbid, 1, 231, 3); 
 
/* showing (among other things):
...
Slot 1 Offset 0x60 Length 51
UNIQUIFIER = 0                       
LastName = Douglas                   
FirstName = Michael                  
ActorId = 1                          
AgentId = 3                          
KeyHashValue = (16035ff378a3)        
 
Slot 2 Offset 0x93 Length 55
UNIQUIFIER = 1                       
LastName = Douglas                   
FirstName = Michael                  
ActorId = 2                          
AgentId = 4                          
KeyHashValue = (16035e22af06)        
...
*/

Cool! Just like real life (See Michael Douglas (I – XXVI) at IMDB). This uniquifier value behaves very much like the roman numerals you see at IMDB.

Here’s the part that wasn’t obvious… what if I chose a clustered index whose columns include the primary key…

drop index IX_Actors ON Actors;
create clustered INDEX IX_Actors
	ON Actors(LastName, FirstName, ActorId)

That should be unique right? The index includes the primary key column (ActorId) that guarantees it. So SQL Server shouldn’t need the uniquifier right?

But it does! 

SQL Server doesn’t count on the index being unique unless you say so. I actually discovered the above while using Kendra Little’s sp_BlitzIndex, a tool so underexposed it’s pasty.

UPDATE Modifies Column Values Only Once

What does the following code produce?

-- set up test table 
declare @test TABLE (id int, value int);
declare @source TABLE (id int, increment int);
 
insert @test (id, value)
values (1, 0), (2, 0)
 
insert @source (id, increment)
values (1, 10),(1, 20),(1, 30),(2, 100),(2, 100),(2, 100)
 
-- "sum": add each value in the source to the existing value
update @test 
set value = value + increment
from @test t
join @source s
    on s.id = t.id
 
-- check the results
select id, value
from @test
/* The results:
id          value
----------- -----------
1           10
2           100
*/

Surprised? I think this example (or one very much like it) surprises those of us who started our careers as programmers. Many of us follow up this lesson with the related “UPDATE statements don’t have GROUP BY clauses” and then the lesson “How do I use CTEs?”

Indexed Views Don’t Support Max/Min Aggregates

Indexed views support a couple aggregate functions like COUNT_BIG() and SUM(). And with some trickery you can calculate AVG() and STDEV(). But SQL Server restricts the use of MAX() and MIN() in indexed views despite how useful they’d be.

It might help to understand why. SQL Server maintains indexed views as physical database objects and it can maintain aggregate values like COUNT() and SUM() by processing only the changing rows in the base table while safely ignoring the rest of the table. But I can’t say the same for MAX() or MIN(). If SQL Server supported MAX() and MIN() in indexed views, then when you delete the row in the base table that represents the MAX() value, SQL Server would have to search the rest of the base table to find the new MAX() value.

Check out the microsoft.connect feature suggestion Expand aggregate support in indexed views (MIN/MAX). Aaron Bertrand created this Connect suggestion and I like it because it shows how effective constructive feedback can be. I like it because of its description, comments and the useful workaround. The Microsoft team even gave some insight into how they almost included this feature in SQL 2008. This connect item only seems to be lacking an E.T.A. so go and cast your vote!

Bonus Content

I didn’t draw any illustrations this week so I’m including some bonus content (admittedly written by others):

Non-obvious Things From Twitter Friends

Request for Tweets: What thing in sql surprised you? Something not obvious. For me it was how the TRY and CATCH didn't always.

 I love how the log shipping "use default data and log folders" ... doesn't

how about how INSERT <tbl> VALUES (1),(2),(3) limits you to 1000 entries

Rolling time-window filtered index. The cool surprise - building each new index uses the existing one, so it's really quick.

Trivia

  • Did you know Michael Keaton (born Michael Douglas) changed his name to avoid confusion with that other guy? Pretty wise.
  • Jes Schultz Borland reminded me recently that good writers use more active verbs. I took that advice to heart and turned the writing of this article into an exercise. I avoided using words like is, was or are here and I think it turned out pretty well.

Powered by WordPress