This post is a story about my daughter’s school project with a plug for Wolfram Alpha and maybe Excel.
My daughter came to me the other day and asked me to help her with some homework… She said she needed data. She was learning about plotting graphs in math class and she was asked to bring in some data that she could plot. Learning, Math, Data. I felt pretty proud (Just as any parent might when kids take an interest in their profession.)
Personally, I thought this was great! My kid is asking me about data! At the very least, I could explain to her a little better what it is that I do at work. At best maybe she could “catch” what that I like about the topic.
The project itself was pretty straightforward. The students were asked to bring in a list of numbers that they could use in class to create a chart. The data could be about anything and there were extra marks for plotting two series instead of one.
What We Did
So I asked my daughter what she wanted to bring… it could be anything she was curious about. Eventually we:
For the extra series, I picked temperature for a location that I thought would contrast well with the local temperature. That location was Sydney Australia.
But we needed the data points, this led me to use the most beautiful and useful feature that Wolfram Alpha provides, Data Download:
Pretty much the only feature analysts need
It was the work of a minute (using Excel) to summarize the average monthly temperatures for both cities. I got the data into the right shape, created a pivot table, and adjusted some of the filters.
After everything was done, the chart itself was interesting. As I expected, the average temperature (here) reaches its maximum in July and we saw the opposite trend in Australia. One surprising thing was that the temperatures varies very little in Sydney. It looks like 10°C (50 °F) is a very cold day in Australia. So the numbers confirmed what we guessed and showed us something new… Not bad.
Keeping away from SQL Server (For Now)
Not once did I even mention or use SQL Server or tables or databases. And I didn’t dwell on the Excel features. I just explained what we were looking at. In this case, I thought that it was better to show what the data tools could do before showing how they do it.
Missed Opportunitiy?
But I feel like I must have missed out on something here. Don’t you think so? Tracking temperature is kind of boring… I felt like a carpenter who was trying to show off what could be done with a router but ending up with a birdhouse. What kind of data would you have picked? I tried to think of something that’s actually interesting or useful but came up empty and went with average temperature. If you have any better ideas, let me know.
I’ve discovered that non-intuitive lessons stick in the brain the best. I share three examples that I either learned recently (or at least re-learned recently).
Uniqueifiers on Indexes That I Thought Shouldn’t Need Them
Do you remember what a Uniquifier is? SQL Server uses these hidden values on rows to keep non-clustered indexes in sync with their non-unique clustered indexes. Think of this questionable schema:
create table Actors
(
ActorId int identity not null,
LastName nvarchar(100) not null,
FirstName nvarchar(100) not null,
AgentId int null,
primary key nonclustered (ActorId)
)
create clustered index IX_Actors
on Actors(LastName, FirstName)
I can demonstrate the uniquifier value here because of the non-unique clustered index:
select sys.fn_PhysLocFormatter(%%physloc%%)as[physical rid], *from Actors -- in my case, this shows page 231DBCC TRACEON (3604);
DECLARE @dbid int=DB_ID();
DBCC PAGE(@dbid, 1, 231, 3);
/* showing (among other things):
...
Slot 1 Offset 0x60 Length 51
UNIQUIFIER = 0
LastName = Douglas
FirstName = Michael
ActorId = 1
AgentId = 3
KeyHashValue = (16035ff378a3)
Slot 2 Offset 0x93 Length 55
UNIQUIFIER = 1
LastName = Douglas
FirstName = Michael
ActorId = 2
AgentId = 4
KeyHashValue = (16035e22af06)
...
*/
select sys.fn_PhysLocFormatter (%%physloc%%) as [physical rid], *
from Actors -- in my case, this shows page 231
DBCC TRACEON (3604);
DECLARE @dbid int = DB_ID();
DBCC PAGE(@dbid, 1, 231, 3);
/* showing (among other things):
...
Slot 1 Offset 0x60 Length 51
UNIQUIFIER = 0
LastName = Douglas
FirstName = Michael
ActorId = 1
AgentId = 3
KeyHashValue = (16035ff378a3)
Slot 2 Offset 0x93 Length 55
UNIQUIFIER = 1
LastName = Douglas
FirstName = Michael
ActorId = 2
AgentId = 4
KeyHashValue = (16035e22af06)
...
*/
Cool! Just like real life (See Michael Douglas (I – XXVI) at IMDB). This uniquifier value behaves very much like the roman numerals you see at IMDB.
Here’s the part that wasn’t obvious… what if I chose a clustered index whose columns include the primary key…
dropindex IX_Actors ON Actors;
createclusteredINDEX IX_Actors
ON Actors(LastName, FirstName, ActorId)
drop index IX_Actors ON Actors;
create clustered INDEX IX_Actors
ON Actors(LastName, FirstName, ActorId)
That should be unique right? The index includes the primary key column (ActorId) that guarantees it. So SQL Server shouldn’t need the uniquifier right?
But it does!
SQL Server doesn’t count on the index being unique unless you say so. I actually discovered the above while using Kendra Little’s sp_BlitzIndex, a tool so underexposed it’s pasty.
UPDATE Modifies Column Values Only Once
What does the following code produce?
-- set up test table declare @test TABLE(id int, valueint);
declare @source TABLE(id int, increment int);
insert @test (id, value)values(1, 0), (2, 0)insert @source (id, increment)values(1, 10),(1, 20),(1, 30),(2, 100),(2, 100),(2, 100)-- "sum": add each value in the source to the existing valueupdate @test
setvalue=value+ increment
from @test t
join @source s
on s.id= t.id-- check the resultsselect id, valuefrom @test
/* The results:
id value
----------- -----------
1 10
2 100
*/
-- set up test table
declare @test TABLE (id int, value int);
declare @source TABLE (id int, increment int);
insert @test (id, value)
values (1, 0), (2, 0)
insert @source (id, increment)
values (1, 10),(1, 20),(1, 30),(2, 100),(2, 100),(2, 100)
-- "sum": add each value in the source to the existing value
update @test
set value = value + increment
from @test t
join @source s
on s.id = t.id
-- check the results
select id, value
from @test
/* The results:
id value
----------- -----------
1 10
2 100
*/
Surprised? I think this example (or one very much like it) surprises those of us who started our careers as programmers. Many of us follow up this lesson with the related “UPDATE statements don’t have GROUP BY clauses” and then the lesson “How do I use CTEs?”
Indexed Views Don’t Support Max/Min Aggregates
Indexed views support a couple aggregate functions like COUNT_BIG() and SUM(). And with some trickery you can calculate AVG() and STDEV(). But SQL Server restricts the use of MAX() and MIN() in indexed views despite how useful they’d be.
It might help to understand why. SQL Server maintains indexed views as physical database objects and it can maintain aggregate values like COUNT() and SUM() by processing only the changing rows in the base table while safely ignoring the rest of the table. But I can’t say the same for MAX() or MIN(). If SQL Server supported MAX() and MIN() in indexed views, then when you delete the row in the base table that represents the MAX() value, SQL Server would have to search the rest of the base table to find the new MAX() value.
Check out the microsoft.connect feature suggestion Expand aggregate support in indexed views (MIN/MAX). Aaron Bertrand created this Connect suggestion and I like it because it shows how effective constructive feedback can be. I like it because of its description, comments and the useful workaround. The Microsoft team even gave some insight into how they almost included this feature in SQL 2008. This connect item only seems to be lacking an E.T.A. so go and cast your vote!
Bonus Content
I didn’t draw any illustrations this week so I’m including some bonus content (admittedly written by others):
Non-obvious Things From Twitter Friends
Trivia
Did you know Michael Keaton (born Michael Douglas) changed his name to avoid confusion with that other guy? Pretty wise.
Jes Schultz Borland reminded me recently that good writers use more active verbs. I took that advice to heart and turned the writing of this article into an exercise. I avoided using words like is, was or are here and I think it turned out pretty well.