Michael J. Swart

October 12, 2022

You Can Specify Two Indexes In Table Hint?

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 12:00 pm

Yes, It turns out that you can specify two indexes in a table hint:

SELECT Id, Reputation
FROM dbo.Users WITH (INDEX (IX_Reputation, PK_Users_Id))
WHERE Reputation > 1000

And SQL Server obeys. It uses both indexes even though the nonclustered index IX_Reputation is covering:
Two Indexes

But Why?

I think this is a solution looking for a problem.

Resolving Deadlocks?
My team wondered if this could be used as to help with a concurrency problem. We recently considered using it to resolve a particular deadlock but we had little success.

It’s useful to think that SQL Server takes locks on index rows instead of table rows. And so the idea we had was that perhaps taking key locks on multiple indexes can help control the order that locks are taken. But after some effort, it didn’t work at avoiding deadlocks. For me, I’ve had better luck using the simpler sp_getapplock.

Forcing Index Intersection?
Brent Ozar wrote about index intersection a while ago. Index intersection is a rare thing to find in a query plan. Brent can “count on one hand the number of times [he’s] seen this in the wild”.

In theory, I could force index intersection (despite the filter values):

SELECT Id
FROM dbo.Users WITH (INDEX (IX_UpVotes, IX_Reputation))
WHERE Reputation > 500000
AND UpVotes > 500000

But I wouldn’t. SQL Server choosing index intersection is already so rare. And so I think the need to force that behavior will be even rarer. This is not a tool I would use for tuning queries. I’d leave this technique alone.

Have You Used More Than One Index Hint?

I’d love to hear about whether specifying more than one index in a table hint has ever helped solve a real world problem. Let me know in the comments.

October 6, 2022

The Tyranny Of Cumulative Costs (Save and Forget Build Up)

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 12:00 pm

50:50 Triangle

Using the right triangle above draw a vertical line separating the area of the triangle in to two parts with the same area.
The triangle on the left is 70.7% of the width of the original triangle.

Cumulative Storage Costs

Think of this another way. The triangle above is a graph of the amount of data you have over time. And if you pay for storage as an operational expense such as when you’re renting storage in the cloud (as opposed to purchasing physical drives). Then the cost of storage is the area of the graph. The monthly bills are ever-increasing, so half of the total cost of storage will be for the most recent 29%.

Put yet another way: If you started creating a large file in the cloud every day since March 2014, then the amount you paid to the cloud provider before the pandemic started is the same amount you paid after the pandemic started (as of August 2022).

How Sustainable is This?

If the amount of data generated a day isn’t that much, or the storage you’re using is cheap enough then it really doesn’t matter too much. As an example, AWS’s cheapest storage, S3 Glacier Deep Archive, works out to about $0.001 a month per GB.

But if you’re using Amazon’s Elastic Block Storage like the kind of storage needed for running your own SQL Servers in the cloud, the cost can be closer to $.08 a month per GB.

The scale on the triangle graph above really matters.

Strategies

This stresses the need for a data life-cycle policy. An exit story for large volumes of data. Try to implement Time-To-Live (TTL) or clean up mechanisms right from the beginning of even the smallest project. Here’s one quick easy example from a project I wrote that collects wait stats. The clean-up is a single line.

Look at Netflix does approaches this issue. I like how they put it. “Data storage has a lot of usage and cost momentum (i.e. save-and-forget build-up).”

Netflix stresses the importance of “cost visibility” and they use that to offer focused recommendations for cleaning up unused data. I recommend reading that whole article. It’s fascinating.

It’s important to implement such policies before that triangle graph gets too large.

Powered by WordPress