Get Rid of RID Lookups

August 31, 2009

Takeaway: If you see an unexpected RID lookup in an execution plan, consider adding a clustered index as a db schema improvement.

What is a RID lookup?

You may come across the RID lookup operator as part of an execution plan. With Management Studio, RID lookups are displayed with the following icon (sort of … I fancied it up a little).

A RID Lookup is a lookup into a heap using a Row ID. The Row is included with entries in a non-clustered index in order to find the rest of a table’s data in a heap. (Remember, with a heap, the table data is stored unordered so a Row ID is needed for the correlation).

The RID Lookup’s official docs can be read here at Books Online. But it’s light on the details and it references a more thorough explanation of lookups in general which can be found at Craig Freedman’s blog here.

They’re bad… or at least not good.

When you find an RID Lookup in a query plan, it’s a symptom. It indicates a database schema that breaks several rules of thumb. Those rules-of-thumb are:

Each table should have a clustered index (of course there are exceptions but we’re dealing with rules-of-thumb here).
A non-clustered index has been created indicating that someone somewhere identified an ordering on one or more columns that made sense for that data.
There is at least one query (i.e. the one that generated the RID Lookup) that needs columns that are not covered by the non-clustered index.

These three points mean a wasted opportunity. A clustered index should have been created. A heap plus a non-clustered index take about the same space as a clustered index. At least logically: the non-clustered pages map to the clustered index’s index nodes and the heap’s pages map to the clustered index’s leaf nodes.

If possible, definitely consider modifying the schema to include a clustered index (After assessing risks of course).

Really? Always?

Well of course no respectable blogger writes in absolutes and Grant Fritchley has a video here that discusses RID lookups further. He says that when the RID lookup only deals with one row, that the impact is negligible. True, but had the table been designed with a clustered index, the performance would have been no worse.

-- Comments (26)

26 Comments »

Just what I was looking for, thanks.

Comment by Abe Miessler — December 28, 2009 @ 1:25 pm
Good to hear. Thanks for the feedback.

Comment by Michael J. Swart — December 28, 2009 @ 7:40 pm
Do you know if having an IsNull on one of your join values can cause an RID lookup even if the proper indexes are in place?
Example:

select ID, Name from Table1 as t1
INNER JOIN Table2 as t2
ON ISNULL(t1.id,123) = t2.id

Comment by Abe Miessler — February 10, 2010 @ 7:23 pm
I don’t think that could happen. There might be scans and other stuff going on, but no RID lookups.

As far as I know, RID lookups only happen when there is no clustered index.

However, if this is what you’re seeing try using
EXEC sp_helpindex 't1'
EXEC sp_helpindex 't1'
and
EXEC sp_helpindex 't2'
EXEC sp_helpindex 't2'
to look at the indexes on these tables. My bet is that one of them has indexes that are only non-clustered.

Comment by Michael J. Swart — February 11, 2010 @ 8:38 am
I’ll give that a shot. Thanks for you help Michael!

Comment by Abe Miessler — February 11, 2010 @ 11:41 am
I DIDNOT UNDERSTAND YOUR SECOND POINT…. CAN YOU PLEASE EXPLAIN ?

Comment by AK1516 — November 22, 2011 @ 2:41 pm
Hi there ak1516. I’m not sure what you’re having trouble with? (Also this)

Comment by Michael J. Swart — November 22, 2011 @ 2:54 pm
non-clustered index has been created indicating that someone somewhere identified an ordering on one or more columns that made sense for that data.

i did not get this…

Comment by AK1516 — November 22, 2011 @ 3:19 pm
Hi AK1516,
I mean that a non-clustered index has been created by somebody. That person created the index for some reason. Find that person and ask them why they didn’t create it as a clustered index.

Comment by Michael J. Swart — November 25, 2011 @ 8:20 am
>>As far as I know, RID lookups only happen when there is no clustered index.

Doesn’t the inclusion of ‘… INCLUDE(select_column) …’ when creating the index prevent RIDs without the index having to be clustered?

Comment by David — January 4, 2012 @ 5:09 pm
Great question Dave, using the INCLUDE clause in a nonclustered index is a great technique.
If you include a column when creating the nonclustered index, you do indeed avoid the RID lookups without the index having to be clustered.

One interesting point though is that if a query requests all columns for a row, you may find yourself wanting to include all columns in the INCLUDE clause of a nonclustered index. The interesting thing is that if all columns are included in the nonclustered index, then that index is now structurally equivalent to a clustered one.

Comment by Michael J. Swart — January 5, 2012 @ 3:00 pm
[…] MATCH was gone, but I was left with an RID LOOKUP (HEAP) at 50% and an INDEX SEEK at 50%. Well the RID LOOKUP’s are not that cool either. An RID LOOKUP is a lookup into a heap using a ROW ID. Basically it is a […]

Pingback by A Successful Query Optimization | Did It Save — March 2, 2012 @ 4:13 pm
Ouch! I’m on the situation of your 11th comment, have a big query with lot of columns on the SELECT clause. Already have several indexes on the table, clustered, non clustered, etc. I was willing to add all the columns on a new non clustered index to avoid the lookup but googled first and found this post and comment… needless to say my hopes where dashed 🙁 will continue searching for a solution…great blog by the way, just suscribed

Comment by Yaroslav — June 13, 2012 @ 6:43 am
Hi Yaroslav, Not so quick.

Don’t feel bad about creating indexes that are “covering” which is essentially what you’re after. It just doubles space and modifications. It’s up to you whether your database can handle that.

For example, I’d have no problem including all columns in an index for a table that’s less than a thousand rows. But if we’re talking billions, maybe not. Some more care has to be taken.

Good luck Yaroslav, I’m always glad to hear from a new reader.

Comment by Michael J. Swart — June 13, 2012 @ 8:37 am
Awesome! An author that indeed answers and helps the readers… found a gold mine 😉

In my case I have “only” 40 extra columns that maybe should include. But I’m in doubt as I’ve already another non clustered index with covering columns, around 30 of them. The current lookup is a clustered key lookup, and it is using a Seek Predicate as:

Seek Keys[1]: Prefix: [Orders].OrderPk = Scalar Operator([Orders].[OrderPk] as [p].[OrderPk])

There is already a clustered index on the OrderPk column. I do not know if it can harm the performance on this case, but there are 3 columns type nvarchar(Max). Tried not including them but I got similar results on the execution plan an on the Client Statistics report.

Comment by Yaroslav — June 13, 2012 @ 9:31 am
Thanks for the compliment 🙂

Wow, such a wide table. Only 40 extra columns!

I think that there’s so much to take into account when designing indexes. I’m sure I don’t have the whole picture either (like typical use of the table. Or how volatile the table is (i.e. are new rows mostly appended? Or are they often modified?) but I’ll offer some points to watch out for:
- With such a wide table, the cost of extra wide indexes is going to be bigger.
- There’s really no such thing as a mostly covering index. If you don’t include some columns in an index, then SQL Server can’t use it (alone) to serve the data
- The best test for your solution may be to experiment and test. Try it and see. If you have a good way to measure performance (Query plans are good, SET STATISTICS IO, TIME ON is also good), then if you find something that works, chances are you can ignore any other advice 🙂
Comment by Michael J. Swart — June 13, 2012 @ 11:59 am
In fact is not a table, I’m doing the query against a view. This view is created from another about 15 tables. The main table has many inserts and modifications but no delete and grows maybe around a couple of thousands rows daily. The extra cost of space is one of the “but” I was thinking on. As you pointed out, I will experiment and do some changes to see what happens. Indeed after posting my previous post I already made some changes and saw some improvements by not including several nvarchar(max) fields on the view and the final query.

Comment by Yaroslav — June 13, 2012 @ 5:04 pm
I guess this question just asked on StackOverflow can make for a good post:
http://stackoverflow.com/questions/11035650/computed-columns-index-clustered-index-and-covering-index

Comment by Yaroslav — June 14, 2012 @ 10:58 am
[…] MATCH was gone, but I was left with an RID LOOKUP (HEAP) at 50% and an INDEX SEEK at 50%. Well the RID LOOKUP’s are not that cool either. An RID LOOKUP is a lookup into a heap using a ROW ID. Basically it is a […]

Pingback by A Successful Query Optimization | Jason Kassay — January 21, 2014 @ 7:14 pm
I have a report on a database that runs about 30 minutes. I found that the column that the join was on was part of a unique, non-clustered index on the main table. I simply defined that non-clustered index as clustered and ta-da; the report runs in 1 second. Thanks for the helpful article.

Comment by cc — June 4, 2014 @ 1:34 pm
Wow! That’s always good to hear cc.
That’s a large part of what I like about solving database performance problems. Sometimes, but not always, the query can be improved to run with a duration an order of magnitude shorter than the original duration. It’s good to hear about these dramatic results.

Comment by Michael J. Swart — June 4, 2014 @ 1:40 pm
Thank you. good article to understand this concept

Comment by julian castiblanco — August 21, 2015 @ 11:33 am
One thing to note is that you don’t always want to have an index on all the fields returned by your query. And so that will mean that you will still have RID Lookups, and that’s fine, and can even be very fast for large amounts of data.

For example, if you are searching for records by key and returning a lot of other fields from the same table, associated with that key, and you don’t want to create an index for all the fields you are returning – either because it makes writing to that table unacceptably slow or because it causes the clustered index to take up an unacceptable amount of disk space.

Comment by Peter — September 13, 2017 @ 1:05 am
I think you’re missing something Peter.
The clustered index is the table. The clustered index contains all the data at the leaf level.

If there are any indexes on a table, then my suggestion is that one of them should be the clustered index. Following that advice will mean no more RID lookups. Key lookups maybe, but no RID lookups.

Comment by Michael J. Swart — September 13, 2017 @ 9:57 am
Thanks for your advise Michael, but how about the scenario I mentioned where a lot of data (many rows, and many fields in the table) needs to be inserted into the table very quickly? Having a clustered index will be much slower than a heap because with the clustered index all the data needs to be reorganized to slot the new rows into the correct places, whereas writing to a heap is as quick as doing a single write to disk. In such a scenario surely it is best not to have a clustered index, and so RID Lookups will be unavoidable.

Comment by Peter — September 14, 2017 @ 12:25 am
Hi Peter,
Okay, I get what you’re saying now.

The reorganization overhead that is incurred on insert can be a problem because they cause page splits. And that can be a significant overhead.

But writing to a heap is not as simple as doing a single write to disk. You have to do an insert to the nonclustered indexes too, which can be just as expensive. The data in the index needs to be reorganized to slot the new rows into the correct places. In other words that nonclustered index can have page splits too.

So I would say that if
- You have a really wide table (many columns)
- And only one or two indexes
- And you really care about write performance way more than read performance
- And your index keys are not “monotonically increasing”, (new rows don’t go at the end of the table so that inserts might cause page splits)
Then I might consider a heap. But even then, I’d want to be shown test results to be convinced that it truly is better.

Comment by Michael J. Swart — September 15, 2017 @ 8:59 am