Takeaway: I explain how SQL Server is aware that inner joins are commutative and so the order of tables in your queries doesn’t matter.
Update Jan. 11, 2013: Please see my update at the end of this article where I qualify that statement (because it depends).
I want to explore a common question I get from people who are getting up to speed on this whole SQL thing. At one time or another, we’ve all wondered whether we get any performance improvements by varying the order that we join tables together (and by joins I mean inner joins).
The answer is no, so you can safely stop messing with the join order of your tables for performance reasons. So…
The point is that when SQL Server executes a query it explores query plans with different join orders, it then evaluates the estimated cost of each plan and picks the best one. It follows directly that changing the order of the tables in the from clause is not an effective optimization technique.
In other words, A JOIN B is equivalent to B JOIN A and SQL Server knows it. You can see this for yourself:
Showing that Joins are Commutative
The best way to demonstrate that is to come up with an example where SQL Server chooses a different join order for a query plan than the order specified in the query.
First create tables A, B and C and populate them
use tempdb; create table A ( id int identity constraint PK_A primary key, value uniqueidentifier default newid() ); create table B ( id int identity constraint PK_B primary key, value uniqueidentifier default newid() ); create table C ( id int identity constraint PK_C primary key, value uniqueidentifier default newid() ); GO set nocount on; insert A default values; GO 100 insert B default values GO 500 insert C default values GO 1000
Query With Two Joins
Check out the following query and query plan (with no query hints):
select a.value, b.value, c.value from c join b on b.id = c.id join a on a.id = b.id where c.id > 95
Notice that SQL Server has changed the join order from C-B-A to A-B-C because it’s better that way.
Same Query With Restricted Join Order
If you really want to, you can force the join order with the FORCE ORDER query hint. We’re basically telling the query optimizer to not explore plans with different join orders. The query optimizer uses different rules when exploring different plans to evaluate. One of the rules is called JoinCommute. We can actually turn it off using the undocumented query hint QUERYRULEOFF.
select a.value, b.value, c.value from c join b on b.id = c.id join a on a.id = b.id where c.id > 95 OPTION ( QUERYRULEOFF JoinCommute)
And we see that turning off JoinCommute gives behavior and performance just like FORCE ORDER.
Kind of neat eh? Now when the next person asks you whether the join order makes any difference, you can confidently say no. And if that person is from Missouri (the “Show me” state) you can point them to this post.
Update: But Not Always
From the above, you could take it that equivalent queries with different join orders should have the same query plan. But it turns out that in some cases, equivalent queries (with different join orders) generate different plans if the query itself has many tables to join. In this case what’s happening is that SQL Server’s query optimizer chooses a plan that’s not optimal. It can do this if:
- While evaluating different plans, it found a Good Enough Plan and then stopped evaluating others. It’s rare, but sometimes what’s good enough for SQL Server is not good enough for you.
- While evaluating different plans, it Timed Out and stopped evaluating others. In this case SQL Server didn’t evaluate all the different join orders, it evaluated a fair number of them and picked the best one it found
It’s tempting to then throw this article out and resume caring about the join order. It’s tempting to place smallest tables or the tables with the most selective filters up front telling SQL Server: “You’re going to evaluate a variety of join orders, maybe not all of them, but at least evaluate this one.”
All I can say is that in my time with this stuff, I’ve never needed to care. I mean I’ve seen that it can make a difference with very complicated queries or views, but I’ve also always been able to find a more stable/standard/supported way to improve performance. Like making sure statistics and indexes are up to snuff. The practice of using temp tables to store intermediate results was suggested by Brent and has worked for me too. As a last resort, I might consider different join orders but like I said, I’ve never needed to yet.
One last note, Microsoft’s Conor Cunningham wrote a very relevant post on this topic: http://blogs.msdn.com/b/conor_cunningham_msft/archive/2009/12/10/conor-vs-does-join-order-matter.aspx. Again, thanks Brent Ozar for that link.