Michael J. Swart

September 12, 2012

When I Use Nested Joins

Filed under: SQL Scripts,SQLServerPedia Syndication,Technical Articles — Tags: — Michael J. Swart @ 12:00 pm

I want to explain how I avoid RIGHT OUTER joins in favor of LEFT OUTER joins and how I avoid OUTER joins in favor of INNER joins.

There's no place bitchin-er than Kitchener

Inner joins have no direction, but outer joins do so that we have three kinds of joins:

  • INNER JOIN (JOIN for short)
  • LEFT OUTER JOIN (LEFT JOIN for short)
  • RIGHT OUTER JOIN (RIGHT JOIN for short)

I leave out FULL OUTER JOINS for now because I never use them. And in fact RIGHT OUTER JOINS can always be written as LEFT OUTER JOINS, so in practice I only use just the two kinds of joins:

  • INNER JOIN
  • LEFT OUTER JOIN

Now as a rule of thumb, inner joins are more efficient than outer joins so it would be better to write queries that avoid outer joins. Let’s get to an example. My goal here is to write a query that gives me a list of employees and their director (if any) based on this schema.

CREATE TABLE STAFF
(
    Id NVARCHAR(20) NOT NULL PRIMARY KEY,
    Name NVARCHAR(400) NOT NULL,
    Department NVARCHAR(20),
    Role NVARCHAR(20) NOT NULL
)
 
CREATE TABLE BOSSES
(
    EmployeeId NVARCHAR(20)
        REFERENCES STAFF(Id),
    BossId nvarchar(20)
        REFERENCES STAFF(Id),
    PRIMARY KEY (EmployeeId, BossId)
)

BOSSES is a table that contains not just direct reports, but all direct and indirect reports (making it handy for this query).

Using a RIGHT JOIN

The answer is fairly straightforward. I join the BOSSES table with the STAFF table to give me all the directors and their reports:
Two inner joins

select S_Employee.Name, S_Boss.Name as Director
from STAFF S_Boss
join BOSSES B
    on B.BossId = S_Boss.Id
join STAFF S_Employee
	on B.EmployeeId = S_Employee.Id
where S_Boss.Role = 'Director'

But wait, this isn’t a complete list of employees. What about those in the company that don’t report to any director (not even indirectly). This is where the right outer join comes in:
An inner join and a right join

select S_Employee.Name, S_Boss.Name as Director
from STAFF S_Boss
join BOSSES B
    on B.BossId = S_Boss.Id
RIGHT join STAFF S_Employee
    on B.EmployeeId = S_Employee.Id
    and S_Boss.Role = 'Director'

Notice that the S_Boss.Role filter can’t belong to the where clause any more. If it did, we’d lose director-less employees again and we’d be back where we started.

Using LEFT JOINS Only

That works… but for the sake of style, let’s only use left joins. I prefer using only left joins in this case because logically, the results are meant to be the set of employees. So I like to start with that list of employees and then join in the bosses if necessary. That’s why I start with the set of employees as the first table. The other joined tables aren’t the stars of this query; they’re only there to help provide that director attribute. So I have this:
Two left joins

select S_Employee.Name, S_Boss.Name as Director
from STAFF S_Employee
left join BOSSES B
    on B.EmployeeId = S_Employee.Id
left join STAFF S_Boss
    on B.BossId = S_Boss.Id
    and S_Boss.Role = 'Director'

But you notice that I’m now using two left joins… Really I only want the one outer join that I was using in the first example. Turns out I can do that:

Using Nested Joins

Well that just looks like this:
One left join

select S_Employee.Name, S_Boss.Name as Director
from STAFF S_Employee
left join BOSSES B
    join STAFF S_Boss
        on B.BossId = S_Boss.Id
        and S_Boss.Role = 'Director'
    on B.EmployeeId= S_Employee.Id

This is logically equivalent to the right join but it uses left joins instead and only uses one outer join. Great!

A lot about this example is only about style. Maybe you prefer the right join example and think that I’m breaking some grammar rule here. It feels like I’m splitting an infinitive or something. Let me know what your preference is.

15 Comments »

  1. Even though I’m right-handed, I prefer LEFT JOINs. Seems like everybody else does too, I rarely ever see RIGHT JOINs. In fact, I see *= much more than RIGHT JOINs.

    Comment by Brian K — September 12, 2012 @ 12:39 pm

  2. As a matter of style, *= is so last century 🙂

    Comment by Michael J. Swart — September 12, 2012 @ 1:03 pm

  3. Who knew this syntax existed? I didn’t. Funny.

    The same could be achieved with derived tables of course.

    I found that when I do this trick (convert “two left joins with one of them really being an inner join” to “left join to an inner join”) I sometimes gain and sometimes loose performance. It seems to me that the query optimizer has a hole here. The transformation you described could be done automatically by the optimizer at least in this case.

    Comment by tobi — September 12, 2012 @ 4:04 pm

  4. You’re right about performance… I was careful not to make any claims about performance here.

    In the case I mentioned the chained left joins could in theory be transformed into an execution plan equivalent to the last example because they would logically be the same. That’s true because comparing NULL to a value always returns UNKNOWN (which is not TRUE).

    I could have built an example where my join condition used things like IS NULL and COALESCE where it mattered. But you’re right. Aside from that case, in theory it is possible for the query optimizer to recognize and convert this kind of outer join into an inner join.

    Comment by Michael J. Swart — September 12, 2012 @ 4:12 pm

  5. […] When I Use Nested Joins – Join Michael J. Swart (Blog|Twitter) for an excellent walk-through by example. […]

    Pingback by Something for the Weekend - SQL Server Links 14/09/12 — September 14, 2012 @ 5:56 am

  6. Hi Michael,

    The optimizer can perform the conversion you seek, though whether it does or not is a cost-based decision as always. For example, try:

    — Transformed to left on inner join
    select S_Employee.Name, S_Boss.Name as Director
    from STAFF S_Boss
    join BOSSES B
    on B.BossId = S_Boss.Id
    RIGHT join STAFF S_Employee
    on B.EmployeeId = S_Employee.Id
    and S_Boss.Role = ‘Director’
    OPTION (MERGE JOIN);

    I think adding parentheses when using ‘nested’ syntax makes it easier to understand and maintain:

    select S_Employee.Name, S_Boss.Name as Director
    from STAFF S_Employee
    left join
    (
    BOSSES B
    join STAFF S_Boss
    on B.BossId = S_Boss.Id
    and S_Boss.Role = ‘Director’
    ) on B.EmployeeId= S_Employee.Id;

    Comment by Paul White — September 14, 2012 @ 8:20 am

  7. Hi Paul,
    The optimizer can turn a left join into a right join (and vice versa) based on cost analysis. I think that’s expected.
    What it doesn’t do (and I don’t expect it to) is turn an outer join to an inner join when logically they should be equivalent.

    So something I didn’t make clear in my post is that
    * I prefer inner joins over outer joins for practical reasons.
    * And I prefer left joins over right joins for reasons of style (strictly a syntax choice here and not a query plan preference)

    And because of style, I love the parentheses that you use. I wasn’t aware that was an option. So I’m sold. You’ll never see me use a nested join without them again 🙂

    Comment by Michael J. Swart — September 14, 2012 @ 11:15 am

  8. “What it doesn’t do (and I don’t expect it to) is turn an outer join to an inner join when logically they should be equivalent.”

    I think I’m missing something here. The optimizer does un-outer joins:

    — Inner joins
    SELECT
    S_Employee.Name,
    S_Boss.Name as Director
    FROM STAFF S_Employee
    LEFT JOIN BOSSES B ON
    B.EmployeeId = S_Employee.Id
    LEFT JOIN STAFF S_Boss ON
    B.BossId = S_Boss.Id
    AND S_Boss.Role = ‘Director’
    WHERE
    B.EmployeeId IS NOT NULL
    AND S_Boss.Id IS NOT NULL;

    One other thing about LEFT vs RIGHT that I meant to mention before, but forgot. With a hash join, LEFT vs RIGHT can make a difference because it determines which input is the build input. Hash memory depends on the size of the build (left) input, so there are circumstances where rewriting RIGHT JOIN as LEFT JOIN might make a query perform worse, use more resources, or both. Naturally the optimizer will look to transform LEFT RIGHT, but there are ways to write queries that can make that impossible. Everything depends, right? ;c)

    Comment by Paul White — September 14, 2012 @ 11:27 am

  9. What do you know. It does do un-outer joins. Kudos to Microsoft!

    I hate that I made an assumption without double-checking. I continued with my assumption when you demonstrated RIGHT–>LEFT query plan transformation, but not the OUTER–>INNER transformation.

    Part of my confusion is that with the example:

    SELECT
    S_Employee.Name,
    S_Boss.Name as Director
    FROM STAFF S_Employee
    LEFT JOIN BOSSES B ON
    B.EmployeeId = S_Employee.Id
    LEFT JOIN STAFF S_Boss ON
    B.BossId = S_Boss.Id
    AND S_Boss.Role = ‘Director’;

    In this example, it is possible to deduce that if the query plan were to “un-outer join” the join between STAFF S_Boss and BOSSES then it would still be logically equivalent. But maybe it doesn’t matter because a “Good Enough Plan” was found.

    Thanks for the feedback Paul, it’s much appreciated.

    Comment by Michael J. Swart — September 14, 2012 @ 11:48 am

  10. Funny, I wasn’t sure what you meant by “nested” joins, but seeing the explanation the naming sounds correct. Also, funny, is that I prefer NOT to use parentheses, but I do usually put a comment in. I think the reason I don’t like parentheses is because they always make me think it is a derived table.

    Comment by Jack D Corbett — September 17, 2012 @ 10:09 am

  11. Another possible way to handle the inner-joins within an outer join is to use a CTE. I find that a WITH clause can go a long way toward making these joins much easier to understand with less indentation, especially when many tables are involved.

    Comment by Mark Freeman — September 17, 2012 @ 10:13 am

  12. Hi Jack,
    Hmm… Nested joins… That’s what they’re called aren’t they? I can’t remember where I heard the term first.
    I never thought of comparing nested joins to derived tables before, but derived tables are very very close to these kinds of joins… Essentially, they’re just missing a SELECT clause and the FROM keyword.

    Hi Mark,
    I agree with you Mark: “especially with many tables”. I love CTE’s. CTE’s are great and not just for their recursive features, but for their style. They can make everything look so neat and tidy and understandable.

    Comment by Michael J. Swart — September 17, 2012 @ 10:21 am

  13. I agree with Brian K as mostly you are starting with a big recordset and trying to drill down.

    Another option is to skip joins and use sub queries. At times they are faster than joins.

    Comment by Kash — September 25, 2012 @ 3:12 pm

  14. […] When I Use Nested Joins by Michael J. Swart A simple explanation of SQL right vs. left joins and inner vs. outer joins. […]

    Pingback by Technology Post Roundup–4th Edition « Jonathan Rozenblit — September 28, 2012 @ 4:12 am

  15. […] Some people refer to this syntax style as ‘nested join syntax’. Example blog post. […]

    Pingback by View designer strange join syntax | CL-UAT — December 14, 2014 @ 10:40 pm

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress