You want fries with that?: Join vs. IN Not IN

8/09/2016

Join vs. IN Not IN

This post has been moved to:

http://dallasdbas.com/join-vs-in-not-in/

Come on over and look around!

5 comments:

Sebastian Leupold said...: there are use cases, where in and not in are faster: consider you are having a column with many Null values, e.g. having a CoverImageID column and an Images table with ImageID, but most of your books don't have the CoverImageID populated. if you are searching for a book with a specific cover image, it might be more efficient to query SELECT * FROM All_books WHERE CoverImageID IN (SELECT ImageID FROM Images WHERE fileName Like @fn).; 6:00 AM
Gerald Britton, MCSA SQL Server, MVP said...: It depends. With small datasets you get one plan. Try it on some Millon row tables with appropriate indexes. You'll get different plans; 7:23 AM
Kevin3NF said...: Thanks for the feedback!

Both good points. The point of the post was to explain to Junior DBAs, Analysts, etc how different syntax that does the same thing can be wildly different in performance, in simple, easy to relate to terms :)

This actually came out of a coffee room conversation with an Analyst that only queries data...; 8:11 AM
Unknown said...: The NOT IN will work much better if it is correlated:
SELECT
AB.*
FROM
AllBooks AS AB
WHERE
BookID NOT IN
(SELECT BookID from MyBooks WHERE BookID = AB.BookID);

The resulting plan gives the best of both: the efficiency of a left anti semi join, but minus the filter of the NOT NULL query.; 8:31 AM
Thomas Franz said...: A third way would be an NOT EXISTS query:
SELECT AB.*
FROM AllBooks AS AB
WHERE NOT EXISTS (SELECT BookID from MyBooks WHERE BookID = AB.BookID);

A EXISTS query (without NOT) is particularly useful, if it is a 1:n query, e.g. because MyBooks not only contains your own books, but also the books of your wife, kids, parents, best friends and some books are very populare, so that there are multiple entries in your table. If you would use an INNER JOIN in this case, you would receive duplicates (three times the same AB.*-record), with the EXISTS (or the IN) clause you would get it only once.; 2:51 PM