there are use cases, where in and not in are faster: consider you are having a column with many Null values, e.g. having a CoverImageID column and an Images table with ImageID, but most of your books don't have the CoverImageID populated. if you are searching for a book with a specific cover image, it might be more efficient to query SELECT * FROM All_books WHERE CoverImageID IN (SELECT ImageID FROM Images WHERE fileName Like @fn).
Both good points. The point of the post was to explain to Junior DBAs, Analysts, etc how different syntax that does the same thing can be wildly different in performance, in simple, easy to relate to terms :)
This actually came out of a coffee room conversation with an Analyst that only queries data...
The NOT IN will work much better if it is correlated: SELECT AB.* FROM AllBooks AS AB WHERE BookID NOT IN (SELECT BookID from MyBooks WHERE BookID = AB.BookID);
The resulting plan gives the best of both: the efficiency of a left anti semi join, but minus the filter of the NOT NULL query.
A third way would be an NOT EXISTS query: SELECT AB.* FROM AllBooks AS AB WHERE NOT EXISTS (SELECT BookID from MyBooks WHERE BookID = AB.BookID);
A EXISTS query (without NOT) is particularly useful, if it is a 1:n query, e.g. because MyBooks not only contains your own books, but also the books of your wife, kids, parents, best friends and some books are very populare, so that there are multiple entries in your table. If you would use an INNER JOIN in this case, you would receive duplicates (three times the same AB.*-record), with the EXISTS (or the IN) clause you would get it only once.
5 comments:
there are use cases, where in and not in are faster: consider you are having a column with many Null values, e.g. having a CoverImageID column and an Images table with ImageID, but most of your books don't have the CoverImageID populated. if you are searching for a book with a specific cover image, it might be more efficient to query SELECT * FROM All_books WHERE CoverImageID IN (SELECT ImageID FROM Images WHERE fileName Like @fn).
It depends. With small datasets you get one plan. Try it on some Millon row tables with appropriate indexes. You'll get different plans
Thanks for the feedback!
Both good points. The point of the post was to explain to Junior DBAs, Analysts, etc how different syntax that does the same thing can be wildly different in performance, in simple, easy to relate to terms :)
This actually came out of a coffee room conversation with an Analyst that only queries data...
The NOT IN will work much better if it is correlated:
SELECT
AB.*
FROM
AllBooks AS AB
WHERE
BookID NOT IN
(SELECT BookID from MyBooks WHERE BookID = AB.BookID);
The resulting plan gives the best of both: the efficiency of a left anti semi join, but minus the filter of the NOT NULL query.
A third way would be an NOT EXISTS query:
SELECT AB.*
FROM AllBooks AS AB
WHERE NOT EXISTS (SELECT BookID from MyBooks WHERE BookID = AB.BookID);
A EXISTS query (without NOT) is particularly useful, if it is a 1:n query, e.g. because MyBooks not only contains your own books, but also the books of your wife, kids, parents, best friends and some books are very populare, so that there are multiple entries in your table. If you would use an INNER JOIN in this case, you would receive duplicates (three times the same AB.*-record), with the EXISTS (or the IN) clause you would get it only once.
Post a Comment