My query construct was the following:
SELECT foo, bar FROM table1
WHERE foo IN (SELECT foo FROM table2 WHERE condition1 AND condition2)
AND bar IN (SELECT bar FROM table2 WHERE condition1 AND condition2);
Each of those two subqueries returned ~260,000 records, which, needless to say, took forever and a day to complete. I later discovered that the IN clause should mostly be used for small result sets, basically for when the subquery returns less than 1024 records. For result sets as massive as the ones I was getting, it is better to use the EXISTS clause. My query above could easily be modified to use EXISTS in the following way:
SELECT table1.foo, table1.bar FROM table1
WHERE EXISTS (SELECT table2.foo, table2.bar from table2
WHERE table1.foo = table2.foo
AND table1.bar = table2.bar
AND condition1
AND condition2);
As you can see, you have to join the two tables being used in order for it to work, otherwise EXISTS will return true and the query will just return the results of the master query.
Remember, when dealing with large result sets, EXISTS is your friend!
No comments:
Post a Comment