Showing posts with label lead. Show all posts
Showing posts with label lead. Show all posts

Wednesday, March 21, 2012

Is subquery bad?

Hi,
I got sceamed at by a lead developerfor using subquery. That'sOK, but I want
to make sure he was right.
Here was the story:
I was asked to write statments that return customers from a table excluding
duplicated records based on email.
My statement:
SELECT
c.*
FROM
customers c
JOIN
(
SELECT
MAX(CustomerID) AS CustomerID
FROM
customers
GROUP BY
email
) s
ON
c.CustomerID = s.CustomerID
The statment worked, but the lead developer said I shouldn't use subquery in
any circumstances, use temp table instead.
In this example, if there are 1 million records in the customers table, the
subquery will get executed 1 million times. Was he right?
TIAThere is nothing at all inherently wrong with subqueries. And in MANY cases,
subqueries are BETTER than temp tables. Temp tables are good when you need
the data in multiple queries, but if you're only going to use it in a single
query, your derived table is probably as good or better than a temp table.
The subquery will not be 'executed' a million times. It will be treated as
if it's a table and joined to the outer query exactly as if it were a temp
table, without all the overhead of creating a new object.
You can try writing this query with a temp table, and measuring the
performance of each, and showing your lead developer that he still has
things to learn, as we all do.
HTH
Kalen Delaney, SQL Server MVP
www.solidqualitylearning.com
"Raymond Du" <rdrd@.yahoo.com> wrote in message
news:%23eNzccaPGHA.720@.TK2MSFTNGP14.phx.gbl...
> Hi,
> I got sceamed at by a lead developerfor using subquery. That'sOK, but I
> want to make sure he was right.
> Here was the story:
> I was asked to write statments that return customers from a table
> excluding duplicated records based on email.
> My statement:
> SELECT
> c.*
> FROM
> customers c
> JOIN
> (
> SELECT
> MAX(CustomerID) AS CustomerID
> FROM
> customers
> GROUP BY
> email
> ) s
> ON
> c.CustomerID = s.CustomerID
> The statment worked, but the lead developer said I shouldn't use subquery
> in any circumstances, use temp table instead.
> In this example, if there are 1 million records in the customers table,
> the subquery will get executed 1 million times. Was he right?
> TIA
>
>
>|||--BEGIN PGP SIGNED MESSAGE--
Hash: SHA1
Well, your solution could have been done like this:
SELECT CustomerID
FROM Customers
GROUP BY CustomerID
HAVING COUNT(DISTINCT email) = 1
But, he was wrong about your "subquery" 'cuz it is a derived table. The
optimizer should generate this derived table before running the rest of
the query - IOW, that "subquery" runs once. A temp table is created for
the derived table in the tempdb DB and the join is between the regular
table Customers and the temporary, derived table in tempdb.
He was thinking about correlated subqueries. Something like this:
SELECT *
FROM Orders As O
WHERE OrderDate = (SELECT MAX(OrderDAte)
FROM Orders
WHERE CustomerID = O.CustomerID)
This subquery will run for each CustomerID in Orders table.
He's wrong about never using subqueries. That's like saying never use
reverse in your car 'cuz it will reverse the odometer. Crikey!
--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)
--BEGIN PGP SIGNATURE--
Version: PGP for Personal Privacy 5.0
Charset: noconv
iQA/ AwUBRAZqp4echKqOuFEgEQLF6QCgni0awZ4bUAKb
0jaksFEj/lgoFwsAoNMv
FDzir29jPID44xdI+H42ERoj
=JcMp
--END PGP SIGNATURE--
Raymond Du wrote:
> Hi,
> I got sceamed at by a lead developerfor using subquery. That'sOK, but I wa
nt
> to make sure he was right.
> Here was the story:
> I was asked to write statments that return customers from a table excludi
ng
> duplicated records based on email.
> My statement:
> SELECT
> c.*
> FROM
> customers c
> JOIN
> (
> SELECT
> MAX(CustomerID) AS CustomerID
> FROM
> customers
> GROUP BY
> email
> ) s
> ON
> c.CustomerID = s.CustomerID
> The statment worked, but the lead developer said I shouldn't use subquery
in
> any circumstances, use temp table instead.
> In this example, if there are 1 million records in the customers table, th
e
> subquery will get executed 1 million times. Was he right?|||Fear of Subqueries is amazingly common, and while they can be misused,
I find temp tables to be misused far more often.
Yesterday I got a desperate call about a stored procedure that was
timing out. Since it is part of the system for all managers in the
company to enter the data for their staff's annual raises this was
considered a Very Bad Thing.
I made many changes to the proc, but one change in particular cut the
execution time by about two thirds. I removed one temp table (there
were two) and replaced it with a derived table in the one place it was
referenced. By putting all the information in the one query the
optimizer had a better picture of the data, and did a much better job.
SQL is a wonderfully powerful language, and most of that power is
reached by using subqueries. If you have not used them, and learned
from experience what they can do and how well they can do it, it is
easy to reason your way into believing they can not be efficient. But
nobody with actual experience would make such a blanket statement.
Roy Harvey
Beacon Falls, CT
On Wed, 1 Mar 2006 19:12:05 -0800, "Raymond Du" <rdrd@.yahoo.com>
wrote:

>Hi,
>I got sceamed at by a lead developerfor using subquery. That'sOK, but I wan
t
>to make sure he was right.
>Here was the story:
>I was asked to write statments that return customers from a table excludin
g
>duplicated records based on email.
>My statement:
>SELECT
> c.*
>FROM
> customers c
>JOIN
> (
> SELECT
> MAX(CustomerID) AS CustomerID
> FROM
> customers
> GROUP BY
> email
> ) s
>ON
> c.CustomerID = s.CustomerID
>The statment worked, but the lead developer said I shouldn't use subquery i
n
>any circumstances, use temp table instead.
>In this example, if there are 1 million records in the customers table, the
>subquery will get executed 1 million times. Was he right?
>TIA
>
>|||just wanted to add that in some cases temp tables have one clear
advantage: you can build indexes/statistics on them. That done, the
optimizer might have a better estimate, and the query might run much
faster.
But such cases are not frequent, and they are definitely less frequent
now than with SQL Server 7.|||In your case, if CustomerID is an indexed column then the subquery is
not bad. The join will be faster because it can use the CustomerID
index to s matching columns. In case of temp table an index s may
not have been possible.
Depending on how you use them, subqueries can be bad or good.|||The SQL Server query optimizer will not execute this subquery N times.
Subqueries are not bad (quite the opposite - they are very powerful).
In far more complex queries, the optimizer may not be able to determine the
proper join orders to use and temporary tables could be useful. I wouldn't
use this as a default option - usually only as a workaround for specific
queries where the estimated rows vs. actual rows are not anywhere close (one
reason that this could happen is that there is data correlation between
columns used in predicates in the query).
I'll suggest that you try to run each and see how much faster/slower your
original query is to the temporary table path.
Thanks,
Conor Cunningham
SQL Server Query Optimization Development Lead
"Raymond Du" <rdrd@.yahoo.com> wrote in message
news:%23eNzccaPGHA.720@.TK2MSFTNGP14.phx.gbl...
> Hi,
> I got sceamed at by a lead developerfor using subquery. That'sOK, but I
> want to make sure he was right.
> Here was the story:
> I was asked to write statments that return customers from a table
> excluding duplicated records based on email.
> My statement:
> SELECT
> c.*
> FROM
> customers c
> JOIN
> (
> SELECT
> MAX(CustomerID) AS CustomerID
> FROM
> customers
> GROUP BY
> email
> ) s
> ON
> c.CustomerID = s.CustomerID
> The statment worked, but the lead developer said I shouldn't use subquery
> in any circumstances, use temp table instead.
> In this example, if there are 1 million records in the customers table,
> the subquery will get executed 1 million times. Was he right?
> TIA
>
>|||The advantage is not so clear when temp tables are in a stored
procedure, as the index does not exist when the execution plans are
formulated. I believe 2000 took care of this in part by adding extra
overhead that caused the plan to be re-evaluated as the proc executed.
Roy Harvey
Beacon Falls, CT
On 2 Mar 2006 06:40:06 -0800, "Alexander Kuznetsov"
<AK_TIREDOFSPAM@.hotmail.COM> wrote:

>just wanted to add that in some cases temp tables have one clear
>advantage: you can build indexes/statistics on them. That done, the
>optimizer might have a better estimate, and the query might run much
>faster.
>But such cases are not frequent, and they are definitely less frequent
>now than with SQL Server 7.|||in 2000, the plans are recompiled when necessary.|||On 2 Mar 2006 10:15:54 -0800, "Alexander Kuznetsov"
<AK_TIREDOFSPAM@.hotmail.COM> wrote:

>in 2000, the plans are recompiled when necessary.
Which is overhead, possible considerable.
Roy