Friday, February 24, 2012

Is one better than the other?

Sql server 2000 SP4.

Here is the schema, other details are ignored for simplicity.

Create Table TableC ( TCID int NOT NULL)

Create Table TableW (TCID int NOT NULL , EndDate datetime)

TCID is PK in both the tables.

I have two queries one with subquery with IN operator and the other with traditional join. Execution plans are almost identical except one uses right semi join and the other uses inner join. Resource cost is also same except 'Estimated row size' for right semi-join is less than the traditional join. Does Semi-joins cost less than the traditional inner joins?

How are these two different from each other? Is there any performance related difference between semi joins and joins?

select * from dbo.TableC where TCID in
(select TCID from dbo.TableW where EndDate IS NULL )


StmtText
|--Merge Join(Right Semi Join, MERGE: ([TableW].[TCID])=([TableC].[TCID]), RESIDUAL: ([TableC].[TCID]=[TableW].[TCID]))
|--Clustered Index Scan(OBJECT: ([DBNAME].[dbo].[TableW].[PK_TableW]), WHERE: ([TableW].[cEndDate]=NULL) ORDERED FORWARD)
|--Clustered Index Scan(OBJECT: ([DBNAME].[dbo].[TableC].[PK_TableC]), ORDERED FORWARD)

select c.* from TableC c join TableW w
on c.TCID = w.TCID
and EndDate IS NULL


StmtText
|--Merge Join(Inner Join, MERGE: ([w ].[TCID])=([c ].[TCID]), RESIDUAL: ([w ].[TCID]=[c ].[TCID]))
|--Clustered Index Scan(OBJECT: ([DBNAME].[dbo].[TableW].[PK_TableW] AS [w ]), WHERE: ([w ].[cEndDate]=NULL) ORDERED FORWARD)
|--Clustered Index Scan(OBJECT: ([DBNAME].[dbo].[TableC].[PK_TableC] AS [c ]), ORDERED FORWARD)

Sankar,

One thing is that the two queries are not equivalent. The IN query returns each row of TableC at most once, but the JOIN query may return rows from TableC more than once. That's where the join vs. semijoin difference comes in. From what you show here, I don't see a reason why the row sizes are different. In the graphical query plan, do you see more columns being passed through the plan in one query than in the other? I don't have an instance of 2000 handy to check whether you can inspect an operators Output List to see this information, but if you can see it, and there's a difference, can you figure out why more columns would need to be carried through?

I know that's not quite an answer to your question, but maybe it will help a bit.

Steve Kass

Drew University

http://www.stevekass.com

|||

Steve,

"One thing is that the two queries are not equivalent." I thought about this but TCID is PK in both the tables and we will get the row atmost once only. I checked the graphical plan and output list is the same for both queries. The "estimated row size" is different for the merge join operator not for the individual "index scan operators". I wonder whats causing this number to be lower for the semi join.

|||This part of my reply was suggesting how you might investigate the row size difference: "From what you show here, I don't see a reason why the row sizes are different. In the graphical query plan, do you see more columns being passed through the plan in one query than in the other? I don't have an instance of 2000 handy to check whether you can inspect an operators Output List to see this information, but if you can see it, and there's a difference, can you figure out why more columns would need to be carried through?"

SK
|||

Interesting stuff. I ran the same query against the SQL 2005 instance (not live version, trial version running on my pc) and the execution plan is same for both the queries. I didn't get the Semi Join for the subquery with IN clause and optimizer re-wrote it to the inner join.

No comments:

Post a Comment