Friday, February 24, 2012

IS NULL on field is not using index placed on that field

Hi All,
I have the table structure as follows
->Deliveries (table)
|__OrderID (Guid)
|__DeliverDate (DateTime) (ALLOW NULLS)
Index is on DeliverDate (ASC)
60000 records
look at the query plan for the following and is using the proper index:
SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
If I now change the query to:
SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
NULL
The query plan does not choose the DeliverDate index anymore?
What is the reason for this?
Regards,
Adrian.Adrian Bezzina wrote:
> Hi All,
> I have the table structure as follows
> ->Deliveries (table)
> |__OrderID (Guid)
> |__DeliverDate (DateTime) (ALLOW NULLS)
> Index is on DeliverDate (ASC)
> 60000 records
> look at the query plan for the following and is using the proper
> index:
> SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
> If I now change the query to:
> SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR
> DeliverDate IS NULL
> The query plan does not choose the DeliverDate index anymore?
> What is the reason for this?
> Regards,
> Adrian.
OR statements can sometimes be an index killer. It could also be because
you have a lot of NULL values in the index.
The first thing you should do is never use SELECT *. For one thing, it
almost always returns more column information than required. It also
does not give SQL Server the opportunity to use a covering index (unless
all columns are covered).
OR statements are, many times, not optimizeable. Even if they are, SQL
Server will only use a non-clustered index if the number of rows
returned is low enough that using the index doesn't cause more overhead
than scanning the table. Remember, unless your index is covering, every
index s requires a bookmark lookup operation (especially if you use
SELECT *).
You could try using:
Select col1, col2
From Deliveries
Where DeliveryDate = '2003-12-14'
UNION ALL
Select col1, col2
From Deliveries
Where DeliveryDate IS NULL
or you could try changing your index to clustered.
David Gugick
Imceda Software
www.imceda.com|||What does it do? A table/clustered index scan? We don't really have
enough info to answer - can you post the execution plans & the DDL
(table schema and schema for all indexes on the table)? To get the
execution plans just turn on showplan_text before the select statements
and off after the select statements. That is,
set showplan_text on
go
SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS NU
LL
go
set showplan_text off
go
When I tried it on a tmp table (I had only a nonclustered index on DeliverDa
te) it chose:
a nonclustered index s for:
SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
a table scan for:
SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS NU
LL
a nonclustered index s for:
SELECT * FROM Deliveries WHERE DeliverDate IS NULL
I guess the optimiser decided that it was more efficient to scan the table o
nce than to do 2 s passes (+ associated lookups) in the DeliverDate noncl
ustered index for the query with the 2 where criteria. When I added a clust
ered index on OrderID, the
table scan operation turned into a scan on the DeliverDate nonclustered inde
x because the index suddenly became a covering index (since all nonclustered
indexes also contain the clustering key, thereby implicitly making the Deli
verDate index really (Deliv
erDate, OrderID)).
My quick tests only had 10 rows of data. It would make a difference if I wa
s running the SELECT queries against a bigger table.
*mike hodgson* |/ database administrator/ | mallesons stephen jaques
*T* +61 (2) 9296 3668 |* F* +61 (2) 9296 3885 |* M* +61 (408) 675 907
*E* mailto:mike.hodgson@.mallesons.nospam.com |* W* http://www.mallesons.com
Adrian Bezzina wrote:

>Hi All,
>I have the table structure as follows
>->Deliveries (table)
> |__OrderID (Guid)
> |__DeliverDate (DateTime) (ALLOW NULLS)
>Index is on DeliverDate (ASC)
>60000 records
>look at the query plan for the following and is using the proper index:
>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
>If I now change the query to:
>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
>NULL
>The query plan does not choose the DeliverDate index anymore?
>What is the reason for this?
>Regards,
>Adrian.
>
>|||Mike,
Thanks for the prompt reply.
after executing (i know the tables and fields have changed) the query on the
following table:
Consign (table)
|
|_ [UNIQUE] GUID Field - PrimaryKey.
|
|_ ORIGINAL_DATE DateTime Field - Clustered Index on this field alone.
|
|_ DELIVER_DATE DateTime Field - non clustered index on this field.
set showplan_text on
go
SELECT * FROM Consign WHERE Deliver_date = '2004-2-3'
SELECT * FROM Consign WHERE Deliver_date = '2004-2-3' OR Deliver_date IS
NULL
go
set showplan_text off
go
the plan for the first query is doing a 'Index Scan' on the non-clustered
index on the deliver date field
the plan for the second query is doing a 'Clustered Index Scan' on a
clustered index - original_date and i suppose this is the equivalent to a
table scan as I have moved the clustered index off the primary key to a more
ordered field (in this case the original_date field).
I suppose what I am really asking is:
Are null values placed in an Index ? and if so does that effect the qeury if
IS NULL (or IS NOT NULL) affect the selection of which index to use (sorry
about changing the table structure, but i am showing you the indexes in
question).
Regards,
Adrian.
"Mike Hodgson" <mike.hodgson@.mallesons.nospam.com> wrote in message
news:e$tD6MWGFHA.1932@.TK2MSFTNGP14.phx.gbl...
What does it do? A table/clustered index scan? We don't really have enough
info to answer - can you post the execution plans & the DDL (table schema
and schema for all indexes on the table)? To get the execution plans just
turn on showplan_text before the select statements and off after the select
statements. That is,
set showplan_text on
go
SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
NULL
go
set showplan_text off
go
When I tried it on a tmp table (I had only a nonclustered index on
DeliverDate) it chose:
a nonclustered index s for:
SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
a table scan for:
SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
NULL
a nonclustered index s for:
SELECT * FROM Deliveries WHERE DeliverDate IS NULL
I guess the optimiser decided that it was more efficient to scan the table
once than to do 2 s passes (+ associated lookups) in the DeliverDate
nonclustered index for the query with the 2 where criteria. When I added a
clustered index on OrderID, the table scan operation turned into a scan on
the DeliverDate nonclustered index because the index suddenly became a
covering index (since all nonclustered indexes also contain the clustering
key, thereby implicitly making the DeliverDate index really (DeliverDate,
OrderID)).
My quick tests only had 10 rows of data. It would make a difference if I
was running the SELECT queries against a bigger table.
mike hodgson | database administrator | mallesons stephen jaques
T +61 (2) 9296 3668 | F +61 (2) 9296 3885 | M +61 (408) 675 907
E mailto:mike.hodgson@.mallesons.nospam.com | W http://www.mallesons.com
Adrian Bezzina wrote:
Hi All,
I have the table structure as follows
->Deliveries (table)
|__OrderID (Guid)
|__DeliverDate (DateTime) (ALLOW NULLS)
Index is on DeliverDate (ASC)
60000 records
look at the query plan for the following and is using the proper index:
SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
If I now change the query to:
SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
NULL
The query plan does not choose the DeliverDate index anymore?
What is the reason for this?
Regards,
Adrian.|||Adrian,
The optimizer estimates the number of rows a query will return
as part of the optimization process. The estimated number of
rows for which (Deliver_date = '2004-2-3') is going to be fewer
than the number of rows for which (Deliver_date = '2004-2-3')
OR (Deliver_date IS NULL). While the nonclustered index
on Deliver_date can identify these rows in either case (yes, the
NULLs are in the index), the problem is that it can only identify
the PrimaryKey values of these rows. The query is SELECT *
so for every one of these rows, it is still necessary to look up
all the other column values.
Once there are a lot of rows in the result (according to the
optimizer estimate), there is no point gathering all the primary
key values and looking each one up in the main table. If you
will end up looking at most of the data pages of the main table
anyway, why not just scan the table and forget about using the index.
In your case, the Consign table has very narrow rows, with
about 24 bytes per row. As a result, about 300 rows fit on
every data page of the table. Even if only 0.5% of the rows
are expected to satisfy the query condition, that's 1 in 200 rows,
or at least one row per data page. The index doesn't help
in this case.
It's possible the index would help, if all those 0.5% of the rows
are all together on a few data pages, but the optimizer doesn't
have this kind of information available when it decides on the
best-estimate plan.
Steve Kass
Drew University
Adrian Bezzina wrote:

>Mike,
>Thanks for the prompt reply.
>after executing (i know the tables and fields have changed) the query on th
e
>following table:
>Consign (table)
> |
> |_ [UNIQUE] GUID Field - PrimaryKey.
> |
> |_ ORIGINAL_DATE DateTime Field - Clustered Index on this field alone.
> |
> |_ DELIVER_DATE DateTime Field - non clustered index on this field.
>set showplan_text on
>go
>SELECT * FROM Consign WHERE Deliver_date = '2004-2-3'
>SELECT * FROM Consign WHERE Deliver_date = '2004-2-3' OR Deliver_date IS
>NULL
>go
>set showplan_text off
>go
>the plan for the first query is doing a 'Index Scan' on the non-clustered
>index on the deliver date field
>the plan for the second query is doing a 'Clustered Index Scan' on a
>clustered index - original_date and i suppose this is the equivalent to a
>table scan as I have moved the clustered index off the primary key to a mor
e
>ordered field (in this case the original_date field).
>I suppose what I am really asking is:
>Are null values placed in an Index ? and if so does that effect the qeury i
f
>IS NULL (or IS NOT NULL) affect the selection of which index to use (sorry
>about changing the table structure, but i am showing you the indexes in
>question).
>Regards,
>Adrian.
>--
>"Mike Hodgson" <mike.hodgson@.mallesons.nospam.com> wrote in message
>news:e$tD6MWGFHA.1932@.TK2MSFTNGP14.phx.gbl...
>What does it do? A table/clustered index scan? We don't really have enoug
h
>info to answer - can you post the execution plans & the DDL (table schema
>and schema for all indexes on the table)? To get the execution plans just
>turn on showplan_text before the select statements and off after the select
>statements. That is,
>set showplan_text on
>go
>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
>NULL
>go
>set showplan_text off
>go
>When I tried it on a tmp table (I had only a nonclustered index on
>DeliverDate) it chose:
>a nonclustered index s for:
>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
>a table scan for:
>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
>NULL
>a nonclustered index s for:
>SELECT * FROM Deliveries WHERE DeliverDate IS NULL
>I guess the optimiser decided that it was more efficient to scan the table
>once than to do 2 s passes (+ associated lookups) in the DeliverDate
>nonclustered index for the query with the 2 where criteria. When I added a
>clustered index on OrderID, the table scan operation turned into a scan on
>the DeliverDate nonclustered index because the index suddenly became a
>covering index (since all nonclustered indexes also contain the clustering
>key, thereby implicitly making the DeliverDate index really (DeliverDate,
>OrderID)).
>My quick tests only had 10 rows of data. It would make a difference if I
>was running the SELECT queries against a bigger table.
>--
>mike hodgson | database administrator | mallesons stephen jaques
>T +61 (2) 9296 3668 | F +61 (2) 9296 3885 | M +61 (408) 675 907
>E mailto:mike.hodgson@.mallesons.nospam.com | W http://www.mallesons.com
>
>Adrian Bezzina wrote:
>Hi All,
>I have the table structure as follows
>->Deliveries (table)
> |__OrderID (Guid)
> |__DeliverDate (DateTime) (ALLOW NULLS)
>Index is on DeliverDate (ASC)
>60000 records
>look at the query plan for the following and is using the proper index:
>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
>If I now change the query to:
>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
>NULL
>The query plan does not choose the DeliverDate index anymore?
>What is the reason for this?
>Regards,
>Adrian.
>
>
>
>|||Guys,
Thanks for the help (I can pass a paremeter as follows):
Instead of:
SELECT * FROM Consign WHERE Deliver_date = '2004-2-3' OR Deliver_date IS
NULL
I can do this (work nicely):
SELECT * FROM Consign WHERE ISNULL(Deliver_date, '2004-2-3') = '2004-2-3'
This always uses the index on DELIVER_DATE
:)
BTW its actually called as follows:
DECLARE @.deliverDate AS DATETIME
SET @.deliverDate = '2004-2-3'
SELECT * FROM Consign WHERE ISNULL(Deliver_date, @.deliverDate) =
@.deliverDate
Works like a treat.
Thanks
Adrian.
"Adrian Bezzina" <Adrian@.tristalee.com.au> wrote in message
news:OnSmvgWGFHA.1740@.TK2MSFTNGP09.phx.gbl...
> Mike,
> Thanks for the prompt reply.
> after executing (i know the tables and fields have changed) the query on
> the following table:
> Consign (table)
> |
> |_ [UNIQUE] GUID Field - PrimaryKey.
> |
> |_ ORIGINAL_DATE DateTime Field - Clustered Index on this field alone.
> |
> |_ DELIVER_DATE DateTime Field - non clustered index on this field.
> set showplan_text on
> go
> SELECT * FROM Consign WHERE Deliver_date = '2004-2-3'
> SELECT * FROM Consign WHERE Deliver_date = '2004-2-3' OR Deliver_date IS
> NULL
> go
> set showplan_text off
> go
> the plan for the first query is doing a 'Index Scan' on the non-clustered
> index on the deliver date field
> the plan for the second query is doing a 'Clustered Index Scan' on a
> clustered index - original_date and i suppose this is the equivalent to a
> table scan as I have moved the clustered index off the primary key to a
> more ordered field (in this case the original_date field).
> I suppose what I am really asking is:
> Are null values placed in an Index ? and if so does that effect the qeury
> if IS NULL (or IS NOT NULL) affect the selection of which index to use
> (sorry about changing the table structure, but i am showing you the
> indexes in question).
> Regards,
> Adrian.
> --
> "Mike Hodgson" <mike.hodgson@.mallesons.nospam.com> wrote in message
> news:e$tD6MWGFHA.1932@.TK2MSFTNGP14.phx.gbl...
> What does it do? A table/clustered index scan? We don't really have
> enough info to answer - can you post the execution plans & the DDL (table
> schema and schema for all indexes on the table)? To get the execution
> plans just turn on showplan_text before the select statements and off
> after the select statements. That is,
> set showplan_text on
> go
> SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
> SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
> NULL
> go
> set showplan_text off
> go
> When I tried it on a tmp table (I had only a nonclustered index on
> DeliverDate) it chose:
> a nonclustered index s for:
> SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
> a table scan for:
> SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
> NULL
> a nonclustered index s for:
> SELECT * FROM Deliveries WHERE DeliverDate IS NULL
> I guess the optimiser decided that it was more efficient to scan the table
> once than to do 2 s passes (+ associated lookups) in the DeliverDate
> nonclustered index for the query with the 2 where criteria. When I added
> a clustered index on OrderID, the table scan operation turned into a scan
> on the DeliverDate nonclustered index because the index suddenly became a
> covering index (since all nonclustered indexes also contain the clustering
> key, thereby implicitly making the DeliverDate index really (DeliverDate,
> OrderID)).
> My quick tests only had 10 rows of data. It would make a difference if I
> was running the SELECT queries against a bigger table.
> --
> mike hodgson | database administrator | mallesons stephen jaques
> T +61 (2) 9296 3668 | F +61 (2) 9296 3885 | M +61 (408) 675 907
> E mailto:mike.hodgson@.mallesons.nospam.com | W http://www.mallesons.com
>
> Adrian Bezzina wrote:
> Hi All,
> I have the table structure as follows
> ->Deliveries (table)
> |__OrderID (Guid)
> |__DeliverDate (DateTime) (ALLOW NULLS)
> Index is on DeliverDate (ASC)
> 60000 records
> look at the query plan for the following and is using the proper index:
> SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
> If I now change the query to:
> SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
> NULL
> The query plan does not choose the DeliverDate index anymore?
> What is the reason for this?
> Regards,
> Adrian.
>
>
>|||Indexes (clustered and nonclustered) include nulls (they're just another
value, albeit a special one). But, when you have the 2 criteria in your
WHERE clause the query engine needs to look through 2 separate ranges of
values in the index (the null values and the '2004-2-3' values)...sort
of. So that's 2 s operations (plus the associated lookups into the
clustered index to get the rest of the data you request in your column
list). So that's twice as much work as if you just have the single
condition in your WHERE clause (DeliverDate = '2004-2-3'). If you just
say "WHERE DeliverDate IS NULL" you get a nice quick plan using your
nonclustered index too. When the query optimiser comes up with an
execution plan (in your case) it figures out pretty quickly that it can
get the data much more efficiently by just scanning the whole clustered
index until it's got all the data it needs, rather than flicking through
the nonclustered index (twice) and looking up the matching leaf nodes in
the clustered index as well.
You'd have to test it out to be sure but I suspect the UNION ALL that
David suggested would not really be any more efficient just because on
the surface of it I'm guessing it would do pretty much the same
operations (just in 2 separate SELECT statements) as the single query
with the OR. The OR operator can be nasty and should be avoided if
possible but sometimes it has it's place.
The only way the nulls in the index affect the index selection process
is that they serve as yet another value to sort in the index. At least
I think that's the only affect they have. With the non-null values, the
optimiser can reorder the WHERE clauses so that it doesn't have to keep
reversing direction in the index to find the noncontinuous rows; not so
with the nulls it would seem.
If you have a big enough, fragmented enough index it will get avoided
all together. I repeated the exercise but included a 3rd column
(char(3000)), just to pad out the rows so I'd get lots of page splits &
index fragmentation, and inserted 3600 rows instead of the 10 rows I had
before (the 10 rows would have been on 1 page but the 3600 rows, each
3000+ bytes, would be split over at least 1800 pages - 2 rows per
page). Selectivity was still the same but the optimiser thought (in all
3 cases (where date = x, where date = x or date is null, and where date
is null)) that it would be better to just do a table scan. I even took
out the null values and made the DeliverDate column NOT NULL and it made
no difference.
Sometimes you just have to trust that the optimiser knows best (which it
usually does). Of course, this could all be crap (I'm getting tired)
but it sounds logical to me.
*mike hodgson* |/ database administrator/ | mallesons stephen jaques
*T* +61 (2) 9296 3668 |* F* +61 (2) 9296 3885 |* M* +61 (408) 675 907
*E* mailto:mike.hodgson@.mallesons.nospam.com |* W* http://www.mallesons.com
Adrian Bezzina wrote:

>Mike,
>Thanks for the prompt reply.
>after executing (i know the tables and fields have changed) the query on th
e
>following table:
>Consign (table)
> |
> |_ [UNIQUE] GUID Field - PrimaryKey.
> |
> |_ ORIGINAL_DATE DateTime Field - Clustered Index on this field alone.
> |
> |_ DELIVER_DATE DateTime Field - non clustered index on this field.
>set showplan_text on
>go
>SELECT * FROM Consign WHERE Deliver_date = '2004-2-3'
>SELECT * FROM Consign WHERE Deliver_date = '2004-2-3' OR Deliver_date IS
>NULL
>go
>set showplan_text off
>go
>the plan for the first query is doing a 'Index Scan' on the non-clustered
>index on the deliver date field
>the plan for the second query is doing a 'Clustered Index Scan' on a
>clustered index - original_date and i suppose this is the equivalent to a
>table scan as I have moved the clustered index off the primary key to a mor
e
>ordered field (in this case the original_date field).
>I suppose what I am really asking is:
>Are null values placed in an Index ? and if so does that effect the qeury i
f
>IS NULL (or IS NOT NULL) affect the selection of which index to use (sorry
>about changing the table structure, but i am showing you the indexes in
>question).
>Regards,
>Adrian.
>--
>"Mike Hodgson" <mike.hodgson@.mallesons.nospam.com> wrote in message
>news:e$tD6MWGFHA.1932@.TK2MSFTNGP14.phx.gbl...
>What does it do? A table/clustered index scan? We don't really have enoug
h
>info to answer - can you post the execution plans & the DDL (table schema
>and schema for all indexes on the table)? To get the execution plans just
>turn on showplan_text before the select statements and off after the select
>statements. That is,
>set showplan_text on
>go
>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
>NULL
>go
>set showplan_text off
>go
>When I tried it on a tmp table (I had only a nonclustered index on
>DeliverDate) it chose:
>a nonclustered index s for:
>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
>a table scan for:
>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
>NULL
>a nonclustered index s for:
>SELECT * FROM Deliveries WHERE DeliverDate IS NULL
>I guess the optimiser decided that it was more efficient to scan the table
>once than to do 2 s passes (+ associated lookups) in the DeliverDate
>nonclustered index for the query with the 2 where criteria. When I added a
>clustered index on OrderID, the table scan operation turned into a scan on
>the DeliverDate nonclustered index because the index suddenly became a
>covering index (since all nonclustered indexes also contain the clustering
>key, thereby implicitly making the DeliverDate index really (DeliverDate,
>OrderID)).
>My quick tests only had 10 rows of data. It would make a difference if I
>was running the SELECT queries against a bigger table.
>--
>mike hodgson | database administrator | mallesons stephen jaques
>T +61 (2) 9296 3668 | F +61 (2) 9296 3885 | M +61 (408) 675 907
>E mailto:mike.hodgson@.mallesons.nospam.com | W http://www.mallesons.com
>
>Adrian Bezzina wrote:
>Hi All,
>I have the table structure as follows
>->Deliveries (table)
> |__OrderID (Guid)
> |__DeliverDate (DateTime) (ALLOW NULLS)
>Index is on DeliverDate (ASC)
>60000 records
>look at the query plan for the following and is using the proper index:
>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3'
>If I now change the query to:
>SELECT * FROM Deliveries WHERE DeliverDate = '2004-2-3' OR DeliverDate IS
>NULL
>The query plan does not choose the DeliverDate index anymore?
>What is the reason for this?
>Regards,
>Adrian.
>
>
>
>|||Again thanks Guys,
But i am still intriugued to know why the first query ( this table has now
been built from scratch) uses the pk or clustered index if there is one:
(In this case its the Clustered Index on OriginalDate)
SELECT * FROM Consign WHERE Deliver_date = '2004-2-3' OR Deliver_date IS
NULL
And this uses the indexed created on Deliver_date:
SELECT * FROM Consign WHERE ISNULL(Deliver_date, '2004-2-3') = '2004-2-3'
Does the OR make this much of a difference?
Thanks,
Adrian.
"Steve Kass" <skass@.drew.edu> wrote in message
news:OtLB6qWGFHA.3648@.TK2MSFTNGP09.phx.gbl...
> Adrian,
> The optimizer estimates the number of rows a query will return
> as part of the optimization process. The estimated number of
> rows for which (Deliver_date = '2004-2-3') is going to be fewer
> than the number of rows for which (Deliver_date = '2004-2-3')
> OR (Deliver_date IS NULL). While the nonclustered index
> on Deliver_date can identify these rows in either case (yes, the
> NULLs are in the index), the problem is that it can only identify
> the PrimaryKey values of these rows. The query is SELECT *
> so for every one of these rows, it is still necessary to look up
> all the other column values.
> Once there are a lot of rows in the result (according to the
> optimizer estimate), there is no point gathering all the primary
> key values and looking each one up in the main table. If you
> will end up looking at most of the data pages of the main table
> anyway, why not just scan the table and forget about using the index.
> In your case, the Consign table has very narrow rows, with
> about 24 bytes per row. As a result, about 300 rows fit on
> every data page of the table. Even if only 0.5% of the rows
> are expected to satisfy the query condition, that's 1 in 200 rows,
> or at least one row per data page. The index doesn't help
> in this case.
> It's possible the index would help, if all those 0.5% of the rows
> are all together on a few data pages, but the optimizer doesn't
> have this kind of information available when it decides on the
> best-estimate plan.
> Steve Kass
> Drew University
> Adrian Bezzina wrote:
>|||Upon reflection, most of this sounds like crap to me now (not sure what
I was on at the time). As such, disregard most of it. (Not having a
good day.)
*mike hodgson* |/ database administrator/ | mallesons stephen jaques
*T* +61 (2) 9296 3668 |* F* +61 (2) 9296 3885 |* M* +61 (408) 675 907
*E* mailto:mike.hodgson@.mallesons.nospam.com |* W* http://www.mallesons.com
Mike Hodgson wrote:
> Indexes (clustered and nonclustered) include nulls (they're just
> another value, albeit a special one). But, when you have the 2
> criteria in your WHERE clause the query engine needs to look through 2
> separate ranges of values in the index (the null values and the
> '2004-2-3' values)...sort of. So that's 2 s operations (plus the
> associated lookups into the clustered index to get the rest of the
> data you request in your column list). So that's twice as much work
> as if you just have the single condition in your WHERE clause
> (DeliverDate = '2004-2-3'). If you just say "WHERE DeliverDate IS
> NULL" you get a nice quick plan using your nonclustered index too.
> When the query optimiser comes up with an execution plan (in your
> case) it figures out pretty quickly that it can get the data much more
> efficiently by just scanning the whole clustered index until it's got
> all the data it needs, rather than flicking through the nonclustered
> index (twice) and looking up the matching leaf nodes in the clustered
> index as well.
> You'd have to test it out to be sure but I suspect the UNION ALL that
> David suggested would not really be any more efficient just because on
> the surface of it I'm guessing it would do pretty much the same
> operations (just in 2 separate SELECT statements) as the single query
> with the OR. The OR operator can be nasty and should be avoided if
> possible but sometimes it has it's place.
> The only way the nulls in the index affect the index selection process
> is that they serve as yet another value to sort in the index. At
> least I think that's the only affect they have. With the non-null
> values, the optimiser can reorder the WHERE clauses so that it doesn't
> have to keep reversing direction in the index to find the
> noncontinuous rows; not so with the nulls it would seem.
> If you have a big enough, fragmented enough index it will get avoided
> all together. I repeated the exercise but included a 3rd column
> (char(3000)), just to pad out the rows so I'd get lots of page splits
> & index fragmentation, and inserted 3600 rows instead of the 10 rows I
> had before (the 10 rows would have been on 1 page but the 3600 rows,
> each 3000+ bytes, would be split over at least 1800 pages - 2 rows per
> page). Selectivity was still the same but the optimiser thought (in
> all 3 cases (where date = x, where date = x or date is null, and where
> date is null)) that it would be better to just do a table scan. I
> even took out the null values and made the DeliverDate column NOT NULL
> and it made no difference.
> Sometimes you just have to trust that the optimiser knows best (which
> it usually does). Of course, this could all be crap (I'm getting
> tired) but it sounds logical to me.
> --
> *mike hodgson* |/ database administrator/ | mallesons stephen jaques
> *T* +61 (2) 9296 3668 |* F* +61 (2) 9296 3885 |* M* +61 (408) 675 907
> *E* mailto:mike.hodgson@.mallesons.nospam.com |* W*
> http://www.mallesons.com
>
> Adrian Bezzina wrote:
>|||Can you specify which index operation it is using on the clustered index. Re
member that a clustered
index scan = table scan.
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
"Adrian Bezzina" <AdrianRB@.bigpond.net.au> wrote in message
news:OqpQViXGFHA.1260@.TK2MSFTNGP12.phx.gbl...
> Again thanks Guys,
> But i am still intriugued to know why the first query ( this table has now
been built from
> scratch) uses the pk or clustered index if there is one:
> (In this case its the Clustered Index on OriginalDate)
> SELECT * FROM Consign WHERE Deliver_date = '2004-2-3' OR Deliver_date IS N
ULL
> And this uses the indexed created on Deliver_date:
> SELECT * FROM Consign WHERE ISNULL(Deliver_date, '2004-2-3') = '2004-2-3'
> Does the OR make this much of a difference?
> Thanks,
> Adrian.
> "Steve Kass" <skass@.drew.edu> wrote in message news:OtLB6qWGFHA.3648@.TK2MS
FTNGP09.phx.gbl...
>

No comments:

Post a Comment