Friday, March 9, 2012

Is SELECT MAX(ID) expensive when you have an index on ID?

Hi All,
As you can perhaps tell from the subject I have a need to frequently
find out the largest id value in a data table. The table is currently a
few million rows. It will grow to potentially 100 million or so before
being archived.
My question is, will doing frequent MAX(ID) operations really hurt the
db. I've looked at the exec plan and its doing an index scan and the IO
cost is 8.3. That sounds to me like it's scaning the entire ID index
which to my mind sounds like it might be really slow if I need to call
this every few seconds.
Would I be right in thinking this, or does the fact that its using an
index mean it can just jump to the end of the index and look at the last
value?
Any thoughts would be much appreciated
Kindest Regards
SimonSimon
Have you considered using an indexed view ?
"Simon Harvey" <nothanks@.hotmail.com> wrote in message
news:eyDY%23C8PHHA.3668@.TK2MSFTNGP02.phx.gbl...
> Hi All,
> As you can perhaps tell from the subject I have a need to frequently find
> out the largest id value in a data table. The table is currently a few
> million rows. It will grow to potentially 100 million or so before being
> archived.
> My question is, will doing frequent MAX(ID) operations really hurt the db.
> I've looked at the exec plan and its doing an index scan and the IO cost
> is 8.3. That sounds to me like it's scaning the entire ID index which to
> my mind sounds like it might be really slow if I need to call this every
> few seconds.
> Would I be right in thinking this, or does the fact that its using an
> index mean it can just jump to the end of the index and look at the last
> value?
> Any thoughts would be much appreciated
> Kindest Regards
> Simon|||A question for you: what does max(id) mean to you in business terms? If it's
a part of a proprietary method to calculate the next id and it needs to be a
number (int, bigint, etc.) then consider using IDENTITY - it's simpler to
implement and is generally less resource-hungry and requires less locking
(e.g. at inserts).
If, however, max(id) leads you to the last row inserted then you could
consider using another method without an aggregate function.
E.g.:
select top 1
..
order by RowCreatedDateTime desc
,id desc
ML
http://milambda.blogspot.com/|||Hi guys,
Thanks for your replies.
To answer your questions:
Uri - I haven't considered indexed views mainly because I don't know
enough about them to see how they would address the performance issue
ML - I know I can get the last value inserted into the table that way,
but I actually need the HIGHEST value, which unusually for an ID column
isn't necessarily the last value added. The column is actually called
downloadid, it's generated by an external system that I have no control
over. The values are guaranteed to be unique, but not within any order
unfortunately.
So is there a better option than MAX?
Many thanks
Simon|||No, not really.
I think, that if you take a good look at what's *really* going on, you
will be a bit less worried. ;-) SQL Server *is* scanning the index,
but it's just doing a ordered partial scan on a very small amount of
data, not an entire index scan.
Check this out:
create table foo (id int not null, asdfj varchar(500))
SET NOCOUNT ON
declare @.count int
set @.count = 0
while @.count < 10000 begin
insert into foo select @.count, @.count
end
create index ix_foo on foo(id)
set statistics io on
--Not a lot of reads!
select max(id) from foo
--Note that this returns:
---
--Table 'foo'. Scan count 1, logical reads 6,
--physical reads 0, read-ahead reads 0,
--lob logical reads 0, lob physical reads 0,
--lob read-ahead reads 0.
--A lot of reads!
select count(*) from foo
--Whereas this one actually scans the whole index.
--Table 'foo'. Scan count 1, logical reads 16991, physical reads 0,
--read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
--read-ahead reads 0.
-Dave
Simon Harvey wrote:
> Hi guys,
> Thanks for your replies.
> To answer your questions:
> Uri - I haven't considered indexed views mainly because I don't know
> enough about them to see how they would address the performance issue
> ML - I know I can get the last value inserted into the table that way,
> but I actually need the HIGHEST value, which unusually for an ID column
> isn't necessarily the last value added. The column is actually called
> downloadid, it's generated by an external system that I have no control
> over. The values are guaranteed to be unique, but not within any order
> unfortunately.
> So is there a better option than MAX?
> Many thanks
> Simon
-Dave Markle
http://www.markleconsulting.com/blog|||Oh thats good - I was hoping that it would be smart about it.
I just got a bit worried when I noticed that the estimated IO cost and
estimated CPU cost in the execution plan were 8.5 and 3.1 respectively.
I've always though that those values are measured in seconds - but that
can't be right given what you've shown me.
Thanks
Simon

No comments:

Post a Comment