Friday, March 30, 2012

Is there a smart set-based solution?

I have a column of starting times below (converted to seconds for
simplicity):
500
505
510
535
910
939
944
977
I need to assign a Group ID to the above values, based on a time
interval of 30 seconds, so the correct result would be:
Group ID
500, 1
505, 1
510, 1
535, 2
910, 3
939, 3
944, 4
977, 5
The value of Group ID isn't important (could be 100, 200, etc.) as long
as values in the same 30-second interval have the same Group ID. When a
new group is started, a new 30-second interval is also started.
I have created a solution using SQL loops (row-based processing) but
was wondering if there is an efficient/elegant set-based solution when
there are thousands of records?
Thanks
MowgliNo. Any set based solution will involve nested loops because the value of
the group id is dependent on the values of the starting time in the previous
rows. This is one of the rare cases in which a cursor will actually perform
faster than a set based solution because it can accomplish what you want in
a single pass through the table.
"Mowgli" <wmasterproxy@.yahoo.co.uk> wrote in message
news:1122677634.927113.124950@.g43g2000cwa.googlegroups.com...
> I have a column of starting times below (converted to seconds for
> simplicity):
> 500
> 505
> 510
> 535
> 910
> 939
> 944
> 977
> I need to assign a Group ID to the above values, based on a time
> interval of 30 seconds, so the correct result would be:
> Group ID
> 500, 1
> 505, 1
> 510, 1
> 535, 2
> 910, 3
> 939, 3
> 944, 4
> 977, 5
> The value of Group ID isn't important (could be 100, 200, etc.) as long
> as values in the same 30-second interval have the same Group ID. When a
> new group is started, a new 30-second interval is also started.
> I have created a solution using SQL loops (row-based processing) but
> was wondering if there is an efficient/elegant set-based solution when
> there are thousands of records?
>
> Thanks
> Mowgli
>|||If there are a lot of indexes on the table, you may want to cache the
results generated by the cursor in a table variable so that they can be
committed in a single set-based statement.
"Brian Selzer" <brian@.selzer-software.com> wrote in message
news:uf4PvFKlFHA.3260@.TK2MSFTNGP10.phx.gbl...
> No. Any set based solution will involve nested loops because the value of
> the group id is dependent on the values of the starting time in the
previous
> rows. This is one of the rare cases in which a cursor will actually
perform
> faster than a set based solution because it can accomplish what you want
in
> a single pass through the table.
> "Mowgli" <wmasterproxy@.yahoo.co.uk> wrote in message
> news:1122677634.927113.124950@.g43g2000cwa.googlegroups.com...
>|||>> I have created a solution using SQL loops (row-based processing) but was wonde
ring if there is an efficient/elegant set-based solution when there are thousands
of records [sic]? <<
Rows are not records. If you keep thinking in the wrong terms, you
will always think that you have to use loops. Create a simple table
for each group and then use a BETWEEN predicate. You can get an entire
year into about a million row table.
CREATE TABLE TimeGroups
(group_id INTEGER NOT NULL PRIMARY KEY
start_time INTEGER NOT NULL,
end_time INTEGER NOT NULL,
CHECK (start_time +30 = end_time));
You should find that this is about 10 times faster than a cursor as the
set gets larger.|||Unfortunately, your solution doesn't answer the original question. How do
you build the table in the first place? In this case a cursor is the
fastest way to find the group id's, because the alternative is a self-join,
which will always require more than one pass through the table. Not every
cursor is evil. Actually, you'll find that the time that it takes for the
cursor based solution increases linearly with the number of rows, whereas
the time that it takes for the set-based solution increases geometrically.
By the way, E. F. Codd used the term "record" in some of his papers. I
wonder, do you add [sic] when you refer to his work?
"--CELKO--" <jcelko212@.earthlink.net> wrote in message
news:1122689807.624405.127910@.g49g2000cwa.googlegroups.com...
was wondering if there is an efficient/elegant set-based solution when there
are thousands of records [sic]? <<
> Rows are not records. If you keep thinking in the wrong terms, you
> will always think that you have to use loops. Create a simple table
> for each group and then use a BETWEEN predicate. You can get an entire
> year into about a million row table.
> CREATE TABLE TimeGroups
> (group_id INTEGER NOT NULL PRIMARY KEY
> start_time INTEGER NOT NULL,
> end_time INTEGER NOT NULL,
> CHECK (start_time +30 = end_time));
> You should find that this is about 10 times faster than a cursor as the
> set gets larger.
>|||Hi Mr. Celko
You wrote... Rows are not records. If you keep thinking in the wrong terms,
you
will always think that you have to use loops.
Could you elaborate on that a bit.
many thanks
Shahriar
"--CELKO--" <jcelko212@.earthlink.net> wrote in message
news:1122689807.624405.127910@.g49g2000cwa.googlegroups.com...
> Rows are not records. If you keep thinking in the wrong terms, you
> will always think that you have to use loops. Create a simple table
> for each group and then use a BETWEEN predicate. You can get an entire
> year into about a million row table.
> CREATE TABLE TimeGroups
> (group_id INTEGER NOT NULL PRIMARY KEY
> start_time INTEGER NOT NULL,
> end_time INTEGER NOT NULL,
> CHECK (start_time +30 = end_time));
> You should find that this is about 10 times faster than a cursor as the
> set gets larger.
>|||Mowgli
I think this may just do it.
Table1 contains the field times.
Table2 contains 2 fields, Times and groupid
insert into table2 (times,groupid) (select times,times/30 from Table1)
Shahriar
"Mowgli" <wmasterproxy@.yahoo.co.uk> wrote in message
news:1122677634.927113.124950@.g43g2000cwa.googlegroups.com...
>I have a column of starting times below (converted to seconds for
> simplicity):
> 500
> 505
> 510
> 535
> 910
> 939
> 944
> 977
> I need to assign a Group ID to the above values, based on a time
> interval of 30 seconds, so the correct result would be:
> Group ID
> 500, 1
> 505, 1
> 510, 1
> 535, 2
> 910, 3
> 939, 3
> 944, 4
> 977, 5
> The value of Group ID isn't important (could be 100, 200, etc.) as long
> as values in the same 30-second interval have the same Group ID. When a
> new group is started, a new 30-second interval is also started.
> I have created a solution using SQL loops (row-based processing) but
> was wondering if there is an efficient/elegant set-based solution when
> there are thousands of records?
>
> Thanks
> Mowgli
>|||I tried that: it doesn't work correctly. The start time for each group
after the first depends on whether the start time of the first row for that
group is more than 30 seconds past the start time of the first row for the
previous group. If you use times/30 the interval that includes 500 runs
from 480 through 509, so the row with the 510 start time is in the next
interval. The expected results clearly show that the rows with 500 and 510
are in the same interval.
There is no set-based operation that can accurately calculate the group id
in a single pass. With a cursor, you can cache the start time of the
current group in a variable, and bump the current group id if the start time
for the current cursor row is more than 30 seconds past the cached start
time for the current group.
"Shahriar" <HelloShahriar@.hotmail.com> wrote in message
news:IqEGe.97$4e6.80@.trnddc04...
> Mowgli
> I think this may just do it.
> Table1 contains the field times.
> Table2 contains 2 fields, Times and groupid
>
> insert into table2 (times,groupid) (select times,times/30 from Table1)
> Shahriar
>
> "Mowgli" <wmasterproxy@.yahoo.co.uk> wrote in message
> news:1122677634.927113.124950@.g43g2000cwa.googlegroups.com...
>|||hi,
if u are not particular about the groupID, i have a solution for u.
but as per the requirement, group ID definately falls in 30sec intervals.
select num, (num-(select min(num)from #test))/30 GroupID from #test
please let me know if this answers your question
best Regards,
Chandra
http://chanduas.blogspot.com/
http://groups.msn.com/SQLResource/
---
"Mowgli" wrote:

> I have a column of starting times below (converted to seconds for
> simplicity):
> 500
> 505
> 510
> 535
> 910
> 939
> 944
> 977
> I need to assign a Group ID to the above values, based on a time
> interval of 30 seconds, so the correct result would be:
> Group ID
> 500, 1
> 505, 1
> 510, 1
> 535, 2
> 910, 3
> 939, 3
> 944, 4
> 977, 5
> The value of Group ID isn't important (could be 100, 200, etc.) as long
> as values in the same 30-second interval have the same Group ID. When a
> new group is started, a new 30-second interval is also started.
> I have created a solution using SQL loops (row-based processing) but
> was wondering if there is an efficient/elegant set-based solution when
> there are thousands of records?
>
> Thanks
> Mowgli
>|||Brian, Shahriar
Thanks for your input and for concentrating on the problem rather than
on my terminology!
The example I gave uses a time interval of 30 seconds but this can also
be variable (20, 40, 60, etc.).
Brian, perhaps you could demonstrate the inefficient set-based solution
to illustrate your point?
... As the issues may not be clear to anyone who hasn't attempted this
type of "real-life" problem.

No comments:

Post a Comment