Showing posts with label query. Show all posts

Friday, March 30, 2012

Is there a simple way to get everything there is to get from an XmlReader?

THE TASK
Using Visual Studio .NET 2003 and SQL Server 2000, I want to create a .NET application that gets SQL Server data from a SELECT query with a FOR XML clause. I saw the solution a long time ago in the MCSD .NET book "Developing Windows-Based Applications" (for exams 70-306 and 70-316) and thought it looked so simple that I didn't need to try it out. Now that I need to do it, I am surprised how difficult it is.

TO RUN THE EXAMPLE
The code below demonstrates the expected solution based on the book and the failure of that solution. To run this example, create a Windows Forms project, put a TextBox called textBox1 and two Buttons called button1 and button2 on the Form. Double-click each button to generate the methods that handle the click events. Replace "MySQLServer" and "MyWorkstation" with your own names. The pubs database should be present, otherwise modify the connection string and the command text as well. Compile it and click button1.

THE OUTCOME
The text in textBox1 shows four XML elements, but there are eight records in the publishers table. The pub_id values show that every second record has been omitted.

THE PROBLEM
The code should be self-explanatory, so let's look directly at the while loop and the ReadOuterXml method. That's where the problem occurs.
The framework documentation reveals a bad surprise: if the XmlReader is positioned on a leaf node, ReadOuterXml behaves like Read, which means the reader advances to the next position. Combined with the Read call in the loop condition, the reader advances by two positions although the while loop has only looped once. Whether this happens depends on whether the reader happens to be on a leaf node or not. Try playing with nested parent and child records and you will see how funny this can be.
Apparently, this means there is no easy way to reliably loop through everything.
I find this behavior of the ReadOuterXml method very disappointing, especially since there seems to be no straightforward way to control it. When I research the subject, I only find articles that simply say "//Do some parsing here" in the while loop. I disagree. One should not do any parsing there. One should not query the node type and then reconstruct every aspect of the XML for every theoretically possible node type, writing a lot of code to that end. Instead, one should trust that SQL Server has produced correct data and go ahead and save the data to a file. Compare how well-behaved the SqlDataReader is!
I wish the XmlReader had a ReadEverythingThereIsToReadAndReturnItAsATextStream method. Is there a known and simple solution to accomplish what that method would do?

THE CODE
private void button1_Click(object sender, System.EventArgs e)
{
    string connectionString = "data source=MySQLServer;initial catalog=pubs;integrated security=SSPI;persist security info=False;workstation id=MyWorkStation;packet size=4096";
    System.Data.SqlClient.SqlConnection connection = new System.Data.SqlClient.SqlConnection(connectionString);
    string cmdText = "SELECT * FROM publishers FOR XML RAW";
    System.Data.SqlClient.SqlCommand cmd = new System.Data.SqlClient.SqlCommand(cmdText, connection);
    this.textBox1.Text = String.Empty;
    connection.Open();
    System.Xml.XmlReader reader = cmd.ExecuteXmlReader();
    while (reader.Read())
    {
        // The failure demonstrated here: on a leaf node, ReadOuterXml()
        // itself advances the reader, so together with Read() every
        // second row is skipped.
        this.textBox1.AppendText(reader.ReadOuterXml());
    }
    reader.Close();
    connection.Close();
}

private void button2_Click(object sender, System.EventArgs e)
{
    string filename = Application.StartupPath + System.IO.Path.DirectorySeparatorChar + "publishers.xml";
    System.Xml.XmlTextWriter writer = new System.Xml.XmlTextWriter(filename, System.Text.Encoding.UTF8);
    writer.Formatting = System.Xml.Formatting.Indented;
    writer.WriteStartDocument();
    writer.WriteStartElement("Publishers");
    writer.WriteRaw(this.textBox1.Text);
    writer.WriteEndDocument();
    writer.Close();
}

If you want to write out everything the XmlReader returns, then open up an XmlTextWriter and simply do xmlTextWriter.WriteNode(xmlReader, false).|||

Thank you Martin Honnen. What you wrote is useful, indeed.

HOWEVER!

I figured out a much simpler solution.

If there is no need to do any parsing on the XML that SQL Server returns, don't use an XmlReader and a while loop at all. Instead, do this:

string xmlResult = (string)cmd.ExecuteScalar();

I still think that the ReadOuterXml method (and ReadInnerXml as well) should behave in a more controlled way, because as it is, it's quite useless. But anyway, we have learned enough about this. Let's move on to new problems.

Wednesday, March 28, 2012

Is There a Quick Way to Get a Total Match Count (estimate is OK) of a SQL FullText Query?

Hi,
I am using the June CTP release of SQL Server 2005 on Windows Server 2003.
Is there a quick way to get a total match count (an estimate is OK) of a SQL full-text query? I am working with about 10 million rows of simple character data.
The following SQL simply returns too slowly, due to disk I/O on the disk with the relational table, when the number of hits is large (say, half a million):

select count(*) from CONTAINSTABLE(MyTableName, FTColumnName, '"QueryPhrase"')
Thanks,
Wenbin Zhang
Wenbin Zhang,
Since you're using SQL Server 2005, the best way to get this type of
statistical info is from the word list in the FT Catalog via the CIDump.exe
utility (see related thread subject: cidump documentation?) and import the
output back into a SQL Server table.
If you need to use T-SQL to determine the match count of a SQL FTS query, instead of just using count(*), it is faster to use a stored proc, for example:
create proc FTS_t1 (@SearchWord varchar(7800))
as
select c1 from t1 where contains(c2, @SearchWord)
go

create proc FTSCount_t1 (@SearchWord varchar(7800))
as
set nocount on
create table #FTPrimaryKey (t1_UPK int)
insert into #FTPrimaryKey exec FTS_t1 @SearchWord
select count(*) from #FTPrimaryKey
set nocount off
go

-- example of use:
exec FTSCount_t1 '"computer"'
Hope that helps!
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
"Wenbin Zhang" <zhang_wenbin@.hotmail.com> wrote in message
news:%23arzmzVoFHA.3696@.TK2MSFTNGP10.phx.gbl...
> Hi,
> I am using the June CTP release of SQL Server 2005 on Windows Sever 2003.
> Is there a quick way to get a total natch count (estimate is OK)
> of a SQL FullText query? I am working with about 10 million r
> ows of simple character data.
> The following SQL simply returns too slow due to disk IO on the disk
> with the relational table when the number of hits is large (say half
> millions)
> select count(*) from CONTAINSTABLE(MyTableName, FTColumnName, '
> "QueryPhrase" ')
> Thanks,
> Wenbin Zhang
>

Is there a quick way to convert Xml into a Table?

Hello,

I have an Xml column containing some Xml downloaded off the web (which comes from a Sql Server 2000 FOR XML query). I want to quickly convert the Xml back into a table with the relevant columns. Is there a quick way to do this?

Many thanks!

Ben S.

SELECT ncol.value('@someAttrib', 'varchar(100)')        -- value() needs an XQuery expression and a SQL type
     , ncol.value('../@AttribOfParent', 'varchar(100)')
FROM yourtable
CROSS APPLY yourtable.yourxmlcolumn.nodes('/xPathThatLeads/toWhatYouWant') AS T(ncol)

This is a quick way to shred your XML data back to relational data.
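The nodes()/value() approach can be tried standalone against an XML variable, without any table. A hedged, self-contained sketch (SQL Server 2005 and later; the sample XML, names and types are made up for illustration):

```sql
-- Shred an XML variable into rows and columns with nodes()/value().
-- value() requires both an XQuery expression and a SQL type.
DECLARE @x XML;
SET @x = N'<root>
  <element><id>1</id><name>Fred Smith</name></element>
  <element><id>2</id><name>Joe Bloggs</name></element>
</root>';

SELECT n.value('(id)[1]',   'int')         AS id,
       n.value('(name)[1]', 'varchar(50)') AS name
FROM @x.nodes('/root/element') AS T(n);
-- Returns two rows: (1, Fred Smith) and (2, Joe Bloggs).
```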

|||

I did it this way in the end (using SQL Server 2005):

DECLARE @xmlDoc XML, @xmlDocHandle INT

-- Get the XML off the web and put it into the @xmlDoc variable here

EXEC sp_xml_preparedocument @xmlDocHandle OUTPUT, @xmlDoc;

SELECT * INTO #temp_table FROM OPENXML(@xmlDocHandle, N'/root/element', 2) WITH myTable;

EXEC sp_xml_removedocument @xmlDocHandle;

This will put the contents of the @xmlDoc variable into #temp_table, provided the XML matches the structure of the existing myTable table. The '/root/element' bit describes the XML elements you want to put into the table, e.g.:

<root>
<element>
<id>1</id>
<name>Fred Smith</name>
</element>
<element>
<id>2</id>
<name>Joe Bloggs</name>
</element>
</root>

The above example will return two records assuming myTable has id and name columns.

Hope this helps other people.


Is there a query to get transactions in MSrepl_commands for a specific publication?

tia,
--oj.
Oliver,
have a look at sp_browsereplcmds in BOL.
HTH,
Paul Ibison
(recommended sql server 2000 replication book:
http://www.nwsu.com/0974973602p.html)

Is there a Query Analyzer

in the marketplace that will run a query on multiple DBs across multiple servers, without having to manually connect to each server? If so, can anyone provide a link for it.

Thanks"it depends"

I might be able to write you a custom program for free to do it if you give me
more details.

There are ways to send a query to a SQL server via script, but it's not pretty.

Tell us more about what you need to do; or email me off the NG if you want a
mini app

-j

|||

Say I want to run this script:

Select * from company

I need it to run on DB1, DB2, DB3, DB4 and DB5

DB1 and DB2 are located on Server1
DB3 is on Server2
DB4 and DB5 are on Server3

The long and manual way I would have to do it is to copy the SQL statement, connect to Server1, paste and run it on DB1, then DB2; connect to Server2, run it on DB3; and so on.

Is there a way to configure the script that will allow me to run the query once and have it give me the results from all the DBs I want it to hit?

Thanks for helping.


|||

On 22.02.2007 16:04, click37@.gmail.com wrote:

Whatever you do, you will have to authenticate to each machine at least once. With SQL 2005 Management Studio you can store credentials, or you can use Windows authentication; then no additional login is required.

HTH

robert
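
For the record, one way to get this down to a single batch on a single connection is to register the remote machines as linked servers and use four-part names. A hedged sketch using the server and database names from the post (security configuration omitted; adjust to your environment):

```sql
-- Run once on Server1 (sketch only):
-- EXEC sp_addlinkedserver 'Server2';
-- EXEC sp_addlinkedserver 'Server3';

-- Then a single batch on Server1 hits all five databases:
SELECT 'DB1' AS source_db, * FROM DB1.dbo.company
UNION ALL SELECT 'DB2', * FROM DB2.dbo.company
UNION ALL SELECT 'DB3', * FROM Server2.DB3.dbo.company
UNION ALL SELECT 'DB4', * FROM Server3.DB4.dbo.company
UNION ALL SELECT 'DB5', * FROM Server3.DB5.dbo.company;
```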

Is there a need to put an N in front of a numeric value?

for example, in a query,
select * from abc where emp_no = N'1234567';
I get the same results if I use this:
select * from abc where emp_no = '1234567';
so what's the point of using the 'N' in front of the numeric value?

|||

Well, it doesn't seem to be a numeric value! The quotes specify it's a string literal, and on top of that the N says it's to be (explicitly) converted to a Unicode string.

Check what data type emp_no has. If it's INT (or another numeric type), then leave out the N and the quotes altogether.

|||

The original type from the source is character; that's why I need the quotes. So N means "Unicode string variable"? May I know under what circumstances it would be compulsory?

|||

The explicit casting of the string to Unicode (which the N does) is needed when emp_no is of type NCHAR, NVARCHAR or NTEXT (which are Unicode datatypes).

Even then it's not compulsory, because SQL Server will implicitly cast the string to the right type. But it's recommended for performance reasons (and for consistency in your code); implicit conversions are slower than explicit conversions.

|||

Suppose you have a table like this:

create table employee (
    emp_no varchar(10) not null primary key,
    name varchar(50)
    -- other columns as necessary
)

When you query the table like this:

select * from abc where emp_no = N'1234567';

You will always get a table scan (or clustered index scan, which is the same thing). This is because SQL Server has to do an implicit conversion to match the data up. Since nvarchar values are not guaranteed to convert to varchar, the varchar column is converted to nvarchar instead, so the index on emp_no cannot be used for a seek. Depending on how many employees you have, this will take some time.

The short of it is: always query the underlying table with the same datatype as the column in the table.

|||

> You will always get a table scan (or clustered index scan, which is the same thing).

Yeah, that too :D
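
The table-scan effect is easy to demonstrate. A hedged sketch (the table is from the example above; the behavior described is that of SQL Server 2000, where a varchar column is converted to nvarchar when compared with an N'' literal):

```sql
-- emp_no is varchar, so its primary-key index stores varchar keys.
-- This literal matches the column type, so the index can be used for a seek:
select * from employee where emp_no = '1234567';

-- The N'' literal is nvarchar; the column must be converted before the
-- comparison, which prevents an index seek and forces a scan:
select * from employee where emp_no = N'1234567';
```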

Is there a log of activity as compared to the Tran log?

I am trying to identify whether I have unwanted guests gaining access to my system or it's a runaway query by a user on my network. I am thinking of recording SPIDs as one way.
Via Spotlight I can see that I'm having beyond-normal usage. Granted, I have a new PHB who likes to run queries off his laptop to show off to other managers. Really scary when he has only a little understanding of the data.
The box running Spotlight is WAY underpowered, but can it keep that logging data instead of the server itself?

|||

You can create your own log with Profiler - it's a good tool to track down performance issues in a database. If you use it and performance is a concern, don't trace to the database or from the database. Trace from a client and, if tracing to a file, have it go on that client.
Or use a server side trace. But you'd probably want to play
around with profiler from a client first.
-Sue
|||

2 things here, really:
1. Tracking "unwanted guests" gaining access to your SQL Server is easily done with no perf hit by setting "Audit level" to "All". This is strictly for login attempts and can be set via EM: right-click on your server, select Properties, and click the Security tab. This puts an entry in the logs every time a login attempt succeeds or fails. At the very minimum, you should log failures.
2. Runaway queries are a little harder to do with SQL Server 2000. First,
you need to define what you mean by runaway queries. CPU consumption?
Memory? Length of query? SQL Profiler and a bunch of 3rd-party tools from folks like Quest, BMC, etc. let you capture the appropriate data to "look at" so you can figure out who's got a runaway query. Depending on what
you use, you may have to do extra work. For example, if you capture trace
info with Profiler or SQL Trace, you'll need a way to track the data and
raise alerts based on thresholds that you set. Life gets a LOT better in
2005 with the default "reports" (it's a cool dashboard really) but in 2000,
there's a little bit of work needed to get you there. :-)
Btw, one thing you might want to consider is the Query Governor cost limit
option. It's not the most "precise" tool you can have since it works based
on estimates but it's a pretty good start with minimal effort. Look it up in
BOL. It's well documented.
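For reference, the query governor cost limit can be set either per connection or server-wide; a sketch (the limit is an estimated query cost in seconds, and 0 means no limit):

```sql
-- Per connection:
SET QUERY_GOVERNOR_COST_LIMIT 300;

-- Server-wide (advanced option):
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'query governor cost limit', 300;
RECONFIGURE;
```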
joe.
"Stephen Russell" <srussell@.transactiongraphics.com> wrote in message
news:u5%23LlMEwFHA.664@.tk2msftngp13.phx.gbl...
>I am trying to identify if I have unwanted guests gaining access to my
>system or it's a run away query by a user on my network. I am thinking of
>recording SPIDs as one way.
> Via Spotlight I can see that I'm having beyond normal usage. Granted I
> have a new PHB who likes to run queries off his laptop to show off to
> other managers. Real scary when he has a little understanding of the
> data.
> The box running Spotlight is WAY under powered, but can it keep that
> logging data instead of the server itself?
>
>

Monday, March 26, 2012

Is there a correct syntax for writing a query?

Is there a best syntax for writing a query (specifically in sql server with
t-sql)? I've written a few with the JOIN keywords but I mostly write them
using '=', '*=', etc. Is there a more 'SQL-compliant' way, or is it just a matter of preference?
Thanks.

|||

*= and =* are old syntaxes and might lead to ambiguous queries.
SQL Server 2005 does not support this syntax.
Use the ANSI-92 syntax of left outer and right outer join.
You can refer to BOL for more information on this.
Hope this helps.

|||

Use the JOIN keyword.
JOIN is part of the ANSI syntax.
BOL 2005 says:
The outer join operators (*= and =*) are not supported when the
compatibility level of the database is set to 90.
David Lundell
Principal Consultant and Trainer
www.MutuallyBeneficial.com
David@.MutuallyBeneficial.com
"VMI" <VMI@.discussions.microsoft.com> wrote in message
news:243B3E9F-5935-46BE-A6A6-A1C51DF6B1F7@.microsoft.com...
> Is there a best syntax for writing a query (specifically in sql server
> with
> t-sql)? I've written a few with the JOIN keywords but I mostly write them
> using '=', '*=', etc... Is there a more 'sql-compliant' way or is it just
> a
> matter of preference?
|||
There are situations where *= can actually give you bad results. Stay away from it.

|||

Can you give an example? Say you have the same value repeated in 3 rows for a column; if you want to remove duplicates and still get 3 rows, then what value do you want to have in that column?

|||

Oops.. wrong post :(
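
To spell out the recommendation, here is a hedged sketch contrasting the legacy operator with the ANSI-92 join (table and column names are made up):

```sql
-- Legacy outer join (unsupported at compatibility level 90, and ambiguous
-- when the *= column also appears in other WHERE predicates):
-- select o.order_id, c.name
-- from orders o, customers c
-- where o.customer_id *= c.customer_id;

-- ANSI-92 equivalent: the join condition is stated explicitly.
select o.order_id, c.name
from orders o
left outer join customers c
    on o.customer_id = c.customer_id;
```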

Friday, March 23, 2012

Is there a better way to write this?

I have a set-based query which is causing me problems... the query looks like this:
insert into TableA
(columnA1)
select
    columnB1 = (select columnB1 from TableB where columnB2 = a.columnC2 and columnB3 = a.columnC3)
from TableC as a
It's actually population of a fact table from dimension tables. My problem
is that the subquery is failing because it is returning more than 1 value
from TableB, but I'm unable to pull out the records that are really
returning me the error. Anyone got any suggestions?

|||

Why the subquery? Why not use something like this (untested):
insert into TableA (columnA1)
select b.columnB1
from TableB b
inner join TableC c
    on b.columnB2 = c.columnC2
   and b.columnB3 = c.columnC3
mike hodgson
http://sqlnerd.blogspot.com
"Nestor" <n3570r@.yahoo.com> wrote in message
news:%23rxneaJHGHA.344@.TK2MSFTNGP09.phx.gbl...
>I have a set base query which is causing me problems... the query look
>like this
> insert into TableA
> (columnA1)
> select
> columnB1 = (select columnB1 from TableB where columnB2 = a.columnC2 and
> columnB3 = a.columnC3)
> from TableC as a
> It's actaully population of a fact table from dimension tables. My problem
> is that the subquery is failing because it is returning more than 1 value
> from TableB, but I'm unable to pull out the records that are really
> returning me the error. Anyone got any suggestions?
|||

Thanks for the quick response. Primarily, the reason is that the actual query is actually a lot more complicated than the example, something like this:
insert into TableA
(columnA1, columnA2, columnA3, ...)
select
    columnB1 = (select columnB1 from TableB where columnB2 = a.columnC2 and columnB3 = a.columnC3),
    columnB2 = (select columnB2 from TableD where ...),
    columnB3 = (other conditions)
from TableC as a
where ...
and not exists (...)
I don't think I can really write it using join statements, considering the tables and conditions involved.
"Mike Hodgson" <e1minst3r@.gmail.com> wrote in message
news:OzbZFlJHGHA.2444@.TK2MSFTNGP11.phx.gbl...
> Why the subquery? Why not use something like this (untested):
> insert into TableA (columnA1)
> select b.columnB1 from TableB b
> inner join TableC c
> on b.columnB2 = c.columnC2
> and b.columnB3 = c.columnC3
>
> --
> mike hodgson
> http://sqlnerd.blogspot.com
>
> "Nestor" <n3570r@.yahoo.com> wrote in message
> news:%23rxneaJHGHA.344@.TK2MSFTNGP09.phx.gbl...
|||

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications.
A wild guess not supported in any way by your posting:
INSERT INTO TableA (columnA1)
SELECT B.columnB1
FROM TableB AS B
WHERE B.columnB2 = A.columnC2
  AND B.columnB3 = A.columnC3;
The syntax was wrong after that point. This can still blow up since we have no DDL.

|||

To find the rows that are causing you fits, try something like:
SELECT B2, B3, COUNT(*)
FROM TableB
GROUP BY B2, B3
HAVING COUNT(*) > 1
See if that points you in the right direction.
Stu

|||

Surely you can still do it without all the subqueries (unless the values in
different columns for a single row are unrelated), you just need to figure
out the appropriate SELECT statement. It's hard to offer helpful
suggestions in your case as you've provided us with very little info -
posting some relevant schema & test data, as Joe suggested, would make it
easier for others to help you.
mike hodgson
http://sqlnerd.blogspot.com
"Nestor" <n3570r@.yahoo.com> wrote in message
news:%23d$1SwJHGHA.3036@.tk2msftngp13.phx.gbl...
> Thanks for the quick respond.. primarily the reason is that the actual
> query is actaully a lot more complicated than the example, something like
> this
> insert into TableA
> (columnA1, columnA2, columnA3, ...)
> select
> columnB1 = (select columnB1 from TableB where columnB2 = a.columnC2 and
> columnB3 = a.columnC3)
> columnB2 = (select columnB2 from TableD where ...)
> columnB3 = (other conditions)
> from TableC as a
> where ...
> and not exists (...)
> I don't think I can really write it using join statements considering the
> tables and conditions involved?
> "Mike Hodgson" <e1minst3r@.gmail.com> wrote in message
> news:OzbZFlJHGHA.2444@.TK2MSFTNGP11.phx.gbl...
|||

Is the problem that you have duplicate rows which you need to clean up?
Stu suggested an approach to identify the duplicates. A unique constraint will prevent them from occurring in the future.
Or do you want to have the SQL ignore these duplicates, and just take the
first value that it finds?
Try using max(columnB1) or Min(columnB1) in your sub select
OR try this...
columnB1 = (select top 1 columnB1 from TableB where columnB2 =
a.columnC2 and columnB3 = a.columnC3)
Or is your join incorrect in the subquery and you need another key field in
there?
Maybe the data is perfect and you are just not being specific enough?
As others have said, the DDL will help make some of this clear, as will a
more in-depth explanation.
"Nestor" <n3570r@.yahoo.com> wrote in message
news:%23rxneaJHGHA.344@.TK2MSFTNGP09.phx.gbl...
> I have a set base query which is causing me problems... the query look
like
> this
> insert into TableA
> (columnA1)
> select
> columnB1 = (select columnB1 from TableB where columnB2 = a.columnC2 and
> columnB3 = a.columnC3)
> from TableC as a
> It's actaully population of a fact table from dimension tables. My problem
> is that the subquery is failing because it is returning more than 1 value
> from TableB, but I'm unable to pull out the records that are really
> returning me the error. Anyone got any suggestions?
>

Is there a better way to do this very simple query(s)?

I'm almost certain that my approach is wrong, but it works! Could someone
tell me the correct way to achieve the following?
-- Start Code --
INSERT INTO
Categories (Description)
SELECT DISTINCT CategoryDescription
FROM Import
UPDATE Categories SET StatusID = 1
UPDATE Categories SET DateAdded = GETDATE()
-- End Code --
Also could someone suggest a good book for SQL programming? As you can tell
I am lost!
Thanks,
Tom

|||

This can be done during the INSERT (assuming the table is empty when you
begin):
INSERT INTO
Categories (Description, StatusID, DateAdded)
SELECT DISTINCT CategoryDescription, 1, GETDATE()
FROM Import
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinpub.com
.
"Tom" <Tom@.discussions.microsoft.com> wrote in message
news:F747781D-B975-4664-AAF8-90C5A0EFB536@.microsoft.com...
I' m almost certain that my approach is wrong but it works! Could someone
tell me the correct way to achieve the following?
-- Start Code --
INSERT INTO
Categories (Description)
SELECT DISTINCT CategoryDescription
FROM Import
UPDATE Categories SET StatusID = 1
UPDATE Categories SET DateAdded = GETDATE()
-- End Code --
Also could someone suggest a good book for SQL programming? As you can tell
I am lost!
Thanks,
Tom

|||

Since you did not bother to explain "Import", I am making some
assumptions.
INSERT INTO Categories (cat_description, foobar_status, start_date,
end_date)
SELECT DISTINCT cat_description, 1, start_date, end_date
FROM Import ;
There cannot be such a thing as a "status_id" -- a data element is either an identifier belonging to one and only one entity, or it is a status: a value that shows the state of some non-key attribute. Status of what?
People who never learned RDBMS sometimes make fools of themselves by
using IDENTITY as the key for everything, just like they were still in
file systems and had to have a record number.
You do know that a temporal data element is modeled in durations, not
in points, don't you?

|||

Tom,
I'm sorry I left out the fact that the table is indeed empty.
That is exactly what I was looking for! Thank You! You brought up a good
point, however: next, when I get an updated category list from my vendor,
how would I go about adding only the new items to the Categories table and
setting the StatusID=1, DateAdded=GetDate()?
Thanks again, I really appreciate the help!!!
Tom
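
The follow-up question (adding only the new items) can be handled with NOT EXISTS; a hedged sketch using the table and column names from the thread (assumes Description identifies a category):

```sql
-- Insert only categories not already present, stamping the new rows.
INSERT INTO Categories (Description, StatusID, DateAdded)
SELECT DISTINCT i.CategoryDescription, 1, GETDATE()
FROM Import AS i
WHERE NOT EXISTS (
    SELECT *
    FROM Categories AS c
    WHERE c.Description = i.CategoryDescription
);
```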
"Tom Moreau" wrote:

> This can be done during the INSERT (assuming the table is empty when you
> begin):
> INSERT INTO
> Categories (Description, StatusID, DateAdded)
> SELECT DISTINCT CategoryDescription, 1, GETDATE()
> FROM Import
>
> --
> Tom
> ----
> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
> SQL Server MVP
> Columnist, SQL Server Professional
> Toronto, ON Canada
> www.pinpub.com
> ..
> "Tom" <Tom@.discussions.microsoft.com> wrote in message
> news:F747781D-B975-4664-AAF8-90C5A0EFB536@.microsoft.com...
> I' m almost certain that my approach is wrong but it works! Could someone
> tell me the correct way to achieve the following?
> -- Start Code --
> INSERT INTO
> Categories (Description)
> SELECT DISTINCT CategoryDescription
> FROM Import
> UPDATE Categories SET StatusID = 1
> UPDATE Categories SET DateAdded = GETDATE()
> -- End Code --
> Also could someone suggest a good book for SQL programming? As you can tel
l
> I am lost!
> Thanks,
> Tom
|||

Actually, it dawned on me that the query would work if the table weren't
already populated. I keyed in on that UPDATE of yours and thought to myself
that it would set the entire table's statuses to 1. I then wrote the query
but forgot to post a follow-up on that.
Enjoy. :-)
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinpub.com
.
"Tom" <Tom@.discussions.microsoft.com> wrote in message
news:F8872EC5-DA5B-4A42-AA16-F1AE13A9A476@.microsoft.com...
Tom,
I'm sorry I left out the fact that the table is indeed empty.
That is exactly what I was looking for! Thank You! You brought up a good
point however, next w when I get an updated category list from my vendor,
how would I go about adding only the new items to the Categories table and
setting the StatusID=1, DateAdded=GetDate()?
Thanks again, I really appreciate the help!!!
Tom
"Tom Moreau" wrote:

> This can be done during the INSERT (assuming the table is empty when you
> begin):
> INSERT INTO
> Categories (Description, StatusID, DateAdded)
> SELECT DISTINCT CategoryDescription, 1, GETDATE()
> FROM Import
>
> --
> Tom
> ----
> Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
> SQL Server MVP
> Columnist, SQL Server Professional
> Toronto, ON Canada
> www.pinpub.com
> ..
> "Tom" <Tom@.discussions.microsoft.com> wrote in message
> news:F747781D-B975-4664-AAF8-90C5A0EFB536@.microsoft.com...
> I' m almost certain that my approach is wrong but it works! Could someone
> tell me the correct way to achieve the following?
> -- Start Code --
> INSERT INTO
> Categories (Description)
> SELECT DISTINCT CategoryDescription
> FROM Import
> UPDATE Categories SET StatusID = 1
> UPDATE Categories SET DateAdded = GETDATE()
> -- End Code --
> Also could someone suggest a good book for SQL programming? As you can
> tell
> I am lost!
> Thanks,
> Tom
>|||
"--CELKO--" wrote:

> Since you did not bother to expalin "Import", I am making some
> assumptions.
> INSERT INTO Categories (cat_description, foobar_status, start_date,
> end_date)
> SELECT DISTINCT cat_description, 1, start_date, end_date
> FROM Import ;
> There cannot such a thing as a "status_id" -- the data element is
> either an identifier and belongs to one and only one entity. If it is a
> status, it is a value that shows the status of some non-key attribute.
> Status of what'
> People who never learned RDBMS sometimes make fools of themselves by
> using IDENTITY as the key for everything, just like they were still in
> file systems and had to have a record number.
> You do know that a temporal data element is modeled in durations, not
> in points, don't you?
|||

CELKO,
Go to bed! If I wanted to be honored by your intelligence I would have
contacted you directly. My question was simple and answered in a prompt,
precise and professional manner.
Tom Moreau, understood my question, thanks Tom.
Your question, and "solution" for that matter, is far from what I was
asking. My question dealt with 3 (three) columns, your solution 4 (four)!
"Import" is a table, although I agree it is a bad name, I didn't name it!
And finally I disagree with you on the "StatusID", the value actually comes
from a lookup table that makes perfect sense to me, my company and most of
the rest of the world! I simplified the sample query to get my question
answered! StatusID is NOT an IDENTITY but a foreign key.
Thanks Again Tom!|||>> If I wanted to be honored by your intelligence I would have contacted you
directly. <<
You might want to learn how newsgroups work.
No, you just posted it without any clean up. And without any DDL,
either.
Fine, but it makes no sense to anyone who uses ISO-11179 standards. It
might not be your company, but it is the rest of the world. But those
who go fishing for quick kludges are probably not going to learn such
things.
And the reason that I gave you four columns is that they model three
> data elements. Again, time is modeled in durations of half-open
intervals. Look up the work done for the past few decades by Rick
Snodgrass at the University of Arizona. Looking at my answer, I think
I would change it to at least this:
INSERT INTO Categories (cat_description, foobar_status, start_date,
end_date)
SELECT DISTINCT cat_description, 1, CURRENT_TIMESTAMP, NULL
FROM ImportStagingTable;
But I would squeeze out blanks, watch the case of the description
string, etc.|||CELKO,
Go away already will you?
"--CELKO--" wrote:

> You might want to learn how newsgroups work.
>
I know how newsgroups work, that's why I posted the question here. Most
people offer usable suggestions and solutions to the QUESTION ASKED not
their own take as to what it should be.

> No, you just posted it without any clean up. And without any DDL,
> either.
>
Sorry, there was no cleanup to do! It was a simple question in need of a
simple answer! I'm sorry it was beneath you. I agree there was no DDL but
my question was so SIMPLE I didn't feel it was necessary. I simply asked a
SIMPLE question and you felt the need to show off your intelligence, actually
in this case LACK of intelligence.

> Fine, but it makes no sense to anyone who uses ISO-11179 standards. It
> might not be your company, but it is the rest of the world. But those
> who go fishing for quick kludges are probably not going to learn such
> things.
>
In particular, what subsection of ISO-11179 are you talking about? You don't
know this database and therefore you have no business telling me what it does
and does not comply with!

> And the reason that I gave you four columns is that they model three
> data elements. Again, time is modeled in durations of half-open
> intervals. Look up the work done for the past few decades by Rick
> Snodgrass at the University of Arizona. Looking at my answer, I think
> I would change it to at least this:
> INSERT INTO Categories (cat_description, foobar_status, start_date,
> end_date)
> SELECT DISTINCT cat_description, 1, CURRENT_TIMESTAMP, NULL
> FROM ImportStagingTable;
> But I would squeeze out blanks, watch the case of the description
> string, etc.
I agree with your modeling statement but you are missing the point!
Why are you forcing an end date on me when I don't need one? Not that it is
either important OR any of your business, I am tracking fish being released
and I need to know the "description", "status", and "date" of the release.
Very simple!!!


Is there a better (faster) way?

I am running this query on MS SQL 2000 that is deleting records on an AS400
table. Extremely slow. Is there a different way I should/could do this?
Background - Data comes into COMMON. I read the data into #LoadSwipes,
process the data, and then want to delete the processed records from COMMON.
delete KRONOS.RSIGL.TIMECUSTOM.COMMON
from #LoadSwipes ls
inner join KRONOS.RSIGL.TIMECUSTOM.COMMON k
on ls.Area = k.Area and
ls.TimeSt = k.TimeSt and
ls.EmpNum = k.EmpNum and
ls.Clock = k.Clock and
ls.EmpNu9 = k.EmpNu9 and
ls.ActIn = k.ActIn
Thanks
Eric|||Hi Eric,
This might not apply to your situation, but one thing that you may consider
is copying the records that you want to keep into a separate temp table, then
truncate the original table, and finally copy the records from the temp table
back to the original.
This would be useful if the number of records that you want to keep is
dramatically less than the number of records that you want to delete.
--
MG
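MG's keep/truncate/reload idea could be sketched roughly like this. The six join columns come from Eric's query, but whether TRUNCATE is even permitted on a linked AS/400 table is an open question, so treat this purely as a local-table pattern; the table name here is hypothetical:

```sql
-- Sketch of the keep/truncate/reload pattern, assuming a LOCAL table.
-- Keep the rows that have NOT yet been processed.
SELECT k.*
INTO #Keep
FROM TimeCustomCommon k            -- hypothetical local copy of COMMON
WHERE NOT EXISTS (SELECT 1 FROM #LoadSwipes ls
                  WHERE ls.Area = k.Area AND ls.TimeSt = k.TimeSt
                    AND ls.EmpNum = k.EmpNum AND ls.Clock = k.Clock
                    AND ls.EmpNu9 = k.EmpNu9 AND ls.ActIn = k.ActIn)

TRUNCATE TABLE TimeCustomCommon
INSERT INTO TimeCustomCommon SELECT * FROM #Keep
DROP TABLE #Keep
```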
"Eric Stewart" wrote:
> I am running this query on MS SQL 2000 that is deleting records on an AS400
> table. Extremely slow. Is there a different way I should/could do this?
> \
> Background - Data comes into COMMON. I read the data into #LoadSwipes,
> process the data, and then want to delete the processed records from COMMON.
> delete KRONOS.RSIGL.TIMECUSTOM.COMMON
> from #LoadSwipes ls
> inner join KRONOS.RSIGL.TIMECUSTOM.COMMON k
> on ls.Area = k.Area and
> ls.TimeSt = k.TimeSt and
> ls.EmpNum = k.EmpNum and
> ls.Clock = k.Clock and
> ls.EmpNu9 = k.EmpNu9 and
> ls.ActIn = k.ActIn
>
> Thanks
> Eric
>
>|||Thanks for the suggestion MG. However, I don't think that applies here. The
table on the AS400 is continually receiving data. I read data from that
table into a SQL temp table and process the records in the temp table.
Meantime, more records are flowing into the AS400 table. The temp table
serves as my means of knowing which records can be deleted from the AS400
table.
Eric
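One hedged alternative, since the table keeps receiving rows: delete in smaller batches so each distributed transaction stays short. SET ROWCOUNT is the SQL Server 2000 idiom for this; the batch size of 5000 is an arbitrary assumption, and whether batching actually helps over a linked server would need testing:

```sql
SET ROWCOUNT 5000                     -- limit each DELETE to 5000 rows
WHILE 1 = 1
BEGIN
    DELETE k
    FROM KRONOS.RSIGL.TIMECUSTOM.COMMON k
    INNER JOIN #LoadSwipes ls
            ON ls.Area   = k.Area   AND ls.TimeSt = k.TimeSt
           AND ls.EmpNum = k.EmpNum AND ls.Clock  = k.Clock
           AND ls.EmpNu9 = k.EmpNu9 AND ls.ActIn  = k.ActIn
    IF @@ROWCOUNT = 0 BREAK           -- nothing left to delete
END
SET ROWCOUNT 0                        -- restore the default
```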
"MGeles" <michael.geles@.thomson.com> wrote in message
news:34A0DE2D-FE23-4484-918C-F32E52A09DA6@.microsoft.com...
> Hi Eric,
> This might not apply to you situation, but one thing that you may consider
> is copying the record that you want to keep into a seperate temp table,
then
> truncate the original table, and finally copy the records from the temp
table
> back to the original.
> This would be useful if the number of records that you want to keep is
> dramatically less than the number of records that you want to delete.
> --
> MG
>
> "Eric Stewart" wrote:
> > I am running this query on MS SQL 2000 that is deleting records on an
AS400
> > table. Extremely slow. Is there a different way I should/could do this?
> >
> > \
> >
> > Background - Data comes into COMMON. I read the data into #LoadSwipes,
> > process the data, and then want to delete the processed records from
COMMON.
> >
> > delete KRONOS.RSIGL.TIMECUSTOM.COMMON
> > from #LoadSwipes ls
> > inner join KRONOS.RSIGL.TIMECUSTOM.COMMON k
> > on ls.Area = k.Area and
> > ls.TimeSt = k.TimeSt and
> > ls.EmpNum = k.EmpNum and
> > ls.Clock = k.Clock and
> > ls.EmpNu9 = k.EmpNu9 and
> > ls.ActIn = k.ActIn
> >
> >
> > Thanks
> > Eric
> >
> >
> >


Wednesday, March 21, 2012

Is the query using the cache

How do I find out whether the query is picking the data and query plan from
the cache?|||Use in your trace the events SP:CacheInsert, SP:CacheHit,
SP:CacheMiss and SP:CacheRemove (under Stored Procedures). See BOL for more information.
AMB
"Neeraj" wrote:

> How do I find out whether the query is picking the data and query plan from
> the cache.
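Besides the Profiler events, on SQL Server 2000 you can peek at the plan cache directly via the syscacheobjects system table; a hedged sketch (the LIKE filter text is a placeholder you would replace with part of your own query text):

```sql
-- usecounts > 1 suggests the plan is being reused from cache.
SELECT cacheobjtype, objtype, usecounts, sql
FROM master.dbo.syscacheobjects
WHERE sql LIKE '%your query text here%'
```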

Is there any way to run a query that will show free disk space on dat

Is there any way to run a query that will show free disk space of all hard
drives on the
server?
-D|||master..xp_fixeddrives
--
HTH, Jens Suessmeyer.
--
http://www.sqlserver2005.de
--
"D'Animal" wrote:
> Is the anyway to run a query that will show free disk space of all hard
> drives on the
> server?
> -D|||Also sp_diskspace http://www.sqldbatips.com/showcode.asp?ID=4
This procedure returns the amount of free space (in MB) on all fixed disks
on the SQL Server using xp_fixeddrives but builds on that by using the
FileSystemObject to get the total drive space and calculate the free space
percentage.
--
HTH
Jasper Smith (SQL Server MVP)
http://www.sqldbatips.com
I support PASS - the definitive, global
community for SQL Server professionals -
http://www.sqlpass.org
"D'Animal" <D'Animal@.discussions.microsoft.com> wrote in message
news:75D859FA-0379-4885-A26B-2D84F3C7E207@.microsoft.com...
> Is the anyway to run a query that will show free disk space of all hard
> drives on the
> server?
> -D
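If you want the xp_fixeddrives output in a result set you can filter or sort, one common pattern is to capture it into a temp table (the column names follow the procedure's two-column output of drive letter and free megabytes):

```sql
CREATE TABLE #drives (drive CHAR(1), MBfree INT)
INSERT INTO #drives EXEC master..xp_fixeddrives
SELECT drive, MBfree FROM #drives ORDER BY MBfree
DROP TABLE #drives
```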


Friday, February 24, 2012

Does Order By Affect The Query Speed?

hello,
I have a query that inserts into a new table, and then I select from this table.
If I add ORDER BY in the INSERT INTO script, does it affect the speed of the SELECT?
I have a big table that takes about 70 sec.|||Yes, it will either be ignored, or it will slow the INSERT down.|||I'm confused. ORDER BY is not allowed in INSERT statements, is it? Unless part of a sub-query or something along those lines. And why would you do an order-by on an insert, since it makes no logical sense to try to instill order on a table that is not ordered.

The ORDER BY has to be on the select OUT from the table.|||I use ORDER BY in INSERT INTO because I use INSERT INTO with a SELECT statement; I have a SELECT nested into the INSERT

My question is: if I add the data in order in the table, do I get better SELECT query performance?|||So your ORDER BY is not in the INSERT itself, but rather in the SELECT that is gathering the input. That is what I meant.

You're wasting your time and typing fingers. You cannot instill order when inserting to a SQL table, because there is no valid concept of that type of order in a SQL table.

SO, the short answer is NO. It will not affect later query speed. It is also a direct affront to all things good and beautiful. Most probably it would cause a Smiting Blow from the SQL Gods were it to be implemented in a production database.

If order is important, do it on your SELECT when you are pulling data out of the table.

The only time I would put an ORDER BY in an insert statement is if it was necessary in the SUB-SELECT to assure that the right data is pulled from the source table to be slapped into the INSERT table. OR if you need to limit the input from the select...as in: INSERT dbo.MyDestinationTable
SELECT TOP 100
Myfield1,
Myfield2,
MyDate
FROM MySourceTable
ORDER BY MyDate DESC
and in this case, the ORDER BY is relative to the SELECT, not the INSERT.|||Thanks man|||You're welcome, dude.|||cowboy,

all of that is straight from the orthodox canon and that's fine and good, yet before I knew any better i wrote a server side pagination thingie that does an INSERT with a SELECT and an ORDER BY into a temp table and it does the row numbering with an identity column. It worked and still works and every time I read this thing about it not being reliable, I have never seen it not work in practice.|||"SO, the short answer is NO. It will not affect later query speed." please, sir, i must disagree

i think i gave the correct answer already

this --

INSERT INTO ... SELECT ... FROM ... ORDER BY ...

is going to be slower than this --

INSERT INTO ... SELECT ... FROM ...

simply because the ORDER BY will take extra time

(and, as we all know, the ORDER BY isn't guaranteed to actually insert them in the correct order anyway)|||OK Sean, I see your point, I knew I should have loaded my Orthodox Sidearm with birdshot so it scattered better. I still think that your example is another situation where the ORDER BY is related to the SELECT rather than the INSERT. Maybe it's semantics, but an order by in your situation is actually used to populate one of the INSERT table's columns, which is the ID column. Conceptually (in my mind, anyway) this applies to the gathering of the data to be inserted. NOT the ordering of the insert table.

In your example, the insertion of the data has nothing (reliably) to do with the actual physical order of the data in the table, nor in and of itself anything to do with the speed of a subsequent select from the table, correct? Of course the speed to be gained, if any, on the select has to do with the construction of the SELECT statement, and any applicable indicies that may be in place. NOT, per se, by anything you did on the insert itself.|||But the original question was whether subsequent SELECTS from the table will be faster, which they are not. There was no question as to which method of inserting was going to be faster. Presumably that is beyond debate.

If you have some free time, and a busy multi-CPU server, do a sp_helptext sp_who2, and scroll down to the bottom. I always loved the comment there:

-- (Seems always auto sorted.) order by spid_sort

On a busy server, parallelism takes over, and you get the results in every which way. I wonder how much processing time is shaved off by commenting out the order by statement. I mean, you might even have 200 or 500 rows to order!|||please, sir, i must disagree

i think i gave the correct answer already

this --

INSERT INTO ... SELECT ... FROM ... ORDER BY ...

is going to be slower than this --

INSERT INTO ... SELECT ... FROM ...

simply because the ORDER BY will take extra time

(and, as we all know, the ORDER BY isn't guaranteed to actually insert them in the correct order anyway)|||Well then, I guess you've forced my hand. Unfortunately, I've got to disagree with your disagreement then. Your answer was correct in your interpretation of the original question. That is, if he/she (or is that HeShe? I get that confused sometimes, which really has caused problems in the bar at times at closing) was asking if the ORDER BY would cause the INSERT to be slower. If that is the case, I am behind you (at a respectable distance, of course) 100%.

However, if heshe asked if using ORDER BY on the INSERT would have any effect on later and independent SELECTs, then I stand by my answer. Nope.|||Daft I know but I have always* wondered whether ordering an insert by the clustered index definition is better/no different to not ordering an insert. I have a feeling I have read somewhere one way or the other but I can't remember what the upshot was. The thing is - unless SS does implicitly order the insert anyway then it will be slower than an unordered insert as there will be page splits. When using BCP you can tell BCP if the order of data in the file & clustered index order are the same and the bulk insert is faster.

It is this sort of thing that gets me to sleep at night.

*not literally|||But the original question was whether subsequent SELECTS from the table will be faster

Oops.
There was no question as to which method of inserting was going to be faster. Presumably that is beyond debate.
Double oops.|||I have a feeling I have read somewhere one way or the other but I can't remember what the upshot was. perhaps this may help:
INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted

-- Ordering guarantees in SQL Server... (http://blogs.msdn.com/sqltips/archive/2005/07/20/441053.aspx)|||perhaps this may help:
INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted

-- Ordering guarantees in SQL Server... (http://blogs.msdn.com/sqltips/archive/2005/07/20/441053.aspx) Yeah, what he said.|||Thanks Rudy. That does actually fit with my observations - truncate a table/clustered index, pump in several million rows in a single statement, check the fragmentation and the table turns out to be totally fragmented. I think I had always hoped that SS would order the input to fill the pages up to the fillfactor. I guess it must insert them in whatever order is most efficient for the select irrespective of what might be the most efficient insert order (assuming ordering the insert < splitting pages all over the place).
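The guarantee cited above can be sketched like this: the IDENTITY values follow the ORDER BY even though the physical row order is not guaranteed (the pubs sample table is used purely for illustration):

```sql
CREATE TABLE #numbered (rownum INT IDENTITY(1, 1), lastname VARCHAR(40))

INSERT INTO #numbered (lastname)
SELECT au_lname FROM pubs.dbo.authors ORDER BY au_lname

-- rownum 1, 2, 3... follows the alphabetical order of lastname
SELECT rownum, lastname FROM #numbered ORDER BY rownum
DROP TABLE #numbered
```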

I appreciate all the logical "no order in relations" stuff - I am thinking solely about the physical aspect.


Did anyone let mgsn know that the physical order of the data (i.e. the clustered index) may affect many of the queries he/she might perform on the table?|||Yeah, what he said.|||Thanks to you too Sean :)|||.
.
.
On a busy server, parallelism takes over, and you get the results in every which way. ...
So then; limiting the INSERT SP to a single processor might help reduce page splitting? Not "absolute guarantee", but reduce it.

My (unwarranted?) *impression* with reading about order-by in sub-queries not necessarily returning ordered sets to the outer queries, a symptom I've seen happen on rare occasion, was that it had to do with query plans.

Multi-processing affecting this also makes perfect sense.

However; since Ordering a Clustered Insert is relevant would it follow that setting the "Insert ... from select .. ordered by ..." SP to single processor would help matters?

Again; not looking for perfection, but in the real world, achieving an ordered insert 99.9% of the time.

I'd like to comment that, if the nature of the data is that adjoining records are often the objects of the same select, then having stuff together (in the same page) would be a performance gain.

For example: Let's say you have a non-clustered Child table that's ordered by ParentID. Let's say every "Family" has about 20 children on average, and the pages just happen to fit about 30 or 40 children rows each. Now let's say the child table has 30 million rows (to keep the Query Planner from invoking a full table scan). So; selecting a given 50 families would naturally use the ParentID key to go get the children. You would wind up reading maybe 25 pages on average if the related children (for each family) are all together, and therefore always fitting in either 1 or 2 pages. By contrast; if they're all haphazard you would get 50 x 20 page reads in the child table. Of course; at least some of the WHERE would have to be resolved outside of the child data records.

As a 2nd comment. For similar reasons; wouldn't it make sense to put stuff together for head-movement purposes? Not relevant for nearly all tables, but the situation may come when some table is so huge and so key that you put it on its own HDD, and due to its nature (let's just say it's financial transactions that are usually queried within a range of perhaps 1 month but you don't archive until data is 8 months old), then the disks could predominantly stay in the same area. Same would apply if lots of ordered transfers are done, so if you built an index that resolves the Where Clause but the actual transfer usually has large amounts of pages to read (so, of 100 million rows, let's say it's selecting a group of 5 million), if the data was all over the place, the heads would thrash. I believe that EVEN IF THE TABLE weren't on its own drive, this COULD be a thrashing-saving quality.

I'm just saying that although unordered heaps and clusters are logically equivalent to ordered ones, that there are special cases where there's a cost to being unordered and that the cost can be severe for totally random cases. While I'm no expert in Relational databases - the basic concept of "reduce physical reads" surely remains valid.|||Hi vich

I've read this a couple of times. I think I know where you are going but there are a couple of lines that don't work.
"Let's say you have a non-clustered Child table that's ordered by ParentID." This is an oxymoron. Did you mean clustered child table?

"I'm just saying that although unordered heaps and clusters are logically equivalent to ordered ones" Are you saying that a clustered index can be unordered? The definition of a clustered index has ordering right at the centre of it so again I don't quite get it.

Just to be clear - there are two types of table in the sql server world - Steers and Que... oops not them... Heaps & Clustered Indexes. Heaps are unordered and have no B-Tree structure. CIs are ordered (at the leaf level) by the index column(s) and have a B-Tree.
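The two table types described above, in DDL form (table and index names invented for illustration):

```sql
-- A heap: no clustered index, rows live wherever there is space.
CREATE TABLE ChildHeap (ParentID INT NOT NULL, payload VARCHAR(100))

-- A clustered index: leaf level ordered by ParentID, so all children
-- of one parent end up on adjacent pages.
CREATE TABLE ChildClustered (ParentID INT NOT NULL, payload VARCHAR(100))
CREATE CLUSTERED INDEX IX_ChildClustered_ParentID
    ON ChildClustered (ParentID)
```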

Overall though you are spot on - selected data can be more efficiently retrieved if it is physically contiguous and occupies the minimum number of pages.|||Hi vich

I've read this a couple of times. I think I know where you are going but there are a couple of lines that don't work.
This is an oxymoron. Did you mean clustered child table?

Are you saying that a clustered index can be unordered? The definition of a clustered index has ordering right at the centre of it so again I don't quite get it.

Just to be clear - there are two types of table in the sql server world - Steers and Que... oops not them... Heaps & Clustered Indexes. Heaps are unordered and have no B-Tree structure. CIs are ordered (at the leaf level) by the index column(s) and have a B-Tree.

Overall though you are spot on - selected data can be more efficiently retrieved if it is physically contiguous and occupies the minimum number of pages.
On point 1, by "ordered" I meant "contiguously in sequence" and no, I did mean heap, not clustered.

x1x2x3x4x5
not x3x1x5x4x2

So; "ordered heap" is an oxymoron in strict logical terms, but the records in the heap do physically reside somewhere. My point was that IF you know what groups of rows you will often read together, then physically grouping them will reduce physical reads in direct proportion to rows-per-page. Even if it's only "sometimes", it's still an improvement over being totally random since that will guarantee worst case page hits.

To rephrase; one aspect of maximizing page hits is through placing rows into the same page if they can predictably be read together.

The phrase "Ordered Heap" must be to a DBA as "Perpetual Motion Machine" is to an engineer so lol; hence my clarification.

On the 2nd point, I was redundant. It should have read:
I'm just saying that although unordered heaps and clusters are logically equivalent.|||Ah - I see. However, in the heap the data being physically contiguous would be useless - the engine would still need to scan every page in the table. It doesn't matter how many rows there are - heaps -> scans. Or have I missed your point again :D|||Ah - I see. However, in the heap the data being physically contiguous would be useless - the engine would still need to scan every page in the table. It doesn't matter how many rows there are - heaps -> scans. Or have I missed your point again :D
What about indexes? Direct Access? Heap -> Full Scan is only occasionally true, correct? Often even, but not always.

With all due respect sir; in my case of 30 million rows when it's going after about 1000 of them, wouldn't you agree we should shoot the query plan that reads them all?

EDIT - added section below:
I was thinking about what bearing .mdf file's fragmentation has. My 2:30AM conclusion is; not too much. It's all about page hits. If the pages are scattered around, then seek ahead buffering, head movement, etc would come into play, but those ramifications (I think) would be far less than finding as much as possible per page (block really). It would probably come down to row size vs. block size (what is a "Page" anyway, at least as compared with a "block").

Anyway; this all boggles my uninformed mind and I submit to the engineer's recommendations, but I also want to understand the pieces so I can make good tuning decisions some day.

This thread's complete dismissal about physical placement of indexed heap data just struck me as "file this under, question this one'".|||Do you have a NC index on ParentID in this case? And we are still talking about a heap yes (the oxymoron-ordered heap ;))?|||Do you have a NC index on ParentID in this case? And we are still talking about a heap yes (the oxymoron-ordered heap ;))?
Yes, an indexed heap. But if the indexes tell you "go get these 1000 rows", I'm just saying that if prior inserts managed to maximize paging efficiency, then obviously 50 reads are better than 1000.

Sorry; I don't know what "NC" is, but I guess I assumed that from the parent/child relationship. However, it wouldn't matter for the point being made. Even if the child's indexes required a full scan; if the DB engine determined "go get these 1000 rows", it still holds true that 50 page reads trumps 1000. The scenario did say that all children would be read as part of the "family".

Again; I'm just saying there are special cases, and not all that uncommon ones.|||NC = nonclustered. You have a nonclustered index on parentID?|||Even if the child's indexes required a full scan; if the DB Engine determined "go get these 1000 rows", it still holds true that 50 page reads trumps 1000. I think that is my point. Without any usable index on ParentID the engine will scan the pages. Remember that scanning the pages involves getting them into memory and checking for the particular ParentID value. It is the number of pages scanned, not the number of pages that the data is on, that counts. As such, having all the relevant data on a small number of pages is only beneficial if the engine can know this and limit its retrieval to these pages only.|||So then; limiting the INSERT SP to a single processor might help reduce page splitting? Not "absolute guarantee", but reduce it.

Absolutely not. Limiting the number of CPUs that process an insert has no relevance to how data is stored on disk. And creating an "ordered heap" is a bit of a fantasy in 99% of cases, since you expect some deletes to happen on occasion. What do you think SQL Server is going to do with those gaps? Use 'em, that's what.

Besides which, you have eloquently explained in your first scenario EXACTLY why you would want a clustered index on the ParentID column. So the children are all clustered (pun intended) on the same set of pages.|||Besides which, you have eloquently explained in your first scenario EXACTLY why you would want a clustered index on the ParentID column. So the children are all clustered (pun intended) on the same set of pages. Just what I have been trying to say. The crucial difference between this clustered index and the magically ordered heap is that SS also can guarantee that these values are only on these pages and not on any others AND can get there quickly via the B-Tree, so it does not need to hunt through the entire table looking for them.|||Absolutely not. Limiting the number of CPUs that process an insert has no relevance to how data is stored on disk. And creating an "ordered heap" is a bit of a fantasy in 99% of cases, since you expect some deletes to happen on occasion. What do you think SQL Server is going to do with those gaps? Use 'em, that's what.

Besides which, you have eloquently explained in your first scenario EXACTLY why you would want a clustered index on the ParentID column. So the children are all clustered (pun intended) on the same set of pages.
Drrrr (cue anvil falling on my head). That actually did dawn on me last night when thinking about it after posting.

If you know the parent ID will determine a lot of read clusters in future queries, and being a "parent ID" (presumably auto-sequenced), it will tend to be a good cluster key (always appending at the end).

However; if all families tend to grow (i.e. the Parent table rarely grows), then would page splitting occur a lot? That's where padded indexing and fill factors would come into play, I imagine.

Example: A scientific statistical program that maintains thousands of counters and is available to the scientific community on the internet (therefore; gets LOTS of reads). For the sake of argument, let's say that different scientists are only interested in their pet statistic, but when they get it, all data for that statistic type (child rows for that parent) need to be read. So; the child table (the gathered statistic data) is constantly inserted into, and the inserts are for all parent IDs (the statistic type).

This would benefit greatly from physical clustering by Statistic Type (ParentID), but inserts would require enough padding in each statistic type (index padding?) to prevent page splits during INSERT until the next index reorg can refresh the padding amount.

Thank you. I know it's painful teaching stubborn beginners.|||Broadly, yes. Don't be too scared of page splits though. If you set a fill factor at which you never get page splits, then it is set too high. One of the problems with page splits is the increase in the number of pages that need to be read, as they are not "full". A fill factor has exactly the same effect, so setting a fill factor so low that pages never get full is counterproductive: you are increasing the number of pages that need to be read to satisfy queries.
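For reference, a moderate fill factor can be declared when the clustered index is built. This is only a sketch in SQL Server 2000 syntax; the table and index names are hypothetical:

```sql
-- Hypothetical example: cluster child rows by their parent key and leave
-- roughly 20% free space on each leaf page to absorb inserts between reorgs.
CREATE CLUSTERED INDEX IX_StatisticData_ParentID
    ON StatisticData (ParentID)
    WITH FILLFACTOR = 80

-- Periodically rebuild to restore the free space:
-- DBCC DBREINDEX ('StatisticData', 'IX_StatisticData_ParentID', 80)
```

Note the fill factor is only applied when the index is created or rebuilt; it is not maintained during normal inserts.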

Is OR so bad?

Hi all !
I have the following query which runs ok:
SELECT
estates.l_memberID,
member.l_memberID
FROM
estates
left join companies on estates.l_companyid=companies.l_companyid
left join multimedia on multimedia.l_estateid=estates.l_estateid and
isnull(N_INDEX,1)=1
left join member (NOLOCK) on estates.l_memberid=member.l_memberid
WHERE
isnull(estates.B_ONLYCOMPANIES,0) = 0 AND
isnull(estates.B_FIRSTHAND,0) = 0 AND
estates.n_state = 0 AND
estates.n_type = 7 AND
isnull(dat_show,getdate()) <= getdate() AND
(ISNULL(MULTIMEDIA.N_INDEX,0) = 0 AND
ISNULL(MULTIMEDIA.N_TYPE,0) = 0)
The thing is if I add the following row:
AND (@.RentMin = -1) OR (estates.L_RENT >= @.RentMin)
It goes from running under a second to a 20 second query! How is this
possible? I'm just asking SQL to ignore the parameter if it's -1.
Any light shed greatly appreciated,
Niclas|||tonicvodka wrote:

> Hi all !
> I have the following query which runs ok:
> SELECT
> estates.l_memberID,
> member.l_memberID
> FROM
> estates
> left join companies on estates.l_companyid=companies.l_companyid
> left join multimedia on multimedia.l_estateid=estates.l_estateid and
> isnull(N_INDEX,1)=1
> left join member (NOLOCK) on estates.l_memberid=member.l_memberid
> WHERE
> isnull(estates.B_ONLYCOMPANIES,0) = 0 AND
> isnull(estates.B_FIRSTHAND,0) = 0 AND
> estates.n_state = 0 AND
> estates.n_type = 7 AND
> isnull(dat_show,getdate()) <= getdate() AND
> (ISNULL(MULTIMEDIA.N_INDEX,0) = 0 AND
> ISNULL(MULTIMEDIA.N_TYPE,0) = 0)
> The thing is if I add the following row:
> AND (@.RentMin = -1) OR (estates.L_RENT >= @.RentMin)
> It goes from running under a second to a 20 second query! How is this
> possible? I'm just asking SQL to ignore the parameter if it's -1.
> Any light shed greatly appreciated,
> Niclas
Do you mean to OR the last condition to everything else, or do you want
to OR it only to the condition on the same row? If the latter, add a
parenthesis:
AND ((@.RentMin = -1) OR (estates.L_RENT >= @.RentMin))
Mind you, I didn't really analyze the rest of the query.
/impslayer, aka Birger Johansson|||>
> The thing is if I add the following row:
> AND (@.RentMin = -1) OR (estates.L_RENT >= @.RentMin)
> It goes from running under a second to a 20 second query! How is this
> possible? I'm just asking SQL to ignore the parameter if it's -1.
> Any light shed greatly appreciated,
> Niclas
>
You are not actually just asking SQL to ignore the parameter if it's -1.
Your statement comes out like:
condition1 AND
condition2 AND
...
OR Condition#.
You probably need another set of parenthesis. Try the following:
AND ( (@.RentMin = -1) OR (estates.L_RENT >= @.RentMin) )
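As an aside, the precedence rule at work here is the same one most languages share: AND binds tighter than OR. A quick Python sketch (with hypothetical boolean values standing in for the SQL conditions):

```python
# AND binds tighter than OR, so "a AND b OR c" parses as "(a AND b) OR c".
rent_min_is_default = False   # stands in for (@RentMin = -1)
rent_ok = True                # stands in for (estates.L_RENT >= @RentMin)
other_filters = False         # stands in for the rest of the WHERE clause

# What the unparenthesized query actually evaluates:
unparenthesized = (other_filters and rent_min_is_default) or rent_ok
# What was intended:
intended = other_filters and (rent_min_is_default or rent_ok)

print(unparenthesized)  # True  - the OR lets rows bypass every other filter
print(intended)         # False - the rent condition only narrows the result
```

That is why the unparenthesized version returns far more rows (and runs a very different plan) than expected.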
Rick Sawtell
MCT, MCSD, MCDBA|||Thanks both for quick response!
Very true, I was quick and sloppy...|||Though, while I have your attention, which is to prefer:
A stored proc with many rows of e.g.
AND ( (@.RentMin = -1) OR (estates.L_RENT >= @.RentMin) )
or
building the query in code, being able to filter away those rows, and
then sending it to SQL-server?|||"tonicvodka" <tonicvodka@.hotmail.com> wrote in message
news:1137157131.575494.295570@.g44g2000cwa.googlegroups.com...
> Though, while I have your attention, which is to prefer;
> A stored proc with many rows of e.g.
> AND ( (@.RentMin = -1) OR (estates.L_RENT >= @.RentMin) )
> or
> building the query in code, being able to filter away those rows, and
> then sending it to SQL-server?
>
The sproc will probably be faster as it is a series of AND statements. Each
one will continue to limit the data returned.
Rick Sawtell
MCT, MCSD, MCDBA|||On 13 Jan 2006 04:58:51 -0800, tonicvodka wrote:

>Though, while I have your attention, which is to prefer;
>A stored proc with many rows of e.g.
>AND ( (@.RentMin = -1) OR (estates.L_RENT >= @.RentMin) )
>or
>building the query in code, being able to filter away those rows, and
>then sending it to SQL-server?
Hi tonicvodka,
http://www.sommarskog.se/dynamic_sql.html
Hugo Kornelis, SQL Server MVP
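To illustrate the second option tonicvodka asked about, here is a minimal sketch (hypothetical table and column names, drawn loosely from the query above) of building the WHERE clause in application code instead of using catch-all OR conditions. Only the filters the caller actually supplied end up in the SQL text, and values are still bound as parameters rather than concatenated in:

```python
def build_query(rent_min=None, rent_max=None):
    """Assemble a SELECT with only the filters that were actually supplied."""
    clauses = ["estates.n_state = 0", "estates.n_type = 7"]
    params = []
    if rent_min is not None:
        clauses.append("estates.L_RENT >= ?")
        params.append(rent_min)
    if rent_max is not None:
        clauses.append("estates.L_RENT <= ?")
        params.append(rent_max)
    sql = ("SELECT estates.l_memberID FROM estates WHERE "
           + " AND ".join(clauses))
    return sql, params

sql, params = build_query(rent_min=5000)
print(sql)     # ...WHERE estates.n_state = 0 AND estates.n_type = 7 AND estates.L_RENT >= ?
print(params)  # [5000]
```

Each variant the optimizer sees is a plain chain of sargable AND predicates, so it can pick an index plan per variant; the article Hugo links covers the trade-offs (including plan caching) in depth.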