Cursor looping versus set-based queries


Problem description

I know this question has been asked. And the usual answer is don't use
cursors or any other looping method. Instead, try to find a solution
that uses set-based queries.

But this brings up several questions / scenarios:

* I created several stored procedures that take parameters and insert
the data into the appropriate tables. This was done for easy access/use
from client-side apps (i.e. web-based).

Proper development tactics says to try and do "code reuse". So, if I
already have stored procs that do my logic, should I be writing a
second way of handling the data? If I ever need to change the way the
data is handled, I now have to make the same change in two (or more)
places.

* Different data from the same row needs to be inserted into multiple
tables. "Common sense" (maybe "gut instinct" is better) says to handle
each row as a "unit". Seems weird to process the entire set for one
table, then to process the entire set AGAIN for another table, and then
YET AGAIN for a third table, and so on.

* Exception handling. Set based processing means that if one row fails
the entire set fails. Looping through allows you to fail a row but
allow everything else to be processed properly. It also allows you to
gather statistics. (How many failed, how many worked, how many were
skipped, etc.)

?? Good idea ?? The alternative is to create a temporary table (sandbox
or workspace type thing), copy the data to there along with "status" or
"validation" columns, run through the set many times over looking for
any rows that may fail, marking them as such, and then at the end only
dealing with those rows which "passed" the testing. Of course, in order
for this to work you must know (and duplicate) all constraints so you
know what to look for in your testing.
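For concreteness, the two approaches under discussion can be sketched as follows. All table and procedure names here (`Staging`, `Target`, `dbo.usp_InsertTarget`) are hypothetical, not from the original post:

```sql
-- Row-by-row: a cursor that calls an existing stored procedure per row.
DECLARE @id int, @name varchar(50);
DECLARE rows_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT id, name FROM Staging;
OPEN rows_cur;
FETCH NEXT FROM rows_cur INTO @id, @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.usp_InsertTarget @id = @id, @name = @name;
    FETCH NEXT FROM rows_cur INTO @id, @name;
END;
CLOSE rows_cur;
DEALLOCATE rows_cur;

-- Set-based: the entire loop collapses into one statement.
INSERT INTO Target (id, name)
SELECT id, name
FROM Staging;
```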

Solution


Disclaimer: difficult to answer without more specific info.

Ja*******@hotmail.com wrote:

> I know this question has been asked. And the usual answer is don't use
> cursors or any other looping method. Instead, try to find a solution
> that uses set-based queries.
>
> But this brings up several questions / scenarios:
>
> * I created several stored procedures that take parameters and insert
> the data into the appropriate tables. This was done for easy access/use
> from client-side apps (i.e. web-based).
>
> Proper development tactics says to try and do "code reuse". So, if I
> already have stored procs that do my logic, should I be writing a
> second way of handling the data? If I ever need to change the way the
> data is handled, I now have to make the same change in two (or more)
> places.

If a SP encapsulates a plain INSERT I'd throw it out. If there is more
complex logic involved I'd probably leave it in.

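As a sketch of the kind of procedure Robert means (the procedure and table names are made up for illustration), a wrapper that only restates an INSERT adds little over issuing the INSERT directly:

```sql
-- A thin wrapper around a plain INSERT; set-based code can simply
-- inline the INSERT ... SELECT instead of looping over this procedure.
CREATE PROCEDURE dbo.usp_InsertCustomer
    @Name varchar(100),
    @City varchar(100)
AS
BEGIN
    INSERT INTO Customers (Name, City)
    VALUES (@Name, @City);
END;
```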
> * Different data from the same row needs to be inserted into multiple
> tables. "Common sense" (maybe "gut instinct" is better) says to handle
> each row as a "unit". Seems weird to process the entire set for one
> table, then to process the entire set AGAIN for another table, and then
> YET AGAIN for a third table, and so on.

This smells like a design issue. If you have to insert the exact same
data into multiple tables, chances are that your table layout is flawed.

> * Exception handling. Set based processing means that if one row fails
> the entire set fails. Looping through allows you to fail a row but
> allow everything else to be processed properly. It also allows you to
> gather statistics. (How many failed, how many worked, how many were
> skipped, etc.)

Normally you want all or nothing. This one sounds as if you placed
business logic into the database. This might or might not be a good idea,
but there seems to be a chance that this is better done in a middle tier
(for example because it eases porting to another RDBMS). But it depends
on the error handling and operations you do in SPs.

> ?? Good idea ?? The alternative is to create a temporary table (sandbox
> or workspace type thing), copy the data to there along with "status" or
> "validation" columns, run through the set many times over looking for
> any rows that may fail, marking them as such, and then at the end only
> dealing with those rows which "passed" the testing. Of course, in order
> for this to work you must know (and duplicate) all constraints so you
> know what to look for in your testing.



I wouldn't recommend one direction or the other with so little info.

Kind regards

robert


On 28 Mar 2006 12:06:31 -0800, Ja*******@hotmail.com wrote:

(snip)

> Proper development tactics says to try and do "code reuse".

Hi Jay,

This is not always true when dealing with databases. You have to weigh
the benefit of code reuse (cheaper maintenance) against the benefit of
code duplication (better performance).

In databases where performance matters, I won't hesitate a single second
to duplicate a stored procedure and change it slightly to optimize for
specific situations.

> * Different data from the same row needs to be inserted into multiple
> tables. "Common sense" (maybe "gut instinct" is better) says to handle
> each row as a "unit". Seems weird to process the entire set for one
> table, then to process the entire set AGAIN for another table, and then
> YET AGAIN for a third table, and so on.

I agree with Robert's reaction about the validity of such a design. But
if we assume that this really is valid, then you'll probably find the
cost of the second, third, etc. access to the data lower than that of the
first access, because the data can all be fetched from cache and no
physical disk I/O is needed.

Of course, you can always test both versions against each other - I
think that you'll be hard-pressed to find a scenario where using a
cursor outperforms using the same SELECT statement two or three times in
a row.
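Processing "the same set again" for each table is just one INSERT ... SELECT per target table. Assuming hypothetical Staging, Orders, OrderDetails, and OrderAudit tables, the repeated scans Hugo describes would look like:

```sql
-- Each statement re-reads Staging; after the first, its pages are
-- usually already in cache, so the repeat scans are cheap.
INSERT INTO Orders (OrderID, CustomerID)
SELECT OrderID, CustomerID FROM Staging;

INSERT INTO OrderDetails (OrderID, ProductID, Qty)
SELECT OrderID, ProductID, Qty FROM Staging;

INSERT INTO OrderAudit (OrderID, LoadedAt)
SELECT OrderID, GETDATE() FROM Staging;
```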

> * Exception handling. Set based processing means that if one row fails
> the entire set fails. Looping through allows you to fail a row but
> allow everything else to be processed properly. It also allows you to
> gather statistics. (How many failed, how many worked, how many were
> skipped, etc.)

In an RDBMS, transactions should be atomic (the A in the ACID properties
of transactions) - they either succeed as a whole, or they fail as a
whole.

If you need to exclude rows that would violate a constraint, check the
constraint in the WHERE clause.
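For example, assuming a CHECK constraint (Qty > 0) and a foreign key to a Products table (all names hypothetical), the rows that would fail can be filtered out up front:

```sql
-- Insert only the rows that would satisfy the constraints;
-- the rest are simply left behind in Staging.
INSERT INTO OrderDetails (OrderID, ProductID, Qty)
SELECT s.OrderID, s.ProductID, s.Qty
FROM Staging AS s
WHERE s.Qty > 0                               -- mirrors the CHECK
  AND EXISTS (SELECT 1 FROM Products AS p     -- mirrors the foreign key
              WHERE p.ProductID = s.ProductID);
```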

> The alternative is to create a temporary table (sandbox
> or workspace type thing), copy the data to there along with "status" or
> "validation" columns, run through the set many times over looking for
> any rows that may fail, marking them as such,

No need to loop over the data many times. In most cases, you only need a
single UPDATE with a CASE to check the various constraints and set the
"status" column accordingly.

> and then at the end only
> dealing with those rows which "passed" the testing. Of course, in order
> for this to work you must know (and duplicate) all constraints so you
> know what to look for in your testing.



Yes. You need to know your constraints.

Isn't that a normal part of your job?

--
Hugo Kornelis, SQL Server MVP


Ja*******@hotmail.com wrote:

> I know this question has been asked. And the usual answer is don't use
> cursors or any other looping method. Instead, try to find a solution
> that uses set-based queries.
>
> But this brings up several questions / scenarios:
>
> * I created several stored procedures that take parameters and insert
> the data into the appropriate tables. This was done for easy access/use
> from client-side apps (i.e. web-based).
>
> Proper development tactics says to try and do "code reuse". So, if I
> already have stored procs that do my logic, should I be writing a
> second way of handling the data? If I ever need to change the way the
> data is handled, I now have to make the same change in two (or more)
> places.
>
> * Different data from the same row needs to be inserted into multiple
> tables. "Common sense" (maybe "gut instinct" is better) says to handle
> each row as a "unit". Seems weird to process the entire set for one
> table, then to process the entire set AGAIN for another table, and then
> YET AGAIN for a third table, and so on.
>
> * Exception handling. Set based processing means that if one row fails
> the entire set fails. Looping through allows you to fail a row but
> allow everything else to be processed properly. It also allows you to
> gather statistics. (How many failed, how many worked, how many were
> skipped, etc.)
>
> ?? Good idea ?? The alternative is to create a temporary table (sandbox
> or workspace type thing), copy the data to there along with "status" or
> "validation" columns, run through the set many times over looking for
> any rows that may fail, marking them as such, and then at the end only
> dealing with those rows which "passed" the testing. Of course, in order
> for this to work you must know (and duplicate) all constraints so you
> know what to look for in your testing.


Another reason why code re-use is less of an issue in SQL compared to
general programming languages is that it's so easy to generate scripts
automatically from your database's metadata. That's particularly the
case for CRUD scripts, transformation scripts, archiving, auditing,
etc. If you have good constraints and naming conventions then you can
automate the production of thousands of lines of code in seconds. That
may not be as important as Robert and Hugo's other sensible comments but
it is a distinct advantage.
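As a small illustration of generating code from metadata (a sketch using the standard INFORMATION_SCHEMA catalog views, not anything from the original posts), a query like this emits a column list per table, which a script generator can splice into CRUD templates:

```sql
-- Build "col1, col2, ..." for every base table from the catalog views.
-- FOR XML PATH('') is the SQL Server 2005-era idiom for concatenation.
SELECT t.TABLE_NAME,
       STUFF((SELECT ', ' + c.COLUMN_NAME
              FROM INFORMATION_SCHEMA.COLUMNS AS c
              WHERE c.TABLE_SCHEMA = t.TABLE_SCHEMA
                AND c.TABLE_NAME = t.TABLE_NAME
              ORDER BY c.ORDINAL_POSITION
              FOR XML PATH('')), 1, 2, '') AS ColumnList
FROM INFORMATION_SCHEMA.TABLES AS t
WHERE t.TABLE_TYPE = 'BASE TABLE';
```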

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/m...S,SQL.90).aspx
--


