Are Stored Procedures more efficient, in general, than inline statements on modern RDBMSs?


Problem description


Conventional wisdom states that stored procedures are always faster. So, since they're always faster, use them ALL THE TIME.

I am pretty sure this is grounded in some historical context where this was once the case. Now, I'm not advocating that stored procs are not needed, but I want to know in what cases stored procs are necessary in modern databases such as MySQL, SQL Server, or Oracle. Is it overkill to have ALL access through stored procs?

Solution

NOTE that this is a general look at stored procedures that is not specific to any one DBMS. Some DBMSs (and even different versions of the same DBMS!) may operate contrary to this, so you'll want to double-check with your target DBMS before assuming all of this still holds.

I've been a Sybase ASE, MySQL, and SQL Server DBA on and off for almost a decade (along with application development in C, PHP, PL/SQL, C#.NET, and Ruby). So, I have no particular axe to grind in this (sometimes) holy war.

The historical performance benefits of stored procs have generally come from the following (in no particular order):

  • Pre-parsed SQL
  • Pre-generated query execution plan
  • Reduced network latency
  • Potential cache benefits

Pre-parsed SQL -- similar benefits to compiled vs. interpreted code, except on a very micro level.

Still an advantage? Not very noticeable at all on the modern CPU, but if you are sending a single SQL statement that is VERY large eleventy-billion times a second, the parsing overhead can add up.

Pre-generated query execution plan. If you have many JOINs the permutations can grow quite unmanageable (modern optimizers have limits and cut-offs for performance reasons). It is not unknown for very complicated SQL to have distinct, measurable latencies due to the optimizer trying to figure out the "near best" execution plan (I've seen a complicated query take 10+ seconds just to generate a plan, before we tweaked the DBMS). Stored procedures will, generally, store this plan in memory so you can avoid this overhead.

Still an advantage? Most DBMSs (in their latest editions) will cache the query plans for INDIVIDUAL SQL statements, greatly reducing the performance differential between stored procs and ad hoc SQL. There are some caveats and situations in which this doesn't hold, so you'll need to test on your target DBMS.

Also, more and more DBMS allow you to provide optimizer path plans (abstract query plans) to significantly reduce optimization time (for both ad hoc and stored procedure SQL!!).

WARNING: Cached query plans are not a performance panacea. Occasionally the query plan that is generated is sub-optimal. For example, if you send SELECT * FROM table WHERE id BETWEEN 1 AND 99999999, the DBMS may select a full-table scan instead of an index scan because you're grabbing every row in the table (so sayeth the statistics). If this is the cached version, then you can get poor performance when you later send SELECT * FROM table WHERE id BETWEEN 1 AND 2. The reasoning behind this is outside the scope of this posting, but for further reading see:

http://www.microsoft.com/technet/prodtechnol/sql/2005/frcqupln.mspx
http://msdn.microsoft.com/en-us/library/ms181055.aspx
http://www.simple-talk.com/sql/performance/execution-plan-basics/

"In summary, they determined that supplying anything other than the common values when a compile or recompile was performed resulted in the optimizer compiling and caching the query plan for that particular value. Yet, when that query plan was reused for subsequent executions of the same query for the common values (‘M’, ‘R’, or ‘T’), it resulted in sub-optimal performance. This sub-optimal performance problem existed until the query was recompiled. At that point, based on the @P1 parameter value supplied, the query might or might not have a performance problem."

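The plan-reuse problem in the warning above can be illustrated with a toy model. This is only a sketch: the `ToyPlanCache` class, its 20% selectivity threshold, and the plan names are assumptions made up for illustration, and no real DBMS chooses or caches plans this simply.

```python
from bisect import bisect_left, bisect_right


class ToyPlanCache:
    """Toy model: one cached 'plan' per statement text, chosen from the first parameters seen."""

    def __init__(self, sorted_ids):
        self.ids = sorted_ids   # stands in for an indexed id column
        self.plans = {}         # statement text -> plan name

    def rows_in_range(self, low, high):
        return bisect_right(self.ids, high) - bisect_left(self.ids, low)

    def choose_plan(self, low, high):
        # Hypothetical rule: scan the whole table if more than 20% of rows match.
        if self.rows_in_range(low, high) > 0.2 * len(self.ids):
            return "full_scan"
        return "index_seek"

    def execute(self, sql, low, high):
        # Compile only on a cache miss; afterwards reuse the plan for any parameters.
        if sql not in self.plans:
            self.plans[sql] = self.choose_plan(low, high)
        plan = self.plans[sql]
        touched = len(self.ids) if plan == "full_scan" else self.rows_in_range(low, high)
        return plan, touched


SQL = "SELECT * FROM t WHERE id BETWEEN ? AND ?"
cache = ToyPlanCache(list(range(1, 1001)))
first = cache.execute(SQL, 1, 99999999)   # wide range compiles a full scan
second = cache.execute(SQL, 1, 2)         # narrow range reuses the cached full scan
fresh = ToyPlanCache(list(range(1, 1001))).execute(SQL, 1, 2)  # compiled for the narrow range
```

In this model the second call touches all 1000 rows even though only two match, while a freshly compiled plan for the same parameters touches two. That is the shape of the problem the quoted summary describes, nothing more.
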
Reduced network latency
A) If you are running the same SQL over and over -- and the SQL adds up to many KB of code -- replacing that with a simple "exec foobar" can really add up.
B) Stored procs can be used to move procedural code into the DBMS. This saves shuffling large amounts of data off to the client only to have it send a trickle of info back (or none at all!). Analogous to doing a JOIN in the DBMS vs. in your code (everyone's favorite WTF!)

Still an advantage?
A) Modern 1Gb (and 10Gb and up!) Ethernet really makes this negligible.
B) Depends on how saturated your network is -- why shove several megabytes of data back and forth for no good reason?
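To make point B concrete, here is a minimal sketch that compares joining in application code with joining inside the database. It uses Python's built-in `sqlite3` as a stand-in; the `customers` and `orders` tables and both helper functions are hypothetical. SQLite runs in-process, so there is no real network here, and the count of rows handed to the client serves only as a proxy for data shipped over the wire.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
""")
# 100 customers (every tenth one in the EU) and 1000 orders of 10 units each.
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, "EU" if i % 10 == 0 else "US") for i in range(1, 101)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, (i % 100) + 1, 10) for i in range(1, 1001)])


def total_client_side(conn):
    """Pull both tables to the client and join in application code."""
    customers = conn.execute("SELECT id, region FROM customers").fetchall()
    orders = conn.execute("SELECT customer_id, total FROM orders").fetchall()
    eu_ids = {cid for cid, region in customers if region == "EU"}
    total = sum(amount for cid, amount in orders if cid in eu_ids)
    return total, len(customers) + len(orders)   # rows handed to the client


def total_server_side(conn):
    """Let the DBMS join and aggregate; only the answer comes back."""
    rows = conn.execute(
        "SELECT SUM(o.total) FROM orders o "
        "JOIN customers c ON c.id = o.customer_id WHERE c.region = ?",
        ("EU",),
    ).fetchall()
    return rows[0][0], len(rows)


client_total, client_rows = total_client_side(conn)
server_total, server_rows = total_server_side(conn)
```

Both approaches compute the same total, but the client-side version fetches every row of both tables while the server-side version returns a single row. Note that a plain SQL statement gets this benefit too; a stored procedure is only needed when the server-side logic is procedural.
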

Potential cache benefits
Performing server-side transforms of data can potentially be faster if you have sufficient memory on the DBMS and the data you need is already in the server's memory.

Still an advantage? Unless your app has shared memory access to DBMS data, the edge will always go to stored procs.

Of course, no discussion of Stored Procedure optimization would be complete without a discussion of parameterized and ad hoc SQL.

Parameterized / Prepared SQL
Kind of a cross between stored procedures and ad hoc SQL, these are SQL statements embedded in a host language that use "parameters" for query values, e.g.:

SELECT .. FROM yourtable WHERE foo = ? AND bar = ?

These provide a more generalized version of a query that modern-day optimizers can use to cache (and re-use) the query execution plan, resulting in much of the performance benefit of stored procedures.
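As a minimal sketch of what this looks like from a host language, the snippet below uses Python's built-in `sqlite3` module, which accepts the same `?` placeholder style. The table layout and the `find_rows` helper are hypothetical; other drivers use different placeholder syntax, and how much plan reuse you actually get depends on the DBMS and driver.

```python
import sqlite3


def find_rows(conn, foo, bar):
    """Run one parameterized statement; the SQL text never changes, only the bound values."""
    sql = "SELECT id, foo, bar FROM yourtable WHERE foo = ? AND bar = ?"
    return conn.execute(sql, (foo, bar)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, foo TEXT, bar INTEGER)")
conn.executemany(
    "INSERT INTO yourtable (foo, bar) VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 1)],
)

rows_a1 = find_rows(conn, "a", 1)   # same statement text ...
rows_b1 = find_rows(conn, "b", 1)   # ... different bound values
```

Because the statement text is identical on every call, the engine can recognize it as the same query and reuse whatever it prepared earlier, instead of treating each call as brand-new SQL.
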

Ad Hoc SQL
Just open a console window to your DBMS and type in a SQL statement. In the past, these were the "worst" performers (on average) since the DBMS had no way of pre-optimizing the queries as in the parameterized/stored proc method.

Still a disadvantage? Not necessarily. Most DBMSs have the ability to "abstract" ad hoc SQL into parameterized versions -- thus more or less negating the difference between the two. Some do this implicitly; in others it must be enabled with a command setting (SQL Server: http://msdn.microsoft.com/en-us/library/ms175037.aspx , Oracle: http://www.praetoriate.com/oracle_tips_cursor_sharing.htm).
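The idea of abstracting ad hoc SQL into a parameterized form can be sketched with a toy normalizer. The `parameterize` function and its regular expression are assumptions for illustration only: a real DBMS works on the parsed statement rather than on raw text, and applies its own rules about which literals may be replaced.

```python
import re

# Matches single-quoted string literals (with '' escapes) or standalone numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def parameterize(sql):
    """Replace string and numeric literals with '?' and return (template, values)."""
    values = []

    def swap(match):
        values.append(match.group(0))
        return "?"

    return _LITERAL.sub(swap, sql), values


template_1, values_1 = parameterize("SELECT * FROM yourtable WHERE foo = 'x' AND bar = 1")
template_2, values_2 = parameterize("SELECT * FROM yourtable WHERE foo = 'y' AND bar = 2")
same_template = template_1 == template_2
```

Two ad hoc statements that differ only in their literal values collapse to the same template, so a plan cache keyed on that template could serve both.
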

Lessons learned? Moore's law continues to march on and DBMS optimizers, with every release, get more sophisticated. Sure, you can place every single silly teeny SQL statement inside a stored proc, but just know that the programmers working on optimizers are very smart and are continually looking for ways to improve performance. Eventually (if it's not here already) ad hoc SQL performance will become indistinguishable (on average!) from stored procedure performance, so any sort of massive stored procedure use **solely for "performance reasons"** sure sounds like premature optimization to me.

Anyway, I think if you avoid the edge cases and have fairly vanilla SQL, you won't notice a difference between ad hoc and stored procedures.
