udf vs直接sql性能 [英] udf vs direct sql performance

查看:162
本文介绍了udf vs直接sql性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用MSSQL 2005

Using MSSQL 2005

我今天在Scalar UDF中的where语句中玩耍,以查看与进行呼叫和io差异等相关的一些费用.

I was playing around today with a Scalar UDF in a where statement to see some of the costs associated with making the call and io differences etc.

我从2个基本表开始.具有一百万行的客户.和购买量为100,000的商品.两者都有一个自动标识列作为主键.没有定义其他索引.

I'm starting with 2 basic tables. Customer which has 1 million rows. and Purchases which has 100,000. Both have an auto identity column as Primary key. No other indexes defined.

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SET STATISTICS IO ON
    SELECT * FROM Customer C 
    INNER JOIN Purchases P on C.[IDENTITY] = P.CustomerID
    WHERE P.Amount > 1000
SET STATISTICS IO OFF

这将返回

Table 'Customer'. Scan count 0, logical reads 3295, physical reads 1, read-ahead reads 32, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Purchases'. Scan count 1, logical reads 373, physical reads 1, read-ahead reads 370, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

因此,仅查看标量UDF的影响,然后将P.Amount> 1000移至UDF.功能如下:

So just to see the impact of a scalar UDF I then just moved the P.Amount > 1000 to a UDF. Function is as follows:

CREATE FUNCTION [dbo].[HighValuePurchase]
(
    @value int
)
RETURNS bit
AS
BEGIN
    DECLARE @highValue bit
    SET @highValue = '0'

    IF @value > 1000
    BEGIN
        SET @highValue = '1'
    END
    RETURN @highValue
END

因此,我随后运行以下查询:

So I then ran the following query:

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SET STATISTICS IO ON      
    SELECT * FROM Customer C 
    INNER JOIN Purchases P on C.[IDENTITY] = P.CustomerID
    WHERE dbo.HighValuePurchase(P.Amount) = '1'
SET STATISTICS IO OFF

我原以为这会变得更糟.该查询返回以下IO统计信息:

I was expecting this to run worse. This query returned the following IO statistics:

Table 'Purchases'. Scan count 1, logical reads 373, physical reads 1, read-ahead reads 370, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 1, logical reads 35, physical reads 3, read-ahead reads 472, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

此查询的返回速度也比> 1000查询的查询速度快.当返回相同的行时,调用UDF的行的顺序由C. [IDENTITY]自动排序,而其他查询似乎未排序.这可能是由于执行计划中合并的方式所致.计划概要如下.

This also returned faster than the > 1000 query. While the same rows were returned the ordering of the one calling the UDF was automatically sorted by C.[IDENTITY] where the other query appeared unsorted. This is likely due to the way the combines were done in the execution plans. Outline of the plans is below.

非UDF的执行计划显示了针对购买的聚集索引扫描和针对嵌套联接相结合的客户的聚集索引查找.

Execution plan for the non UDF shows a Clustered Index scan for Purchases and a Clustered Index seek for Customers combined at a nested join.

UDF版本的执行计划显示了对采购的聚集索引扫描,然后是过滤器,然后是排序.客户上有一个聚集的索引扫描.然后将结果合并到合并联接中.

Execution plan for the UDF version shows a clustered index scan for purchases, then a filter, then a sort. There's a clustered Index scan on Customer. Then the results are combined in a Merge Join.

我确定这与缺少索引等有关,但是我不确定为什么这些结果是正确的.我经历了UDF运行缓慢的痛苦,每个人都说使用它们通常不是一个好主意,这就是为什么我将这个测试放在一起进行测试的原因.我目前无法解释为什么UDF版本似乎要好得多.

I'm sure this has to do with lack of indexes etc, but I'm unsure why these results are the way they are. I've experienced UDF's running painfully slow and everyone says using them is usually a bad idea, which is why I threw this test together. I can't explain currently why the UDF version seems to be so much better.

推荐答案

  • 如果要加入Purchases.CustomerID,则应在其上添加索引.
  • 如果您经常查询值范围,则也应该在其上添加索引.
    • If you want to join on Purchases.CustomerID you should put an index on it.
    • If you often query on value ranges you should put an index on that too.
    • 您正在要求SQL Server在两个错误的计划之间进行选择.

      As it is you are asking SQL server to choose between two bad plans.

      SQL Server可以大致猜测> 1000查询将涵盖多少采购,并根据此选择计划.

      SQL Server can guess roughly how many purchases will be covered by the > 1000 query, and will pick a plan based on that.

      但是,它无法猜测UDF查询将覆盖多少个,因此可以选择其他计划.因为它一直处于无知状态,所以根据其猜测的好坏,它可能比其他计划更好或更坏.

      However it can't guess how many will be covered by the UDF query, so may pick a different plan. Because it is going on ignorance it might be better or worse than the other plan depending on how good it's guess is.

      您可以看到生成的计划,它将告诉您每个计划中的估计行数以及实际行数.在每种情况下,这些估计的数字都说明了计划的选择.

      You can see the plans generated and it will tell you the estimated number of rows in each plan, and also the actual number. Those estimated numbers account for the choice of plan in each case.

      这篇关于udf vs直接sql性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆