在许多条件下,基于集合的计划运行速度比标量值函数慢 [英] Set based plan runs slower than scalar valued function with many conditions

查看:24
本文介绍了在许多条件下,基于集合的计划运行速度比标量值函数慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这个问题更像是一个假设而不是实际的代码问题.但是我提供了一个简化版的代码来说明这个问题.请不要评论代码本身的愚蠢.实际的代码太复杂(而且是专有的),所以这是最好的处理方式.

我有一个标量值函数,如下所示.

创建函数 [dbo].[Compute_value](@alpha 浮动,@bravo 浮动,@查尔斯·弗洛特,@delta 浮动)回报浮动作为开始如果@alpha 为空或@alpha = 0 或@delta 为空或@delta = 0返回 0如果@bravo 为空或@bravo <= 0返回 100IF (@charle + @delta)/@bravo <= 0返回 100声明 @x = DATEDIFF(GETDATE(),'1/1/2000')返回@alpha * POWER((100/@delta), (-2 * POWER(@charle * @bravo, @x/365)))结尾

我听说表值函数通常比标量值函数运行得快得多,因为它们不是 RBAR.因此,我将逻辑转换为使用 #temp_table 构造来对其进行基准测试.我将拥有与#temp_table 相同数量的更新,而不是大约一打 IF 语句,并且它的运行速度是标量 UDF 的两倍.

我认为这可能是因为 UDF 可以在前几个条件下快速返回,从而导致大部分标量 UDF 无操作,但事实并非如此.检查#temp_table 解决方案的查询执行计划似乎表明更新导致了大部分计划成本.

我在这里可能遗漏了什么?如果我将其转换为表值函数,我是否一直在为每个条件语句更新整个表变量?有没有办法避免这种情况,这似乎大大减慢了速度?我在这里遗漏了一些明显的东西吗?

解决方案

这里的关键词是INLINE TABLE VALUED FUNCTIONS.您有两种类型的 T-SQL 表值函数:多语句和内联.如果您的 T-SQL 函数以 BEGIN 语句开头,那么它将是废话——标量或其他.您无法将临时表放入 内联 表值函数中,因此我假设您从标量变为多语句表值函数,这可能会更糟.

您的内联表值函数 (iTVF) 应如下所示:

创建函数 [dbo].[Compute_value](@alpha 浮动,@bravo 浮动,@查尔斯·弗洛特,@delta 浮动)带有架构绑定的返回表作为返回选择新值 =当@alpha 为空或@alpha = 0 或@delta 为空或@delta = 0 THEN 0 时的情况当@bravo 为空或@bravo <= 0 THEN 100否则@alpha * 功率((100/@delta),(-2 * POWER(@charle * @bravo, DATEDIFF(<计量单位>,GETDATE(),'1/1/2000')/365)))结尾走;

请注意,在您发布的代码中,您的 DATEDIFF 语句缺少 datepart 参数.如果应该看起来像:

@x int = DATEDIFF(DAY, GETDATE(),'1/1/2000')

更进一步 - 了解为什么 iTVF 优于 T-SQL 标量值用户定义函数很重要.这不是因为表值函数比标量值函数快,而是因为 Microsoft 对 T-SQL 内联函数的实现比非内联 T-SQL 函数的实现快.请注意以下三个执行相同操作的函数:

-- 标量版本创建函数 dbo.Compute_value_scalar(@alpha 浮动,@bravo 浮动,@查尔斯·弗洛特,@delta 浮动)回报浮动作为开始如果@alpha 为空或@alpha = 0 或@delta 为空或@delta = 0返回 0如果@bravo 为空或@bravo <= 0返回 100IF (@charle + @delta)/@bravo <= 0返回 100声明@x int = DATEDIFF(dd, GETDATE(),'1/1/2000')返回@alpha * POWER((100/@delta), (-2 * POWER(@charle * @bravo, @x/365)))结尾走-- 多语句表值函数创建函数 dbo.Compute_value_mtvf(@alpha 浮动,@bravo 浮动,@查尔斯·弗洛特,@delta 浮动)返回@sometable TABLE (newValue float) AS开始插入@sometable 值(当@alpha 为空或@alpha = 0 或@delta 为空或@delta = 0 THEN 0 时的情况当@bravo 为空或@bravo <= 0 THEN 100否则@alpha * 功率((100/@delta),(-2 * POWER(@charle * @bravo, DATEDIFF(DAY,GETDATE(),'1/1/2000')/365)))结尾)返回;结尾走-- INLINE 表值函数创建函数 dbo.Compute_value_itvf(@alpha 浮动,@bravo 浮动,@查尔斯·弗洛特,@delta 浮动)带有架构绑定的返回表作为返回选择新值 =当@alpha 为空或@alpha = 0 或@delta 为空或@delta = 0 THEN 0 时的情况当@bravo 为空或@bravo <= 0 THEN 100否则@alpha * 功率((100/@delta),(-2 * POWER(@charle * @bravo, DATEDIFF(DAY,GETDATE(),'1/1/2000')/365)))结尾走

现在进行一些示例数据和性能测试:

SET NOCOUNT ON;CREATE TABLE #someTable (alpha FLOAT, bravo FLOAT, charle FLOAT, delta FLOAT);插入#someTable选择顶部 (100000)abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1,abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1FROM sys.all_columns a, sys.all_columns b;PRINT char(10)+char(13)+'scalar'+char(10)+char(13)+replicate('-',60);走声明@st datetime = getdate(), @z float;SELECT @z = dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)从#someTable t;PRINT DATEDIFF(ms, @st, getdate());走PRINT char(10)+char(13)+'mtvf'+char(10)+char(13)+replicate('-',60);走声明@st datetime = getdate(), @z float;选择@z = f.newValue从#someTable t交叉应用 dbo.Compute_value_mtvf(t.alpha, t.bravo, t.charle, t.delta) f;PRINT DATEDIFF(ms, @st, getdate());走PRINT char(10)+char(13)+'itvf'+char(10)+char(13)+replicate('-',60);走声明@st datetime = getdate(), @z float;选择@z = f.newValue从#someTable t交叉应用 dbo.Compute_value_itvf(t.alpha, t.bravo, t.charle, t.delta) f;PRINT DATEDIFF(ms, @st, getdate());走

结果:

标量------------------------------------------------------------2786电视转播------------------------------------------------------------41536电视转播------------------------------------------------------------153

标量 udf 运行了 2.7 秒,mtvf 运行了 41 秒,iTVF 运行了 0.153 秒.要了解为什么让我们看一下估计的执行计划:

当您查看实际执行计划时,您不会看到这一点,但是,对于标量 udf 和 mtvf,优化器会为每一行调用一些执行不佳的子例程;iTVF 没有.引用

您看到的 iTVF 执行计划的箭头是并行性 - 所有 CPU(或 SQL 实例的 MAXDOP 设置允许的数量)协同工作.T-SQL 标量和 mtvf UDF 无法做到这一点.当 Microsoft 引入内联标量 UDF 时,我会建议那些用于您正在做的事情,但在那之前:如果您正在寻找性能,那么内联是唯一的出路,为此,iTVF 是唯一的游戏城里.

请注意,我在谈到函数时一直强调 T-SQL... CLR 标量函数和表值函数也可以,但这是一个不同的话题.>

This question is a bit more of a hypothetical than an actual code problem. But I have provided a dumbed down version of the code to illustrate the question. Please no comments about the silliness of the code itself. The actual code is too complicated (and proprietary) so this is the best way to proceed.

I have a scalar valued function as follows.

CREATE FUNCTION [dbo].[Compute_value]
(
  @alpha FLOAT,
  @bravo FLOAT,
  @charle FLOAT,
  @delta FLOAT
)
RETURNS FLOAT
AS
BEGIN
    IF @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 
    RETURN 0

    IF @bravo IS NULL OR @bravo <= 0
        RETURN 100

    IF (@charle + @delta) / @bravo <= 0
        RETURN 100
    DECLARE @x = DATEDIFF(GETDATE(),'1/1/2000')     
    RETURN @alpha * POWER((100 / @delta), (-2 * POWER(@charle * @bravo, @x/365)))
END

I have heard that table valued functions typically run much faster than scalar valued functions because they aren't RBAR. So I converted the logic to use a #temp_table construct just to benchmark it. Instead of about a dozen IF statements, I would have an equal number of UPDATEs to the #temp_table, and it ran twice as SLOWLY as the scalar UDF.

I thought perhaps this was happening because the UDF could return quickly on the first few conditions, thus causing much of the scalar UDF to be no-ops, but this is not the case. Examining the query execution plans for the #temp_table solution seem to indicate that the updates are causing most of the plan cost.

What might I be missing here? If I convert this to a table value function, am I stuck doing updates to the entire table variable for each conditional statement? Is there a way to avoid this, which seems to be slowing things down considerably? Am I missing something obvious here?

解决方案

The keyword term here is INLINE TABLE VALUED FUNCTIONS. You have two types of T-SQL tabled valued functions: multi-statement and inline. If your T-SQL function starts with a BEGIN statement then it's going to be crap - scalar or otherwise. You can't get a temp table into an inline table valued function so I'm assuming you went from scalar to mutli-statement table valued function which will probably be worse.

Your inline table valued function (iTVF) should look something like this:

CREATE FUNCTION [dbo].[Compute_value]
(
  @alpha FLOAT,
  @bravo FLOAT,
  @charle FLOAT,
  @delta FLOAT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT newValue = 
  CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
       WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
       ELSE @alpha * POWER((100 / @delta), 
             (-2 * POWER(@charle * @bravo, DATEDIFF(<unit of measurement>,GETDATE(),'1/1/2000')/365)))
  END
GO;

Note that, in the code you posted, your DATEDIFF statement is missing the datepart parameter. If should look something like:

@x int = DATEDIFF(DAY, GETDATE(),'1/1/2000')   

Going a little further - it's important to understand why iTVF's are better than T-SQL scalar valued user-defined functions. It's not because table valued functions are faster than scalar valued functions, it's because Microsoft's implementation of T-SQL inline functions are faster than their implementation of T-SQL functions that are not inline. Note the following three functions that do the same thing:

-- Scalar version
CREATE FUNCTION dbo.Compute_value_scalar
(
  @alpha FLOAT,
  @bravo FLOAT,
  @charle FLOAT,
  @delta FLOAT
)
RETURNS FLOAT
AS
BEGIN
    IF @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 
    RETURN 0

    IF @bravo IS NULL OR @bravo <= 0
        RETURN 100

    IF (@charle + @delta) / @bravo <= 0
        RETURN 100
    DECLARE @x int = DATEDIFF(dd, GETDATE(),'1/1/2000')     
    RETURN @alpha * POWER((100 / @delta), (-2 * POWER(@charle * @bravo, @x/365)))
END
GO

-- multi-statement table valued function 
CREATE FUNCTION dbo.Compute_value_mtvf
(
  @alpha FLOAT,
  @bravo FLOAT,
  @charle FLOAT,
  @delta FLOAT
)
RETURNS  @sometable TABLE (newValue float) AS 
    BEGIN
    INSERT @sometable VALUES
(
  CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
       WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
       ELSE @alpha * POWER((100 / @delta), 
             (-2 * POWER(@charle * @bravo, DATEDIFF(DAY,GETDATE(),'1/1/2000')/365)))
  END
)
RETURN;
END
GO

-- INLINE table valued function
CREATE FUNCTION dbo.Compute_value_itvf
(
  @alpha FLOAT,
  @bravo FLOAT,
  @charle FLOAT,
  @delta FLOAT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT newValue = 
  CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
       WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
       ELSE @alpha * POWER((100 / @delta), 
             (-2 * POWER(@charle * @bravo, DATEDIFF(DAY,GETDATE(),'1/1/2000')/365)))
  END
GO

Now for some sample data and performance test:

SET NOCOUNT ON;
CREATE TABLE #someTable (alpha FLOAT, bravo FLOAT, charle FLOAT, delta FLOAT);
INSERT #someTable
SELECT TOP (100000)
  abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1, 
  abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1
FROM sys.all_columns a, sys.all_columns b;

PRINT char(10)+char(13)+'scalar'+char(10)+char(13)+replicate('-',60);
GO
DECLARE @st datetime = getdate(), @z float;

SELECT @z = dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
FROM #someTable t;

PRINT DATEDIFF(ms, @st, getdate());
GO

PRINT char(10)+char(13)+'mtvf'+char(10)+char(13)+replicate('-',60);
GO
DECLARE @st datetime = getdate(), @z float;

SELECT @z = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_mtvf(t.alpha, t.bravo, t.charle, t.delta) f;

PRINT DATEDIFF(ms, @st, getdate());
GO

PRINT char(10)+char(13)+'itvf'+char(10)+char(13)+replicate('-',60);
GO
DECLARE @st datetime = getdate(), @z float;

SELECT @z = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_itvf(t.alpha, t.bravo, t.charle, t.delta) f;

PRINT DATEDIFF(ms, @st, getdate());
GO

Results:

scalar
------------------------------------------------------------
2786

mTVF
------------------------------------------------------------
41536

iTVF
------------------------------------------------------------
153

The scalar udf ran for 2.7 seconds, 41 seconds for the mtvf and 0.153 seconds for the iTVF. To understand why let's look at the estimated execution plans:

You don't see this when you look at the actual execution plan but, with the scalar udf and mtvf, the optimizer calls some poorly executed subroutine for each row; the iTVF does not. Quoting Paul White's career changing article about APPLY Paul writes:

You might find it useful to think of an iTVF as a view that accepts parameters. Just as for views, SQL Server expands the definition of an iTVF directly into the query plan of an enclosing query, before optimization is performed.

The effect is that SQL Server is able to apply its full range of optimizations, considering the query as a whole. It is just as if you had written the expanded query out by hand....

In other words, iTVF's enable to optimizer to optimize the query in ways that just aren't possible when all that other code needs to be executed. One of many other examples of why iTVFs are superior is they are the only one of the three aforementioned function types that allow parallelism. Let's run each function one more time, this time with the Actual Execution plan turned on and with traceflag 8649 (which forces a parallel execution plan):

-- don't need so many rows for this test
TRUNCATE TABLE #sometable;
INSERT #someTable 
SELECT TOP (10)
  abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1, 
  abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1
FROM sys.all_columns a;

DECLARE @x float;

SELECT TOP (10) @x = dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
FROM #someTable t
ORDER BY dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
OPTION (QUERYTRACEON 8649);

SELECT TOP (10)  @x = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_mtvf(t.alpha, t.bravo, t.charle, t.delta) f
ORDER BY f.newValue
OPTION (QUERYTRACEON 8649);

SELECT @x = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_itvf(t.alpha, t.bravo, t.charle, t.delta) f
ORDER BY f.newValue
OPTION (QUERYTRACEON 8649);

Execution plans:

Those arrows you see for the iTVF's execution plan is parallelism - all your CPU's (or as many as your SQL instance's MAXDOP settings allow) working together. T-SQL scalar and mtvf UDFs can't do that. When Microsoft introduces inline scalar UDFs then I'd suggest those for what you're doing but, until then: if performance is what you're looking for then inline is the only way to go and, for that, iTVFs are the only game in town.

Note that I continuously emphasized T-SQL when talking about functions... CLR Scalar and Table valued functions can be just fine but that's a different topic.

这篇关于在许多条件下,基于集合的计划运行速度比标量值函数慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆