CTE对T-SQL的不良表现 [英] T-SQL bad performance with CTE

查看:70
本文介绍了CTE对T-SQL的不良表现的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对SQL Server中的通用表表达式有一个性能问题。在我们的开发人员团队中,我们在构建查询时会使用许多链接CTE。
我目前正在处理性能糟糕的查询。但是我发现,如果我在链的中间,将直到该CTE的所有记录插入一个临时表中,然后继续执行,但是从该临时表中进行选择,则可以显着提高性能。
现在,我想获得一些帮助,以了解这种类型的更改是否仅适用于此特定查询,以及为什么您将在下面看到的两种情况的性能差异如此之大。还是我们可能会过度使用团队中的CTE,并且可以通过学习这种情况来总体上提高绩效?

I have a performance question about Common table expressions in SQL Server. In our developer team we use a lot of chaining CTEs when building our queries. I am currently working on a query which had terrible performance. But I found out that if I in the middle of the chain inserted all the records up to that CTE in a temporary table instead and then continued but selecting from that temp table I improved performance significantly. Now I would like to get some help to understand if this type of change only applies to this specific query and why the two cases you will see below differ so much in performance. Or could we possibly overuse CTEs in our team and can we gain performance generally by learning from this case?

请尝试向我解释一下这里到底发生了什么...

Please try to explain to me exactly what is happening here...

代码已完成,您将能够在SQL Server 2008以及2005上运行它。一部分被注释掉了,我的想法是,您可以通过注释掉另一种情况来切换两种情况。您可以看到要在其中添加块注释的位置,我已经在这些位置用-在此处注释块-在这里结束块注释

The code is complete and you will be able to run it on SQL Server 2008 and probably 2005 too. One part is commented out and my idea is that you can switch the two cases by comment out one or the other. You can see where to put your block comments, I have marked these places with --block comment here and --end block comment here

这是运行缓慢的情况,没有注释默认值。您在这里:

It is the slow performing case that is uncommented default. Here you are:

--Declare tables to use in example.
CREATE TABLE #Preparation 
(
    Date DATETIME NOT NULL
    ,Hour INT NOT NULL
    ,Sales NUMERIC(9,2)
    ,Items INT
);

CREATE TABLE #Calendar
(
    Date DATETIME NOT NULL
)

CREATE TABLE #OpenHours
(
    Day INT NOT NULL,
    OpenFrom TIME NOT NULL,
    OpenTo TIME NOT NULL
);

--Fill tables with sample data.
INSERT INTO #OpenHours (Day, OpenFrom, OpenTo)
VALUES
    (1, '10:00', '20:00'),
    (2, '10:00', '20:00'),
    (3, '10:00', '20:00'),
    (4, '10:00', '20:00'),
    (5, '10:00', '20:00'),
    (6, '10:00', '20:00'),
    (7, '10:00', '20:00')

DECLARE @CounterDay INT = 0, @CounterHour INT = 0, @Sales NUMERIC(9, 2), @Items INT;

WHILE @CounterDay < 365
BEGIN
    SET @CounterHour = 0;
    WHILE @CounterHour < 5
    BEGIN
        SET @Items = CAST(RAND() * 100 AS INT);
        SET @Sales = CAST(RAND() * 1000 AS NUMERIC(9, 2));
        IF @Items % 2 = 0
        BEGIN
            SET @Items = NULL;
            SET @Sales = NULL;
        END

        INSERT INTO #Preparation (Date, Hour, Items, Sales)
        VALUES (DATEADD(DAY, @CounterDay, '2011-01-01'), @CounterHour + 13, @Items, @Sales);

        SET @CounterHour += 1;
    END
    INSERT INTO #Calendar (Date) VALUES (DATEADD(DAY, @CounterDay, '2011-01-01'));
    SET @CounterDay += 1;
END

--Here the query starts.
;WITH P AS (
    SELECT DATEADD(HOUR, Hour, Date) AS Hour
        ,Sales
        ,Items
    FROM #Preparation
),
O AS (
        SELECT DISTINCT DATEADD(HOUR, SV.number, C.Date) AS Hour
        FROM #OpenHours AS O
            JOIN #Calendar AS C ON O.Day = DATEPART(WEEKDAY, C.Date)
            JOIN master.dbo.spt_values AS SV ON SV.number BETWEEN DATEPART(HOUR, O.OpenFrom) AND DATEPART(HOUR, O.OpenTo)
),
S AS (
    SELECT O.Hour, P.Sales, P.Items
    FROM O
        LEFT JOIN P ON P.Hour = O.Hour
)

--block comment here case 1 (slow performing)
--With this technique it takes about 34 seconds.
,N AS (
        SELECT  
            A.Hour
            ,A.Sales AS SalesOrg
            ,CASE WHEN COALESCE(B.Sales, C.Sales, 1) < 0
                THEN 0 ELSE COALESCE(B.Sales, C.Sales, 1) END AS Sales
            ,A.Items AS ItemsOrg
            ,COALESCE(B.Items, C.Items, 1) AS Items
        FROM S AS A
        OUTER APPLY (SELECT TOP 1 *
                     FROM S
                     WHERE Hour <= A.Hour
                        AND Sales IS NOT NULL
                        AND DATEDIFF(DAY, Hour, A.Hour) = 0                      
                     ORDER BY Hour DESC) B
        OUTER APPLY (SELECT TOP 1 *
                     FROM S
                     WHERE Sales IS NOT NULL
                        AND DATEDIFF(DAY, Hour, A.Hour) = 0
                     ORDER BY Hour) C
    )
--end block comment here case 1 (slow performing)

/*--block comment here case 2 (fast performing)
--With this technique it takes about 2 seconds.
SELECT * INTO #tmpS FROM S;

WITH
N AS (
        SELECT  
            A.Hour
            ,A.Sales AS SalesOrg
            ,CASE WHEN COALESCE(B.Sales, C.Sales, 1) < 0
                THEN 0 ELSE COALESCE(B.Sales, C.Sales, 1) END AS Sales
            ,A.Items AS ItemsOrg
            ,COALESCE(B.Items, C.Items, 1) AS Items
        FROM #tmpS AS A
        OUTER APPLY (SELECT TOP 1 *
                     FROM #tmpS
                     WHERE Hour <= A.Hour
                        AND Sales IS NOT NULL
                        AND DATEDIFF(DAY, Hour, A.Hour) = 0                      
                     ORDER BY Hour DESC) B
        OUTER APPLY (SELECT TOP 1 *
                     FROM #tmpS
                     WHERE Sales IS NOT NULL
                        AND DATEDIFF(DAY, Hour, A.Hour) = 0
                     ORDER BY Hour) C
    )
--end block comment here case 2 (fast performing)*/
SELECT * FROM N ORDER BY Hour


IF OBJECT_ID('tempdb..#tmpS') IS NOT NULL DROP TABLE #tmpS;

DROP TABLE #Preparation;
DROP TABLE #Calendar;
DROP TABLE #OpenHours;

如果您想尝试并了解我在最后一步所做的事情,我有一个SO问题关于它此处

If you would like to try and understand what I am doing in the last step I have a SO question about it here.

对我来说,案例1大约需要34秒,案例2大约需要2秒。区别在于,在情况2中,我将S的结果存储在临时表中,在情况1中,我直接在下一个CTE中使用了S。

For me case 1 takes about 34 seconds and case 2 takes about 2 seconds. The difference is that I store the result from S in a temp table in case 2, in case 1 I use S in my next CTE directly.

推荐答案

CTE 本质上只是一个一次性视图。除了将 CTE 代码作为表表达式放入 FROM 子句外,它几乎永远不会使查询更快。

A CTE is essentially just a disposable view. It will pretty much never make a query any faster than just putting the CTE code into a FROM clause as a table expression.

在您的示例中,真正的问题是我相信的日期函数。

In your example, the real issue is the date functions I believe.

您的第一个(慢) case需要为每一行运行日期函数。

Your first (slow) case requires the date functions to be run for every row.

对于您的第二个(更快)情况,它们只运行一次并存储到表中。

For your second (faster) case they are run once and stored into a table.

这不是通常如此引人注目,除非您对函数派生的字段执行某种逻辑。在您的情况下,您要对小时进行 ORDER BY ,这非常昂贵。在第二个示例中,它是对字段的简单排序,但是在第一个示例中,您正在为每一行运行该函数,然后进行排序。

This is not normally so noticeable unless you do some sort of logic on the function-derived field. In your case you are doing an ORDER BY on Hour, which is very costly. In your second example it's a simple sort on a field, but in the first you are running that function for each row, THEN sorting.

更多有关CTE的深入阅读,请参见有关DBA.SE的问题

这篇关于CTE对T-SQL的不良表现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆