T-SQL MERGE Performance in typical publishing context

Question

I have situation where a "publisher" application essentially keeps a view model up to date by querying a VERY complex view and then merging the results into a denormalized view model table, using separate insert, update, and delete operations.
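The pre-upgrade pattern is not shown in the question; a minimal sketch of what such separate statements typically look like might be the following. This is a hypothetical reconstruction, reusing the `TargetTable`, `@tSource`, `@id`, and column names that appear in the code later in the question.

```sql
-- Hypothetical sketch of the three-statement pattern the question describes.

-- 1. Update rows whose data columns have drifted
UPDATE T
SET T.Data1 = S.Data1, T.Data2 = S.Data2, T.Data3 = S.Data3
FROM TargetTable T
JOIN @tSource S ON S.Key1 = T.Key1 AND S.Key2 = T.Key2
WHERE T.Data1 <> S.Data1 OR T.Data2 <> S.Data2 OR T.Data3 <> S.Data3;

-- 2. Insert rows present in the source but missing from the target
INSERT INTO TargetTable (Key1, Key2, Data1, Data2, Data3)
SELECT S.Key1, S.Key2, S.Data1, S.Data2, S.Data3
FROM @tSource S
WHERE NOT EXISTS (SELECT 1 FROM TargetTable T
                  WHERE T.Key1 = S.Key1 AND T.Key2 = S.Key2);

-- 3. Delete rows in the affected key range no longer present in the source
DELETE T
FROM TargetTable T
WHERE T.Key1 = @id
  AND NOT EXISTS (SELECT 1 FROM @tSource S
                  WHERE S.Key1 = T.Key1 AND S.Key2 = T.2);
```

Each statement touches only the rows it needs, which is why the combined plan cost stays small even against a large target table.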

Now that we have upgraded to SQL 2008 I figured it would be a great time to update these with the SQL MERGE statement. However after writing the query, the subtree cost of the MERGE statement is 1214.54! With the old way, the sum of the Insert/Update/Delete was only 0.104!!

I can't figure out how a more straightforward way of describing the same exact operation could be so much crappier. Perhaps you can see the error of my ways where I cannot.

Some stats on the table: It has 1.9 million rows, and each MERGE operation inserts, updates, or deletes more than 100 of them. In my test case, only 1 is affected.

-- This table variable has the EXACT same structure as the published table
-- Yes, I've tried a temp table instead of a table variable, and it makes no difference
declare @tSource table
(
    Key1 uniqueidentifier NOT NULL,
    Key2 int NOT NULL,
    Data1 datetime NOT NULL,
    Data2 datetime,
    Data3 varchar(255) NOT NULL, 
    PRIMARY KEY 
    (
        Key1, 
        Key2
    )
)

-- Fill the temp table with the desired current state of the view model, for
-- only those rows affected by @Key1.  I'm not really concerned about the
-- performance of this; it's already good.  This results in very few rows
-- in the table var; in fact, only 1 in my test case
insert into @tSource
select *
from vw_Source_View with (nolock)
where Key1 = @Key1

-- Now it's time to merge @tSource into TargetTable

;MERGE TargetTable as T
USING @tSource S
    on S.Key1 = T.Key1 and S.Key2 = T.Key2

-- Only update if the Data columns do not match
WHEN MATCHED AND (T.Data1 <> S.Data1 OR T.Data2 <> S.Data2 OR T.Data3 <> S.Data3) THEN
    UPDATE SET
        T.Data1 = S.Data1,
        T.Data2 = S.Data2,
        T.Data3 = S.Data3

-- Insert when missing in the target
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Key1, Key2, Data1, Data2, Data3)
    VALUES (S.Key1, S.Key2, S.Data1, S.Data2, S.Data3)

-- Delete when missing in the source, being careful not to delete the REST
-- of the table by applying the T.Key1 = @id condition
WHEN NOT MATCHED BY SOURCE AND T.Key1 = @id THEN
    DELETE
;

So how does this get to 1200 subtree cost? The data access from the tables themselves seems to be quite efficient. In fact, 87% of the cost of the MERGE seems to be from a Sort operation near the end of the chain:

MERGE (0%) <- Index Update (12%) <- Sort (87%) <- (...)

And that sort has 0 rows feeding into and out from it. Why does it take 87% of the resources to sort 0 rows?

Update

I posted the actual (not estimated) execution plan for just the MERGE operation in a Gist.

Answer

Subtree costs should be taken with a large grain of salt (and especially so when you have huge cardinality errors). SET STATISTICS IO ON; SET STATISTICS TIME ON; output is a better indicator of actual performance.
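A minimal way to capture those measurements (the session options are standard T-SQL; the comment marks where the statement under test goes):

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ... run the MERGE (or the old insert/update/delete batch) here ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The Messages tab then reports logical reads per table and CPU/elapsed time per statement, which reflect the work actually done rather than the optimizer's estimates.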

The zero row sort doesn't take 87% of the resources. The problem in your plan is one of statistics estimation. The costs shown in the actual plan are still estimated costs; SQL Server doesn't adjust them to take account of what actually happened.

There is a point in the plan where a filter reduces 1,911,721 rows to 0, but the estimated rows going forward are 1,860,310. Thereafter all the costs are bogus, culminating in the sort of an estimated 3,348,560 rows that accounts for the 87%.
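A related side note, not part of the original answer: table variables carry no column statistics, and their row count is often estimated as 1 at compile time, which can contribute to estimates like these. Adding `OPTION (RECOMPILE)` lets the optimizer compile the statement with the table variable's actual row count. A hedged sketch, applied to the MERGE from the question:

```sql
-- Sketch only: same MERGE as the question, with OPTION (RECOMPILE) added
-- so the optimizer sees @tSource's actual cardinality at compile time.
MERGE TargetTable AS T
USING @tSource AS S
    ON S.Key1 = T.Key1 AND S.Key2 = T.Key2
WHEN MATCHED AND (T.Data1 <> S.Data1 OR T.Data2 <> S.Data2 OR T.Data3 <> S.Data3) THEN
    UPDATE SET T.Data1 = S.Data1, T.Data2 = S.Data2, T.Data3 = S.Data3
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Key1, Key2, Data1, Data2, Data3)
    VALUES (S.Key1, S.Key2, S.Data1, S.Data2, S.Data3)
WHEN NOT MATCHED BY SOURCE AND T.Key1 = @id THEN
    DELETE
OPTION (RECOMPILE);
```

Whether this changes the estimates in this particular plan would need to be verified against the actual data; it is a common remedy for table-variable cardinality problems rather than a guaranteed fix here.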

The cardinality estimation error can be reproduced outside the MERGE statement by looking at the estimated plan for a FULL OUTER JOIN with equivalent predicates (it gives the same 1,860,310-row estimate).

SELECT * 
FROM TargetTable T
FULL OUTER JOIN  @tSource S
    ON S.Key1 = T.Key1 and S.Key2 = T.Key2
WHERE 
CASE WHEN S.Key1 IS NOT NULL 
     /*Matched by Source*/
     THEN CASE WHEN T.Key1 IS NOT NULL  
               /*Matched by Target*/
               THEN CASE WHEN  [T].[Data1]<>S.[Data1] OR 
                               [T].[Data2]<>S.[Data2] OR 
                               [T].[Data3]<>S.[Data3]
                         THEN (1) 
                     END 
                /*Not Matched by Target*/     
                ELSE (4) 
           END 
       /*Not Matched by Source*/     
      ELSE CASE WHEN  [T].[Key1]=@id 
                THEN (3) 
            END 
END IS NOT NULL

That said, however, the plan up to the filter itself does look quite sub-optimal. It is doing a full clustered index scan when perhaps you want a plan with two clustered index range seeks: one to retrieve the single row matched by the primary key from the join on the source, and the other to retrieve the T.Key1 = @id range (though maybe this is to avoid the need to sort into clustered key order later?).

Perhaps you could try this rewrite and see if it works any better or worse

;WITH FilteredTarget AS
(
    SELECT T.*
    FROM TargetTable AS T WITH (FORCESEEK)
    JOIN @tSource S
        ON (T.Key1 = S.Key1
            AND S.Key2 = T.Key2)
        OR T.Key1 = @id
)
MERGE FilteredTarget AS T
USING @tSource S
ON (T.Key1 = S.Key1
   AND S.Key2 = T.Key2)


-- Only update if the Data columns do not match
WHEN MATCHED AND S.Key1 = T.Key1 AND S.Key2 = T.Key2 AND 
                                         (T.Data1 <> S.Data1 OR
                                          T.Data2 <> S.Data2 OR 
                                          T.Data3 <> S.Data3) THEN
  UPDATE SET T.Data1 = S.Data1,
             T.Data2 = S.Data2,
             T.Data3 = S.Data3

-- Note from original poster: This extra "safety clause" turned out not to
-- affect the behavior or the execution plan, so I removed it and it works
-- just as well without, but if you find yourself in a similar situation
-- you might want to give it a try.
-- WHEN MATCHED AND (S.Key1 <> T.Key1 OR S.Key2 <> T.Key2) AND T.Key1 = @id THEN
--   DELETE

-- Insert when missing in the target
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Key1, Key2, Data1, Data2, Data3)
    VALUES (S.Key1, S.Key2, S.Data1, S.Data2, S.Data3)

WHEN NOT MATCHED BY SOURCE AND T.Key1 = @id THEN
    DELETE;
