如何清理 SSISDB? [英] How can I clean up the SSISDB?

查看:24
本文介绍了如何清理 SSISDB?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当我进行设置时,我忽略了保留期.我的数据库变得非常大,所以我想减小它的大小.如果我只是更改保留期(它是 365),则会导致 SSIS 运行我的包时出现问题.我什至以很小的增量更改它,但删除语句会创建锁,从而阻止新作业运行.

任何想法如何解决这个问题?我想过创建一个新的 SSISDB.

解决方案

Phil Brammer 遇到了这个问题以及许多其他与 SSIS 目录的维护和供给相关的事情,他在他的帖子 目录索引建议.

根本问题

根本问题是 MS 试图在设计 SSIS 时考虑到 RI,但他们很懒惰并且允许发生级联删除而不是显式处理它们.

<块引用>

开箱即用,新的 SSIS 2012 目录数据库 (SSISDB) 应用了一些基本索引,参照完整性设置为在大多数表之间执行级联删除.

<块引用>

输入 SQL 代理作业SSIS 服务器维护作业".默认情况下,此作业设置为每天午夜运行,并使用两个目录参数来运行:定期清理日志"和保留期(天)".设置这些后,维护作业会清除指定保留期之外的所有数据.

<块引用>

此维护作业从 internal.operations 中一次循环删除 10 条记录,然后级联到下游的许多表中.在我们的例子中,我们每天有大约 3000 条操作记录要删除(一次 10 条!),从 internal.operation_messages 转换为 160 万行.那只是一张下游表!这整个过程完全,完全锁定了任何 SELECT/INSERT 数据的 SSISDB 数据库

分辨率

在 MS 改变工作方式之前,支持的选项是

<块引用>

将维护工作计划移至更适合您环境的时间

我知道我现在的客户只在凌晨加载数据,所以 SSISDB 在工作时间很安静.

如果在安静时期运行维护作业不是一种选择,那么您正在考虑编写自己的删除语句,以尝试使级联删除减少.

在我当前的客户中,在过去 10 个月中,我们每晚运行了大约 200 个软件包,并且也有 365 天的历史记录.我们最大的表,按数量级是.

Schema Table RowCount内部 event_message_context 1,869,028内部操作_消息 1,500,811内部事件消息 1,500,803

所有这些数据的驱动因素 internal.operations 只有 3300 行,这与 Phil 关于这些数据呈指数增长的评论一致.

因此,确定要清除的 operation_id 并从叶表中删除返回到核心的 internal.operations 表.

使用 SSISDB;设置无计数;如果 object_id('tempdb..#DELETE_CANDIDATES') 不是 NULL开始删除表#DELETE_CANDIDATES;结尾;创建表#DELETE_CANDIDATES(operation_id bigint NOT NULL PRIMARY KEY);声明@DaysRetention int = 100;插入#DELETE_CANDIDATES(操作编号)选择IO.operation_id从内部操作 AS IO在哪里IO.start_time 

适用通常的警告

  • 不要相信互联网上的随机代码
  • 使用 ssistalk 和/或系统表中的图表来识别所有依赖项
  • 您可能只需要将删除操作细分为更小的操作
  • 删除操作的 RI 可能会让您受益,但一定要使用检查选项重新启用它们,以便它们受到信任.
  • 如果操作持续时间超过 4 小时,请咨询您的 dba

2020 年 7 月编辑

Tim Mitchell 在 SSIS 目录自动清理清理 SSIS 目录数据库的更好方法 和他的新书 SSIS 目录:安装、管理、保护和监控您的企业 ETL 基础设施

@Yong Jun Kim 在评论中注明

<块引用>

现在 SSIS DB 可能有不同的表名,并在末尾扩展.而不是 internal.event_message_context 它可以是 internal.event_message_context_scaleout.代替 internal.operations_messages,它可以是 internal.operations_messages_scaleout.只需相应地修改代码中的表名,它应该可以正常运行

如果您在 Azure 数据工厂中使用 SSIS IR,则肯定是这种情况.你会发现正常"表仍然存在但为空,*_scaleout 版本包含所有数据.

参考文献

When I set this up I overlooked the retention period. My database has become pretty large so I want to decrease it's size. If I simply change the retention period (it was 365) it causes issues with SSIS running my packages. I even changed it in small increments but the deletion statement would create locks which would prevent new jobs from running.

Any ideas how to work around this? I've thought about just creating a new SSISDB.

解决方案

Phil Brammer ran into this and a host of other things related to the care and feeding of the SSIS catalog, which he covers on his post Catalog Indexing Recommendations.

Root problem

The root problem is that MS attempted to design the SSIS with RI in mind but they were lazy and allowed the cascading deletes to happen versus explicitly handling them.

Out of the box, the new SSIS 2012 catalog database (SSISDB) has some basic indexing applied, with referential integrity set to do cascade deletes between most tables.

Enter the SQL Agent job, "SSIS Server Maintenance Job." This job by default is set to run at midnight daily, and uses two catalog parameters to function: "Clean Logs Periodically" and "Retention Period (days)." When these are set, the maintenance job purges any data outside of the noted retention period.

This maintenance job deletes, 10 records at a time in a loop, from internal.operations and then cascades into many tables downstream. In our case, we have around 3000 operations records to delete daily (10 at a time!) that translates into 1.6 million rows from internal.operation_messages. That’s just one downstream table! This entire process completely, utterly locks up the SSISDB database from any SELECT/INSERT data

Resolution

Until MS changes up how things work, the supported option is

move the maintenance job schedule to a more appropriate time for your environment

I know at my current client, we only load data in the wee hours so the SSISDB is quiet during business hours.

If running the maintenance job during a quiet period isn't an option, then you're looking at crafting your own delete statements to try to get the cascading deletes to suck less.

At my current client, we've been running a about 200 packages nightly for the past 10 months and are also at 365 days of history. Our biggest tables, by an order of magnitude are.

Schema    Table                   RowCount
internal  event_message_context   1,869,028
internal  operation_messages      1,500,811
internal  event_messages          1,500,803

The driver of all of that data, internal.operations only has 3300 rows in it, which aligns with Phil's comment about how exponentially this data grows.

So, identify the operation_id to be purged and the delete from the leaf tables working back to the core, internal.operations table.

USE SSISDB;
SET NOCOUNT ON;
IF object_id('tempdb..#DELETE_CANDIDATES') IS NOT NULL
BEGIN
    DROP TABLE #DELETE_CANDIDATES;
END;

CREATE TABLE #DELETE_CANDIDATES
(
    operation_id bigint NOT NULL PRIMARY KEY
);

DECLARE @DaysRetention int = 100;
INSERT INTO
    #DELETE_CANDIDATES
(
    operation_id
)
SELECT
    IO.operation_id
FROM
    internal.operations AS IO
WHERE
    IO.start_time < DATEADD(day, -@DaysRetention, CURRENT_TIMESTAMP);

DELETE T
FROM
    internal.event_message_context AS T
    INNER JOIN
        #DELETE_CANDIDATES AS DC
        ON DC.operation_id = T.operation_id;

DELETE T
FROM
    internal.event_messages AS T
    INNER JOIN
        #DELETE_CANDIDATES AS DC
        ON DC.operation_id = T.operation_id;

DELETE T
FROM
    internal.operation_messages AS T
    INNER JOIN
        #DELETE_CANDIDATES AS DC
        ON DC.operation_id = T.operation_id;

-- etc
-- Finally, remove the entry from operations

DELETE T
FROM
    internal.operations AS T
    INNER JOIN
        #DELETE_CANDIDATES AS DC
        ON DC.operation_id = T.operation_id;

Usual caveats apply

  • don't trust code from randoms on the internet
  • use the diagrams from ssistalk and/or system tables to identify all the dependencies
  • you might need to only segment your delete operations into smaller operations
  • you might benefit by dropping RI for operations but be certain to re-enable them with the check option so they are trusted.
  • consult your dba if operations last longer than 4 hours

July 2020 edit

Tim Mitchell has a good set of articles on SSIS Catalog Automatic Cleanup and A better way to Clean up the SSIS Catalog Database and his fancy new book The SSIS Catalog: Install, Manage, Secure and Monitor Your Enterprise ETL Infrastructure

@Yong Jun Kim noted in the comments

There is a chance SSIS DB might have different table names with scaleout at the end now. Instead of internal.event_message_context it can be internal.event_message_context_scaleout. Instead of internal.operations_messages, it can be internal.operations_messages_scaleout. Just modify the table names in the code accordingly, and it should run fine

This is certainly the case if you are using an SSIS IR within Azure Data Factory. You will find the "normal" tables still present but empty, with the *_scaleout versions containing all the data.

References

这篇关于如何清理 SSISDB?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆