How do I optimize an Upsert (Update and Insert) operation within an SSIS package?

Question

I am not a DBA but I do work for a small company as the IT person. I have to replicate a database from staging to production. I have created an SSIS package to do this but it takes hours to run. This isn't a large data warehouse type of project, either, it's a pretty straightforward Upsert. I'm assuming that I am the weak link in how I designed it.

Here is my procedure:

  1. Truncate staging tables (Execute SQL Task)
  2. Pull data from a development table into staging (Data Flow Task)
  3. Run a Data Flow Task
     1. OLE DB Source
     2. Conditional Split transformation (condition used: [!]ISNULL(is_new_flag))
     3. If new, insert; if existing, update

The data flow task is duplicated a few times for different tables/values, but the flow is the same. I've read several things about OLE DB components being slow for updates and have tried a few things, but I haven't gotten it to run very quickly.

I'm not sure what other details to give, but I can give anything that's asked for.

Recommended answer

Sample package using SSIS 2008 R2 that inserts or updates using batch operations:

Here is a sample package written in SSIS 2008 R2 that illustrates how to perform inserts and updates between two databases using batch operations.

  • Using OLE DB Command will slow down the update operations in your package because it does not perform batch operations. Every row is updated individually.
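
For comparison, an OLE DB Command transformation is typically configured with a parameterized statement along the lines of the sketch below, and that statement is executed once for every row that reaches the component. The table and column names are borrowed from the sample tables defined later in this answer; the statement itself is an illustration and is not part of the sample package.

```sql
-- Illustration only: the kind of statement an OLE DB Command runs once per row.
-- Each ? is a parameter that the component maps to an input column.
UPDATE dbo.DestinationTable
SET    CreatedOn  = ?,
       ModifiedOn = ?
WHERE  RowNumber  = ?;
```

With several hundred thousand matching rows, this means several hundred thousand individual statements, which is the cost the staging-table approach in this answer avoids.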

The sample uses two databases, namely Source and Destination. In my example, both databases reside on the same server, but the logic can still be applied to databases residing on different servers and locations.

I created a table named dbo.SourceTable in my source database Source.

CREATE TABLE [dbo].[SourceTable](
    [RowNumber] [bigint] NOT NULL,
    [CreatedOn] [datetime] NOT NULL,
    [ModifiedOn] [datetime] NOT NULL,
    [IsActive] [bit] NULL
)

I also created two tables named dbo.DestinationTable and dbo.StagingTable in my destination database Destination.

CREATE TABLE [dbo].[DestinationTable](
    [RowNumber] [bigint] NOT NULL,
    [CreatedOn] [datetime] NOT NULL,
    [ModifiedOn] [datetime] NOT NULL
) 
GO

CREATE TABLE [dbo].[StagingTable](
    [RowNumber] [bigint] NOT NULL,
    [CreatedOn] [datetime] NOT NULL,
    [ModifiedOn] [datetime] NOT NULL
) 
GO

I inserted about 1.4 million rows into the table dbo.SourceTable, with unique values in the RowNumber column. The tables dbo.DestinationTable and dbo.StagingTable were empty to begin with. All the rows in the table dbo.SourceTable have the flag IsActive set to false.
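
The answer does not show how the test rows were generated. One possible way to seed a comparable table is sketched below; the row count of 1,400,000, the use of sys.all_objects as a row source, and the GETDATE() timestamps are all assumptions made for illustration.

```sql
-- Hypothetical seeding script, run in the Source database.
-- Assumes the cross join of sys.all_objects with itself yields at least 1.4 million rows.
WITH Numbers AS (
    SELECT TOP (1400000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM   sys.all_objects a
    CROSS JOIN sys.all_objects b
)
INSERT INTO dbo.SourceTable (RowNumber, CreatedOn, ModifiedOn, IsActive)
SELECT n,           -- unique value for RowNumber
       GETDATE(),   -- CreatedOn
       GETDATE(),   -- ModifiedOn
       0            -- IsActive = false for every row
FROM   Numbers;
```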

I created an SSIS package with two OLE DB connection managers, one connecting to the Source database and the other to the Destination database. I designed the Control Flow as shown below:

  • First Execute SQL Task executes the statement TRUNCATE TABLE dbo.StagingTable against the destination database to truncate the staging table.

  • Next section explains how the Data Flow Task is configured.

  • Second Execute SQL Task executes the SQL statement given below, which updates data in dbo.DestinationTable using the data available in dbo.StagingTable, assuming that there is a unique key that matches between those two tables. In this case, the unique key is the column RowNumber.

UPDATE      D 
SET         D.CreatedOn = S.CreatedOn
        ,   D.ModifiedOn = S.ModifiedOn 
FROM        dbo.DestinationTable D 
INNER JOIN  dbo.StagingTable S 
ON          D.RowNumber = S.RowNumber
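
The sample tables are created without any keys or indexes, so the uniqueness of RowNumber is only assumed. A minimal sketch of enforcing it is shown below. The constraint and index names are hypothetical, and this step is not part of the original sample; an index on the staging table may also slow down the load into it, so it is worth measuring both ways.

```sql
-- Hypothetical names; run in the Destination database.
-- Enforce the unique key that the batch UPDATE joins on.
ALTER TABLE dbo.DestinationTable
ADD CONSTRAINT PK_DestinationTable PRIMARY KEY CLUSTERED (RowNumber);

-- Optional: give the staging table a matching unique index for the join.
CREATE UNIQUE CLUSTERED INDEX IX_StagingTable_RowNumber
ON dbo.StagingTable (RowNumber);
```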

I have designed the Data Flow Task as shown below.

  • OLE DB Source reads data from dbo.SourceTable using the SQL command SELECT RowNumber, CreatedOn, ModifiedOn FROM Source.dbo.SourceTable WHERE IsActive = 1

  • Lookup transformation is used to check whether the RowNumber value already exists in the table dbo.DestinationTable

  • If the record does not exist, it will be redirected to the OLE DB Destination named Insert into destination table, which inserts the row into dbo.DestinationTable

  • If the record exists, it will be redirected to the OLE DB Destination named Insert into staging table, which inserts the row into dbo.StagingTable. This data in the staging table will be used in the second Execute SQL Task to perform the batch update.
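
The routing performed by the Lookup transformation can be expressed in plain T-SQL, which may make the logic easier to follow. This is only an equivalent sketch and not how the package does it: it assumes both databases are on the same SQL Server instance so that three-part names work, whereas the package uses two connection managers and can span servers.

```sql
-- Sketch only: set-based equivalent of the Lookup routing.
-- Existing rows go to staging first, so rows inserted by the second
-- statement are not mistaken for pre-existing ones.
INSERT INTO Destination.dbo.StagingTable (RowNumber, CreatedOn, ModifiedOn)
SELECT S.RowNumber, S.CreatedOn, S.ModifiedOn
FROM   Source.dbo.SourceTable S
WHERE  S.IsActive = 1
AND    EXISTS (SELECT 1
               FROM   Destination.dbo.DestinationTable D
               WHERE  D.RowNumber = S.RowNumber);

-- New rows go straight into the destination table.
INSERT INTO Destination.dbo.DestinationTable (RowNumber, CreatedOn, ModifiedOn)
SELECT S.RowNumber, S.CreatedOn, S.ModifiedOn
FROM   Source.dbo.SourceTable S
WHERE  S.IsActive = 1
AND    NOT EXISTS (SELECT 1
                   FROM   Destination.dbo.DestinationTable D
                   WHERE  D.RowNumber = S.RowNumber);
```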

To activate a few rows for the OLE DB Source, I ran the query below:

UPDATE  dbo.SourceTable 
SET     IsActive = 1 
WHERE   (RowNumber % 9 = 1) 
OR      (RowNumber % 9 = 2)

First execution of the package looked as shown below. All the rows were directed to destination table because it was empty. The execution of the package on my machine took about 3 seconds.

Ran the row count query again to find the row counts in all three tables.
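
The row count query itself is not shown in this answer. A query along the following lines would report the counts for all three tables; the column aliases are made up, and the three-part names assume both databases are on the same instance.

```sql
-- Assumed shape of the row count query; not taken from the original answer.
SELECT 'Source.dbo.SourceTable' AS TableName, COUNT(*) AS TotalRows
FROM   Source.dbo.SourceTable
UNION ALL
SELECT 'Destination.dbo.DestinationTable', COUNT(*)
FROM   Destination.dbo.DestinationTable
UNION ALL
SELECT 'Destination.dbo.StagingTable', COUNT(*)
FROM   Destination.dbo.StagingTable;
```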

To activate a few more rows for the OLE DB Source, I ran the query below:

UPDATE  dbo.SourceTable 
SET     IsActive = 1 
WHERE   (RowNumber % 9 = 3) 
OR      (RowNumber % 9 = 5) 
OR      (RowNumber % 9 = 6) 
OR      (RowNumber % 9 = 7)

Second execution of the package looked as shown below. The 314,268 rows that were inserted during the first execution were redirected to the staging table, and 628,766 new rows were inserted directly into the destination table. The execution of the package on my machine took about 12 seconds. The 314,268 rows in the destination table were then updated by the second Execute SQL Task using the data in the staging table.

Ran the row count query again to find the row counts in all three tables.

I hope that gives you an idea to implement your solution.
