How do I optimize an Upsert (Update and Insert) operation within an SSIS package?
Question
I am not a DBA, but I work for a small company as the IT person. I have to replicate a database from staging to production. I have created an SSIS package to do this, but it takes hours to run. This isn't a large data-warehouse-type project, either; it's a pretty straightforward Upsert. I'm assuming that I am the weak link in how I designed it.
Here is my process:
- Truncate staging tables (Execute SQL Task)
- Pull data from a development table into staging (Data Flow Task)
- Run a data flow task:
  - OLE DB Source
  - Conditional Split Transformation (condition used: [!]ISNULL(is_new_flag))
  - Insert if new, update if existing
The data flow task is repeated a few times to change tables/values, but the flow is the same. I've read several things about OLE DB components being slow for updates and have tried a few things, but I haven't gotten it to run very quickly.
I'm not sure what other details to give, but I can give anything that's asked for.
Recommended answer
Sample package using SSIS 2008 R2 that inserts or updates using batch operations:
Here is a sample package written in SSIS 2008 R2 that illustrates how to perform inserts and updates between two databases using batch operations.
- Using the OLE DB Command transformation will slow down the update operations in your package, because it does not perform batch operations. Every row is updated individually.
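Comparing the two patterns side by side makes the cost difference clear. The sketch below uses Python's standard sqlite3 module as a stand-in for SQL Server. This is an assumption: SQLite has no OLE DB Command, and an in-process database has no network round trips, so it understates the per-row overhead. row_by_row_update imitates the per-row pattern, and set_based_update loads a staging table and then issues one UPDATE. Both helper names are hypothetical.

```python
import sqlite3


def make_db(n):
    """Hypothetical helper: n destination rows plus an empty staging table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE DestinationTable (RowNumber INTEGER NOT NULL, ModifiedOn TEXT NOT NULL)")
    conn.execute("CREATE TABLE StagingTable (RowNumber INTEGER NOT NULL, ModifiedOn TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO DestinationTable VALUES (?, ?)",
        [(i, "old") for i in range(1, n + 1)],
    )
    conn.commit()
    return conn


def row_by_row_update(conn, changes):
    """Hypothetical: mimics the OLE DB Command pattern, one UPDATE per incoming row."""
    statements = 0
    for row_number, modified_on in changes:
        conn.execute(
            "UPDATE DestinationTable SET ModifiedOn = ? WHERE RowNumber = ?",
            (modified_on, row_number),
        )
        statements += 1
    conn.commit()
    return statements  # number of UPDATE statements issued


def set_based_update(conn, changes):
    """Hypothetical: bulk-load the changes into staging, then apply one UPDATE."""
    conn.execute("DELETE FROM StagingTable")
    conn.executemany("INSERT INTO StagingTable VALUES (?, ?)", changes)
    conn.execute(
        """
        UPDATE DestinationTable
        SET ModifiedOn = (SELECT S.ModifiedOn FROM StagingTable S
                          WHERE S.RowNumber = DestinationTable.RowNumber)
        WHERE RowNumber IN (SELECT RowNumber FROM StagingTable)
        """
    )
    conn.commit()
    return 1  # number of UPDATE statements issued


changes = [(i, "new") for i in range(1, 1001, 2)]  # every odd RowNumber changes
print(row_by_row_update(make_db(1000), changes))  # 500
print(set_based_update(make_db(1000), changes))   # 1
```

Both functions leave the destination table in the same final state. The only difference is how many UPDATE statements the database has to process.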
The sample uses two databases, namely Source and Destination. In my example, both databases reside on the same server, but the logic can still be applied to databases residing on different servers and locations.
I created a table named dbo.SourceTable in my source database Source.
CREATE TABLE [dbo].[SourceTable](
[RowNumber] [bigint] NOT NULL,
[CreatedOn] [datetime] NOT NULL,
[ModifiedOn] [datetime] NOT NULL,
[IsActive] [bit] NULL
)
I also created two tables named dbo.DestinationTable and dbo.StagingTable in my destination database Destination.
CREATE TABLE [dbo].[DestinationTable](
[RowNumber] [bigint] NOT NULL,
[CreatedOn] [datetime] NOT NULL,
[ModifiedOn] [datetime] NOT NULL
)
GO
CREATE TABLE [dbo].[StagingTable](
[RowNumber] [bigint] NOT NULL,
[CreatedOn] [datetime] NOT NULL,
[ModifiedOn] [datetime] NOT NULL
)
GO
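You can try the pattern without a SQL Server instance by mirroring the three tables in an in-memory SQLite database through Python's sqlite3 module. This is an assumption made for illustration. The types are mapped approximately (bigint and bit become INTEGER, datetime becomes TEXT), and both "databases" live in a single connection instead of separate Source and Destination databases. The create_schema name is hypothetical.

```python
import sqlite3

# Approximate SQLite equivalents of the three SQL Server tables above.
SCHEMA = """
CREATE TABLE SourceTable (
    RowNumber  INTEGER NOT NULL,
    CreatedOn  TEXT    NOT NULL,
    ModifiedOn TEXT    NOT NULL,
    IsActive   INTEGER NULL        -- stands in for the SQL Server bit column
);
CREATE TABLE DestinationTable (
    RowNumber  INTEGER NOT NULL,
    CreatedOn  TEXT    NOT NULL,
    ModifiedOn TEXT    NOT NULL
);
CREATE TABLE StagingTable (
    RowNumber  INTEGER NOT NULL,
    CreatedOn  TEXT    NOT NULL,
    ModifiedOn TEXT    NOT NULL
);
"""


def create_schema():
    """Hypothetical helper: one in-memory database holding all three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = create_schema()
print(sorted(r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")))
# ['DestinationTable', 'SourceTable', 'StagingTable']
```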
I inserted about 1.4 million rows into the table dbo.SourceTable, with unique values in the RowNumber column. The tables dbo.DestinationTable and dbo.StagingTable were empty to begin with. All the rows in the table dbo.SourceTable have the flag IsActive set to false.
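Here is a scaled-down version of that seed data, again using SQLite through Python's sqlite3 as a stand-in. It gives every row a unique RowNumber and sets IsActive = 0 (false). The row count (90 instead of about 1.4 million), the timestamps, and the seed_source name are arbitrary assumptions.

```python
import sqlite3
from datetime import datetime, timedelta


def seed_source(conn, n):
    """Hypothetical helper: n rows with unique RowNumber values, all IsActive = 0."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS SourceTable (RowNumber INTEGER NOT NULL, "
        "CreatedOn TEXT NOT NULL, ModifiedOn TEXT NOT NULL, IsActive INTEGER NULL)"
    )
    base = datetime(2012, 1, 1)  # arbitrary starting timestamp
    rows = []
    for i in range(1, n + 1):
        stamp = (base + timedelta(minutes=i)).isoformat(sep=" ")
        rows.append((i, stamp, stamp, 0))
    # executemany reuses one prepared INSERT statement for every row.
    conn.executemany("INSERT INTO SourceTable VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
seed_source(conn, 90)  # scaled down from about 1.4 million rows
print(conn.execute("SELECT COUNT(*), SUM(IsActive) FROM SourceTable").fetchone())  # (90, 0)
```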
I created an SSIS package with two OLE DB connection managers, one connecting to the Source database and the other to the Destination database. The Control Flow is designed as follows:
The first Execute SQL Task executes the statement TRUNCATE TABLE dbo.StagingTable against the destination database to truncate the staging table.
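Emptying staging at the start of every run matters because the later batch UPDATE applies whatever is in staging. Rows left over from a previous run would otherwise be applied again. SQLite (used here through Python's sqlite3 as a stand-in) has no TRUNCATE TABLE, so this sketch uses an unqualified DELETE as the equivalent. The truncate_staging name is hypothetical.

```python
import sqlite3


def truncate_staging(conn):
    """Hypothetical helper: empty the staging table before a run."""
    # SQLite has no TRUNCATE TABLE; a DELETE without WHERE removes every row.
    conn.execute("DELETE FROM StagingTable")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StagingTable (RowNumber INTEGER NOT NULL, "
             "CreatedOn TEXT NOT NULL, ModifiedOn TEXT NOT NULL)")
conn.executemany("INSERT INTO StagingTable VALUES (?, ?, ?)", [(1, "a", "a"), (2, "b", "b")])
truncate_staging(conn)
print(conn.execute("SELECT COUNT(*) FROM StagingTable").fetchone()[0])  # 0
```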
The Data Flow Task runs next; its configuration is explained in a later section.
The second Execute SQL Task executes the SQL statement below. It updates data in dbo.DestinationTable using the data available in dbo.StagingTable, assuming that there is a unique key that matches between those two tables. In this case, the unique key is the column RowNumber.
UPDATE D
SET D.CreatedOn = S.CreatedOn
, D.ModifiedOn = S.ModifiedOn
FROM dbo.DestinationTable D
INNER JOIN dbo.StagingTable S
ON D.RowNumber = S.RowNumber
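SQLite versions before 3.33 do not support UPDATE ... FROM. This sketch (Python's sqlite3 as a stand-in, an assumption) therefore expresses the same join-based update with correlated subqueries. On SQL Server, the statement above is the form to use. The sketch also shows why the unique key matters: if staging held two rows with the same RowNumber, which value ends up in the destination would not be well defined. The batch_update name is hypothetical.

```python
import sqlite3


def batch_update(conn):
    """Hypothetical helper: apply staged values to matching rows; returns rows updated."""
    # Correlated-subquery form of: UPDATE D ... FROM D INNER JOIN S ON D.RowNumber = S.RowNumber
    cur = conn.execute("""
        UPDATE DestinationTable
        SET CreatedOn  = (SELECT S.CreatedOn  FROM StagingTable S
                          WHERE S.RowNumber = DestinationTable.RowNumber),
            ModifiedOn = (SELECT S.ModifiedOn FROM StagingTable S
                          WHERE S.RowNumber = DestinationTable.RowNumber)
        WHERE EXISTS (SELECT 1 FROM StagingTable S
                      WHERE S.RowNumber = DestinationTable.RowNumber)
    """)
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DestinationTable (RowNumber INTEGER NOT NULL, CreatedOn TEXT NOT NULL,
                                   ModifiedOn TEXT NOT NULL);
    CREATE TABLE StagingTable (RowNumber INTEGER NOT NULL, CreatedOn TEXT NOT NULL,
                               ModifiedOn TEXT NOT NULL);
    INSERT INTO DestinationTable VALUES (1, 'c1', 'old'), (2, 'c2', 'old'), (3, 'c3', 'old');
    INSERT INTO StagingTable VALUES (2, 'c2', 'new');
""")
print(batch_update(conn))  # 1
print(conn.execute("SELECT RowNumber, ModifiedOn FROM DestinationTable ORDER BY RowNumber").fetchall())
# [(1, 'old'), (2, 'new'), (3, 'old')]
```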
I designed the Data Flow Task as follows:
The OLE DB Source reads data from dbo.SourceTable using the SQL command:
SELECT RowNumber, CreatedOn, ModifiedOn FROM Source.dbo.SourceTable WHERE IsActive = 1
The Lookup transformation is used to check whether the RowNumber value already exists in the table dbo.DestinationTable.
If the record does not exist, it will be redirected to the OLE DB Destination named Insert into destination table, which inserts the row into dbo.DestinationTable.
If the record exists, it will be redirected to the OLE DB Destination named Insert into staging table, which inserts the row into dbo.StagingTable. The data in the staging table will be used in the second Execute SQL Task to perform the batch update.
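The source, lookup and two-destination routing can be sketched in Python with sqlite3 standing in for the two SQL Server databases (an assumption). The destination keys are loaded once into a set, which is loosely analogous to the Lookup transformation's default full-cache mode but is not a reimplementation of it. Each active source row is checked against that set, and the two groups are bulk-inserted with executemany. Table names follow the sample; route_rows is hypothetical.

```python
import sqlite3


def route_rows(conn):
    """Hypothetical helper: source -> lookup -> two destinations; returns (new, matched)."""
    # OLE DB Source: only the active rows are read.
    source_rows = conn.execute(
        "SELECT RowNumber, CreatedOn, ModifiedOn FROM SourceTable WHERE IsActive = 1"
    ).fetchall()
    # Lookup: load the destination keys once, before anything is inserted.
    existing = {row[0] for row in conn.execute("SELECT RowNumber FROM DestinationTable")}
    new_rows = [r for r in source_rows if r[0] not in existing]
    matched_rows = [r for r in source_rows if r[0] in existing]
    # Non-matching rows -> "Insert into destination table"
    conn.executemany("INSERT INTO DestinationTable VALUES (?, ?, ?)", new_rows)
    # Matching rows -> "Insert into staging table" (applied later by the batch UPDATE)
    conn.executemany("INSERT INTO StagingTable VALUES (?, ?, ?)", matched_rows)
    conn.commit()
    return len(new_rows), len(matched_rows)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SourceTable (RowNumber INTEGER NOT NULL, CreatedOn TEXT NOT NULL,
                              ModifiedOn TEXT NOT NULL, IsActive INTEGER NULL);
    CREATE TABLE DestinationTable (RowNumber INTEGER NOT NULL, CreatedOn TEXT NOT NULL,
                                   ModifiedOn TEXT NOT NULL);
    CREATE TABLE StagingTable (RowNumber INTEGER NOT NULL, CreatedOn TEXT NOT NULL,
                               ModifiedOn TEXT NOT NULL);
    INSERT INTO SourceTable VALUES (1, 'c', 'm', 1), (2, 'c', 'm', 1), (3, 'c', 'm', 1),
                                   (4, 'c', 'm', 1), (5, 'c', 'm', 0);
    INSERT INTO DestinationTable VALUES (2, 'c', 'old'), (4, 'c', 'old');
""")
print(route_rows(conn))  # (2, 2): rows 1 and 3 are new, rows 2 and 4 go to staging
```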
To activate some rows for the OLE DB Source, I ran the query below:
UPDATE dbo.SourceTable
SET IsActive = 1
WHERE (RowNumber % 9 = 1)
OR (RowNumber % 9 = 2)
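You can estimate how many rows this predicate selects. Assume RowNumber values are consecutive from 1 (the text only says they are unique). Then each remainder class modulo 9 holds about one ninth of the rows, so this query activates roughly 2/9 of the table, and the later four-remainder query roughly another 4/9. The check below uses 90 rows and a hypothetical activated helper.

```python
def activated(row_numbers, remainders):
    """Hypothetical helper: rows matching WHERE (RowNumber % 9 = r1) OR (RowNumber % 9 = r2) ..."""
    # For non-negative integers, Python's % gives the same result as T-SQL's %.
    return [n for n in row_numbers if n % 9 in remainders]


first_batch = activated(range(1, 91), {1, 2})
second_batch = activated(range(1, 91), {3, 5, 6, 7})
print(len(first_batch), len(second_batch))  # 20 40
```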
In the first execution of the package, all the rows were directed to the destination table because it was empty. The package took about 3 seconds to execute on my machine.
I ran the row count query again to find the row counts in all three tables.
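The row count query itself is not reproduced in the text. Assuming it is a plain COUNT(*) per table, here is a minimal equivalent using Python's sqlite3 as a stand-in. The tables are simplified to a single column, and row_counts is a hypothetical name. On SQL Server, a SELECT COUNT(*) against each of the three tables would do the same job.

```python
import sqlite3

TABLES = ("SourceTable", "DestinationTable", "StagingTable")


def row_counts(conn):
    """Hypothetical helper: {table name: row count} for the three sample tables."""
    counts = {}
    for table in TABLES:
        # Names come from the fixed TABLES tuple, so interpolating them is safe here.
        counts[table] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return counts


conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (RowNumber INTEGER NOT NULL)")
conn.executemany("INSERT INTO SourceTable VALUES (?)", [(i,) for i in range(1, 91)])
conn.executemany("INSERT INTO DestinationTable VALUES (?)", [(i,) for i in range(1, 21)])
print(row_counts(conn))
# {'SourceTable': 90, 'DestinationTable': 20, 'StagingTable': 0}
```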
To activate a few more rows for the OLE DB Source, I ran the query below:
UPDATE dbo.SourceTable
SET IsActive = 1
WHERE (RowNumber % 9 = 3)
OR (RowNumber % 9 = 5)
OR (RowNumber % 9 = 6)
OR (RowNumber % 9 = 7)
In the second execution of the package, the 314,268 rows that were previously inserted during the first execution were redirected to the staging table, and 628,766 new rows were inserted directly into the destination table. The package took about 12 seconds to execute on my machine. The 314,268 rows in the destination table were then updated in the second Execute SQL Task with the data from the staging table.
I ran the row count query again to find the row counts in all three tables.
I hope that gives you an idea of how to implement your solution.