SSIS 仅提取 Delta 更改 [英] SSIS only extract Delta changes

查看:29
本文介绍了SSIS 仅提取 Delta 更改的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

经过一些建议.我正在使用 SSIS\SQL Server 2014.我有一个夜间 SSIS 包,它将非 SQL Server 数据库中的数据提取到一个表中(SQL 表每次都事先被截断),然后我从该表中提取以创建一个每日 csv 文件.

After some advice. I'm using SSIS\SQL Server 2014. I have a nightly SSIS package that pulls in data from non-SQL Server db's into a single table (the SQL table is truncated beforehand each time) and I then extract from this table to create a daily csv file.

展望未来,我只想每天将已更改的记录(即 Delta)提取到 csv.

Going forward, I only want to extract to csv on a daily basis the records that have changed i.e. the Deltas.

最好的方法是什么?我想在 SSIS 中使用 CDC,但是由于我每次在初始加载之前都截断 SQL 表,这是最好的方法吗?或者我是否需要在初始加载时使用 SQL 中的主表,然后导入到另一个表中并只提取不同的地方?有关信息,SQL 中的表包含一个主键.

What is the best approach? I was thinking of using CDC in SSIS, but as I'm truncating the SQL table before the initial load each time, will this be best method? Or will I need to have a master table in SQL with an initial load, then import into another table and just extract where there are different? For info, the table in SQL contains a Primary Key.

我只想仔细检查一下,因为 CDC 假设这些表都在 SQL Server 中,而我的数据首先来自 SQL Server 外部.

I just want to double check as CDC assumes the tables are all in SQL Server, whereas my data is coming from outside SQL Server first.

感谢您的帮助.

推荐答案

该表上的主键是您的救星.很明显,您将不同数据拉入的 SQL Server 数据库不会知道从一个表刷新到下一个哪些记录已更改,但是如果您添加两个附加表,并使用附加列修改现有表,它应该能够通过利用 哈希字节.

The primary key on that table is your saving grace here. Obviously enough, the SQL Server database that you're pulling the disparate data into won't know from one table flush to the next which records have changed, but if you add two additional tables, and modify the existing table with an additional column, it should be able to figure it out by leveraging HASHBYTES.

在本例中,我将新表命名为 SentRows,但您可以在实践中使用更有意义的名称.我们将调用旧表中的新列 HashValue.

For this example, I'll call the new table SentRows, but you can use a more meaningful name in practice. We'll call the new column in the old table HashValue.

HashValue 列作为 varbinary 数据类型添加到您的表中.NOT NULL 也一样.

Add the column HashValue to your table as a varbinary data type. NOT NULL as well.

创建您的 SentRows 表,其中包含主表主键中所有列的列,以及 HashValue 列.

Create your SentRows table with columns for all the columns in the main table's primary key, plus the HashValue column.

创建一个与主表结构相同的 RowsToSend 表,包括 HashValue.

Create a RowsToSend table that's structurally identical to your main table, including the HashValue.

通过将 HASHBYTES 应用于表中的所有非键列,修改您的查询以创建 HashValue.(这会非常乏味.抱歉.)

Modify your queries to create the HashValue by applying HASHBYTES to all of the non-key columns in the table. (This will be horribly tedious. Sorry about that.)

发送您的完整数据集.

现在将所有键值和 HashValue 移动到 SentRows 表.截断主表.

Now move all of the key values and HashValues to the SentRows table. Truncate your main table.

在下一次拉取时,将 SentRows 中的键值和 HashValue 与主表中的新数据进行比较.

On the next pull, compare the key values and HashValues from SentRows to the new data in the main table.

主键匹配 + 哈希匹配 = 未更改行

Primary key match + hash match = Unchanged row

主键匹配 + 哈希不匹配 = 更新行

Primary key match + hash mismatch = Updated row

传入数据中的主键但现有数据集中缺失=新行

Primary key in incoming data but missing from existing data set = New row

主键不在传入数据中,而是在现有数据集中 = 已删除行

Primary key not in incoming data but in existing data set = Deleted row

提取您需要发送到 RowsToSend 表的任何更改.

Pull out any changes you need to send to the RowsToSend table.

RowsToSend 发送更改.

将键值和 HashValue 移动到您的 SentRows 表中.更新已更改键值的哈希值,插入新行,并决定如何处理删除(如果必须处理删除).

Move the key values and HashValues to your SentRows table. Update hashes for changed key values, insert new rows, and decide how you're going to handle deletes, if you have to deal with deletes.

截断 SentRows 表,为明天做好准备.

Truncate the SentRows table to get ready for tomorrow.

如果您愿意(如果您愿意,稍后您会感谢自己)将计算列添加到 SentRows 表中,默认为 GETDATE(),即会告诉您何时添加该行.

If you'd like (and you'll thank yourself later if you do) add a computed column to the SentRows table with default of GETDATE(), which will tell you when the row was added.

你走吧.从现在开始,只有增量.

And away you go. Nothing but deltas from now on.

编辑 2019-10-31:

循序渐进(或 TL;DR):

Step by step (or TL;DR):

1) 刷新并填充MainTable.

2) 将 MainTable 上的键和哈希与 SentRows 上的键和哈希进行比较,以识别新的/更改的行.

2) Compare keys and hashes on MainTable to keys and hashes on SentRows to identify new/changed rows.

3) 将新的/更改的行移动到 RowsToSend.

3) Move new/changed rows to RowsToSend.

4) 发送 RowsToSend 中的行.

4) Send the rows that are in RowsToSend.

5) 将所有行从 RowsToSend 移动到 SentRows.

5) Move all the rows from RowsToSend to SentRows.

6) 截断 RowsToSend.

这篇关于SSIS 仅提取 Delta 更改的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆