过程应该是可重复的,只插入新记录或/和更新现有记录 [英] process should be repeatable, only insert new records or/and update existing records

查看:34
本文介绍了过程应该是可重复的,只插入新记录或/和更新现有记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用这个脚本创建了一个表.

I created a table using this script.

use DWResourceTask
go
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[DimEntity]') AND type in (N'U'))
DROP TABLE [DimEntity]
Go
Create Table DimEntity

(EntityKey int NOT NULL identity PRIMARY KEY,
[EntCode] [nvarchar](8) NOT NULL,
    [Name] [nvarchar](80) NOT NULL,
    [CompanyRegistration] [nvarchar](80) NULL,
    [Active] [int] NOT NULL,
    [AccessLevel] [int] NOT NULL ,
    [SiteURN] [nvarchar](128) NOT NULL,
    [CompanyURN] [nvarchar](128) NOT NULL,
    [SiteName] [nvarchar](30) NOT NULL,
    [SiteDesc] [nvarchar](60) NULL,
    [SiteURL] [nvarchar](512) NOT NULL)

我用这个插入数据

Select e.EntCode, e.Name, e.Active, e.AccessLevel, ss.SiteURN, ss.CompanyURN,
ss.SiteName, ss.SiteDesc, ss.SiteURL
from USA.dbo.SCSite ss, USA.dbo.SCLegalEnt e
where ss.localsiteflag = 1
and e.active = 1
UNION ALL
Select e.EntCode, e.Name, e.Active, e.AccessLevel, ss.SiteURN, ss.CompanyURN,
ss.SiteName, ss.SiteDesc, ss.SiteURL
from UK.dbo.SCSite ss, UK.dbo.SCLegalEnt e
where ss.localsiteflag = 1
and e.active = 1

我想要做的是我计划创建一个 SSIS 包,无论我执行多少次,它都只会插入新记录或/和更新现有记录.过程应该是可重复的.

What I want to do is that I'm planning to create an SSIS package that no matter how many times I execute, it will only insert new records or/and update existing records. Process should be repeatable.

我应该在 OLE DB 源中的 SQL 命令中放入什么脚本来仅"插入新记录或/和更新现有记录.

What script should I put in the SQL command in OLE DB source to insert "only" new records or/and update existing records.

这是某种更新命令吗?但是如何?我不瘦我需要截断或删除命令来更新表中的数据.

Is it some sort of Update command? But how? I dont thin I need truncate or delete command to update the data in the table.

推荐答案

您的 OLE DB 源命令应该是一个查询,用于标识候选记录池.如果您可以消除一个范围,因为您知道它无效,请在此处应用过滤器.例如,您只加载当前年份数据,然后应用类似 SELECT T.* FROM Country.dbo.Table AS T WHERE T.year = 2013 的过滤器,或者您加载过去 30天或任何您的来源范围.

Your OLE DB Source command should be a query that identifies the candidate pool of records. If you can eliminate a range because you know it is invalid, apply a filter here. For example, you are only loading current year data then you'd apply a filter like SELECT T.* FROM Country.dbo.Table AS T WHERE T.year = 2013 or you're loading last 30 days or whatever your source range could be.

这将允许行流入您的数据流.从那里,您希望对参考/目标集的可用行执行更改检测.由于您需要考虑更改行,您可能需要一些标准来确定更改的内容与用于匹配标准的内容.在我们的数据仓库中,我们有两个哈希键:history 和 change.历史是通过 HASHBYTES 函数散列在一起的业务键,而更改键是散列在一起的非业务键.在我们的第一次查找中,我们测试当前行的历史哈希值和引用集的更改键是否存在.如果两者都匹配,那么我知道当前行存在于我的数据集中并且值相同,因此被路由到行计数然后关闭到位存储桶.

This will allows rows to flow into your data flow. From there, you'd want to perform change detection on the available rows to your reference/destination set. Since you need to account for change rows, you'll probably need some criteria to determine what's changed versus what is used for match criteria. In our data warehouse, we have two hash keys: historical and change. Historical are the business keys hashed together via the HASHBYTES function while the change key are the non-business keys hashed together. In our first lookup, we test for the existence of the current row's historical hash and the change key to the reference set's. If both match, then I know the current row exists in my dataset and the values are the same so that gets routed to a row count and then off to the bit bucket.

在第一次查找的不匹配输出中,我知道这是一个全新的行还是对现有行的更改.我需要执行第二次查找以确定哪个是真的,所以添加另一个查找组件,这次只匹配历史哈希键.如果找到匹配项,则需要执行更新.否则,我有一个新行.

In the no match output from the first lookup, I know either this is a brand new row or it's a change to an existing row. I'll need to perform a second lookup to determine which is true so add another Lookup Component, this time only matching on historical hash key. If I find a match, then I need to perform an update. Otherwise, I have a new row.

除非您的数据集很小,而不是使用 OLE DB 命令来执行更新,您可能会通过将更新暂存到表并在数据流完成后执行执行 SQL 任务来实际执行更新来获得更好的性能.这样做的原因是 OLE DB 命令(不要与 OLE DB Destination 混淆)执行单例数据库操作,这与适当 ETL 的基于集合的性质相反.

Unless your dataset is small, rather than using an OLE DB Command to perform an update, you will likely get better performance by staging your updates to a table and performing an Execute SQL Task after the Data Flow completes to actually perform the Update. The reason for this is that the OLE DB Command, not to be confused with OLE DB Destination, performs singleton database operations which is contrary to the set-based nature of proper ETL.

Andy Leonard 有一个很棒的系列,集成服务的阶梯,他在其中介绍了详细的增量负载模式.

Andy Leonard has a great series, the Stairway to Integration Services, where he covers the Incremental load pattern in detail.

这篇关于过程应该是可重复的,只插入新记录或/和更新现有记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆