在 SSIS 数据流中组合多种记录类型 [英] Combine Multiple Record Types in SSIS Data Flow

查看:35
本文介绍了在 SSIS 数据流中组合多种记录类型的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我可以使用集成服务来提取和转换数据,例如,包含在导入平面文件中的单一类型的固定格式记录中的数据.我还可以在一个文件中处理多种类型的固定格式记录,只要我能够将它们发送到不同的目的地.

I can use Integration Services to extract and transform data that is, for example, contained in a single type of fixed format record in an import flat file. I can also handle multiple types of fixed format records in a single file, as long as I am able to send them to different destinations.

但是,如果我需要将输入文件中的两条(或更多)条记录与要到达单个目的地的每条记录相关联,该怎么办?我知道这些记录属于一起的唯一方法是它们在输入文件中彼此相邻.数据流将处理一条又一条的记录.我无法使用 SSIS 包变量来计算记录数,因为包变量仅在数据流完成后更新.

But what if I need to associate two (or more) records from an input file for each record that is going to a single destination? The only way I know that these records belong together is that they occur next to each other in the input file. The Data Flow will process one record after another. I can't use an SSIS package variable to count records, because the package variables are only updated after the Data Flow has finished.

这是一个示例输入文件:

Here is a sample input file:

HDR00120140501      
DTL001JOAN     0.00 
DTL002     30.00 ABC
DTL001DAVE    11.00 
DTL002     21.85 DEF
DTL001BERT    50.00 
DTL002      0.00 GHI
TRL001              

我需要将 DTL001 和 DTL002 明细行中的数据合并为单个输出记录.此外,我需要包含来自 HDR001 标题行的信息(在本例中为日期值).所以目标表看起来像:

I need to combine data from the DTL001 and DTL002 detail lines for a single output record. In addition, I need to include information from the HDR001 header line (in this case a date value). So the destination table looks like:

CREATE TABLE TestImport (
  ID int IDENTITY(1,1) NOT NULL,
  ImportDate datetime NULL,
  Name char(4) NULL,
  Amount1 decimal(18, 2) NULL,
  Amount2 decimal(18, 2) NULL,
  Desc char(3) NULL )

六个输入详细记录,加上标题,将在目标表中产生三个记录:

The six input detail records, plus the header, will yield three records in the destination table:

ImportDate Name Amount1 Amount2 Desc
---------- ---- ------- ------- -----
2014-05-01 JOAN 0.00    30.00   ABC
2014-05-01 DAVE 11.00   21.85   DEF
2014-05-01 BERT 50.00   0.00    GHI

这是一个已解决的问题,旨在帮助可能遇到类似问题的任何人.

This is a solved problem that is posted to assist anyone who may encounter a similar problem.

推荐答案

可以使用 脚本条件拆分合并连接 组件.

The detail records can be combined using a combination of Script, Conditional Split, and Merge Join components.

脚本组件添加一个详细记录计数,它会随着遇到的每个 DTL001 记录而递增.这提供了一个详细记录计数列,可用于对输入文件中彼此相邻的 DTL001 和 DTL002 记录进行排序和连接.

The Script component adds a Detail Record Count that is incremented for each DTL001 record that it encounters. This provides a Detail Record Count column that can be used for sorting and joining the DTL001 and DTL002 records that are next to each other in the input file.

平面文件连接管理器和源用于将每个输入记录分成 RecordType、RecordSubType 和 RecordData 列.例如,前三个记录被分解为以下几部分:

The Flat File Connection Manager and Source are used to break each input record into RecordType, RecordSubType, and RecordData columns. For instance, the first three records are broken up into the following pieces:

"HDR", "001", "20140501      "
"DTL", "001", "JOAN     0.00 "
"DTL", "002", "     30.00 ABC"

接下来,Derived Header Columns 组件添加 HeaderRecordCount、DetailRecordCount 和 ImportDate 列.这些最初没有填充,但它们为脚本组件提供了一个位置来存储标题和详细信息计数,以及目标表的每一行所需的标题中的任何信息.此脚本还可以处理具有多个部分的文件,其中每个部分都有一个新的 HDR001 标头记录:

Next, the Derived Header Columns component adds HeaderRecordCount, DetailRecordCount, and ImportDate columns. These are initially unpopulated, but they provide a place for the Script component to store the Header and Detail counts, as well as any information from the header that will be needed for each row of the destination table. This Script could also handle files with multiple sections, where each section has a new HDR001 header record:

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    static int _HeaderRecordCount = 0;
    static int _DetailRecordCount = 0;
    static DateTime _ImportDate = DateTime.MinValue;
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (Row.RecordType == "HDR" && Row.RecordSubType == "001")
        {
            _HeaderRecordCount = _HeaderRecordCount + 1;
            _ImportDate = DateTime.ParseExact(Row.RecordData.Substring(0, 8), "yyyyMMdd", System.Globalization.CultureInfo.InvariantCulture);
        }
        else if (Row.RecordType == "DTL" && Row.RecordSubType == "001")
        {
            _DetailRecordCount = _DetailRecordCount + 1;
        }
        Row.HeaderRecordCount = _HeaderRecordCount;
        Row.DetailRecordCount = _DetailRecordCount;
        Row.ImportDate = _ImportDate;
    }
}

条件拆分组件然后将 DTL001 和 DTL002 记录发送到不同的路径.拆分条件为:

The Conditional Split component then sends the DTL001 and DTL002 records down separate paths. The split conditions are:

RecordType == "DTL" && RecordSubType == "001"
RecordType == "DTL" && RecordSubType == "002"

这些路径都在 DetailRecordCount 列上排序,Merge Join 使用该列将它们组合起来.来自 001 路径的所有记录都传递到 Merge Join 输出,来自 002 路径的 RecordData 列也传递到单独的输出列(例如,RecordData002).

These paths are each Sorted on the DetailRecordCount column, and the Merge Join combines them using that column. All of the records from the 001 path are passed through to the Merge Join output, and the RecordData column from the 002 path is also passed to a separate output column (e.g., RecordData002).

现在,两个平面文件记录的内容都可以在单个数据流记录中使用.Derived Detail Columns 组件用于提取任何所需信息,并将结果列发送到目标表.

Now the contents of both Flat File records are available in a single Data Flow record. The Derived Detail Columns component is used to pull out any required information, and the resulting columns are sent to the destination table.

这篇关于在 SSIS 数据流中组合多种记录类型的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆