SSIS-脚本组件,将单行拆分为多行(父子变量) [英] SSIS - Script Component, Split single row to multiple rows (Parent Child Variation)

查看:151
本文介绍了SSIS-脚本组件,将单行拆分为多行(父子变量)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在此先感谢您的帮助.我在编写SSIS脚本组件以将单行分隔为多行方面需要帮助.我在下面查看了很多有用的博客和帖子:

http://beyondrelational.com/ask/public/questions/1324/ssis-script-component-split-single-row-to-multiple-rows-parent-child-variation.aspx

http://bi-polar23. blogspot.com/2008/06/splitting-delimited-column-in-ssis.html

但是,我需要一些额外的编码帮助来完成该项目.基本上这就是我想要做的.

输入数据


    ID               Item Name
    1                Apple01,02,Banana01,02,03
    2                Spoon1,2,Fork1,2,3,4

输出数据


    ParentID      ChildID          Item Name
    1             1                Apple01
    1             2                Apple02
    1             3                Banana01
    1             4                Banana02
    1             5                Banana03
    2             1                Spoon1
    2             2                Spoon2
    2             3                Fork1
    2             4                Fork2
    2             5                Fork3
    2             6                Fork4

下面是我的编码尝试,但是如果不合逻辑,请随意进行整体修改.设置了SSIS异步输出.

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Dim posID As Integer, childID As Integer
    Dim delimiter As String = ","
    Dim txtHolder As String, suffixHolder As String
    Dim itemName As String = Row.ItemName
    Dim keyField As Integer = Row.ID

    If Not (String.IsNullOrEmpty(itemList)) Then

        Dim inputListArray() As String = _
        itemList.Split(New String() {delimiter}, _
        StringSplitOptions.RemoveEmptyEntries)

        For Each item As String In inputListArray
            Output0Buffer.AddRow()
            Output0Buffer.ParentID = keyField

            If item.Length >= 3 Then
                txtHolder = Trim(item)
                Output0Buffer.ItemName = txtHolder

                'when item length is less than 3, it's suffix
            Else
                suffixHolder = Trim(item)
                txtHolder = Left(txtHolder.ToString(), Len(txtHolder) _
                    - Len(suffixHolder)) & suffixHolder.ToString()
                Output0Buffer.ItemName = txtHolder
            End If
        Next
    End If
End Sub

当前代码产生以下输出

ID           Item Name
1            Apple01
1            02
1            Banana01
1            02
1            03
2            Spoon1
2            2
2            Fork1
2            2
2            3
2            4

解决方案

如果我在此回应中遇到腐,那不是我的意图.基于评论我是编码方面的新手,并且对问题进行故障排除",我想遍历我的观察以及如何得出这些观察.

问题分析

人们希望根据与行相关联的定界字段将一行拆分为多个输出行.

目前的代码正在生成适当数量的行,因此您确实可以使脚本的异步部分(拆分)正常工作,所以这是一个加号.需要发生的是,我们需要1)填充子ID"列2)在生成子项目时,将项目前缀应用于所有后续行.

我会像这样对待大多数问题.我想完成什么?有什么用?什么不起作用?需要做些什么才能使其正常工作.将问题分解为越来越小的问题,最终会导致您完成 的工作.

代码观察

粘贴提供的代码会导致未声明itemList的错误.根据使用情况,它似乎应为itemName.

修复此问题后,您应该注意到IDE指示您有2个未使用的变量(posID,childID),并且variable txHolder is used before it's been assigned a value. A null reference exception could result at runtime.我的同事经常提到警告是尚未长大的错误,因此我对您的建议是刚起步的开发人员要注意警告,除非您明确希望编译器对上述情况发出警告.

开始使用

在解决儿童ID"情况与名称前缀/后缀之间进行选择时,我将从一个简单的选项开始,即儿童ID"

生成代理密钥

这是花哨的标题短语,如果您进行搜索,将对ssistalk或sqlis或许多精明的博客作者中的任何一个都有很大的吸引力.魔鬼当然知道要搜索什么.否,您永远不会在哪里计算子ID值或将其分配给流,这当然是为什么它不显示在其中.

我们只需要生成一个单调递增的数字,即可在每次源ID更改时重置该数字.我假设入站ID在传入数据中是唯一的,例如销售发票编号将是唯一的,并且我们正在拆分购买的商品.但是,如果这些ID在数据集中重复出现,则可能不是销售发票编号,而是销售人员ID.销售人员1可以在销售蔬菜的批生产中再排一行.那是一个更复杂的场景,如果可以更好地描述您的源数据,我们可以重新访问.

生成代理密钥有两个部分(同样,将问题分解成较小的部分).首先要做的是制作一个从1到N的计数.您已经定义了一个childId变量来满足此要求.初始化此变量(1),然后在您的foreach循环中将其递增.

现在我们开始计数,我们需要将该值推送到输出流上.将这两个步骤放在一起看起来像

        childID = 1
        For Each item As String In inputListArray
            Output0Buffer.AddRow()
            Output0Buffer.ParentId = keyField
            Output0Buffer.ChildId = childID
            ' There might be VB shorthand for ++
            childID = childID + 1

运行程序包并成功!从列表中刮掉生成代理密钥.

字符串混搭

在问题的另一半中,我不知道需要做什么,但是我需要为这一部分添加一些标题.在给出源数据的情况下,可能很难正确地做到这一点.您提供了Apple01,Banana01,Spoon1,Fork1的值.看起来那里有一个模式(名称与代码并置),但是它是什么呢?您的代码表明,如果小于3,则为后缀,但是您怎么知道 base 是什么呢?第一行使用前导0且长度为两位数,而第二行不使用前导零.这是您需要了解数据的地方.识别第一行代码"部分的规则是什么?一些可能的算法

  • 强制您的上游数据提供者提供一致的长度代码(我认为这在我13年中就曾经起作用过,但向源头推销却丝毫没有伤害)
  • 假设代码始终是数字,请以相反的顺序评估每个字符,以测试是否可以将其转换为整数(处理可变长度代码)
  • 假定split数组中的第二个元素将提供代码的长度.这是您在代码中采用的方法,并且确实有效.

除了固定局部变量ItemName/itemList外,我没有进行任何更改以使生成的项目名称起作用.最终代码通过删除PosID并将txtHolder初始化为空字符串来消除警告.

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Dim childID As Integer
    Dim delimiter As String = ","
    Dim txtHolder As String = String.Empty, suffixHolder As String
    Dim itemName As String = Row.ItemName
    Dim keyField As Integer = Row.ID

    If Not (String.IsNullOrEmpty(itemName)) Then

        Dim inputListArray() As String = _
        itemName.Split(New String() {delimiter}, _
        StringSplitOptions.RemoveEmptyEntries)

        ' The inputListArray (our split out field)
        ' needs to generate values from 1 to N
        childID = 1
        For Each item As String In inputListArray
            Output0Buffer.AddRow()
            Output0Buffer.ParentId = keyField
            Output0Buffer.ChildId = childID
            ' There might be VB shorthand for ++
            childID = childID + 1

            If item.Length >= 3 Then
                txtHolder = Trim(item)
                Output0Buffer.ItemName = txtHolder
            Else
                'when item length is less than 3, it's suffix
                suffixHolder = Trim(item)
                txtHolder = Left(txtHolder.ToString(), Len(txtHolder) _
                    - Len(suffixHolder)) & suffixHolder.ToString()
                Output0Buffer.ItemName = txtHolder
            End If
        Next
    End If
End Sub

Thanks in advance for your help. I'm in need of help on writing SSIS script component to delimit single row to multiple rows. There were many helpful blog and post I looked at below:

http://beyondrelational.com/ask/public/questions/1324/ssis-script-component-split-single-row-to-multiple-rows-parent-child-variation.aspx

http://bi-polar23.blogspot.com/2008/06/splitting-delimited-column-in-ssis.html

However, I need a little extra help on coding to complete the project. Basically here's what I want to do.

Input data


    ID               Item Name
    1                Apple01,02,Banana01,02,03
    2                Spoon1,2,Fork1,2,3,4

Output data


    ParentID      ChildID          Item Name
    1             1                Apple01
    1             2                Apple02
    1             3                Banana01
    1             4                Banana02
    1             5                Banana03
    2             1                Spoon1
    2             2                Spoon2
    2             3                Fork1
    2             4                Fork2
    2             5                Fork3
    2             6                Fork4

Below is my attempt to code, but feel free to revise whole if it's illogic. SSIS Asynchronous output is set.

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Dim posID As Integer, childID As Integer
    Dim delimiter As String = ","
    Dim txtHolder As String, suffixHolder As String
    Dim itemName As String = Row.ItemName
    Dim keyField As Integer = Row.ID

    If Not (String.IsNullOrEmpty(itemList)) Then

        Dim inputListArray() As String = _
        itemList.Split(New String() {delimiter}, _
        StringSplitOptions.RemoveEmptyEntries)

        For Each item As String In inputListArray
            Output0Buffer.AddRow()
            Output0Buffer.ParentID = keyField

            If item.Length >= 3 Then
                txtHolder = Trim(item)
                Output0Buffer.ItemName = txtHolder

                'when item length is less than 3, it's suffix
            Else
                suffixHolder = Trim(item)
                txtHolder = Left(txtHolder.ToString(), Len(txtHolder) _
                    - Len(suffixHolder)) & suffixHolder.ToString()
                Output0Buffer.ItemName = txtHolder
            End If
        Next
    End If
End Sub

The current code produces the following output

ID           Item Name
1            Apple01
1            02
1            Banana01
1            02
1            03
2            Spoon1
2            2
2            Fork1
2            2
2            3
2            4

解决方案

If I come across as pedantic in this response, it is not my intention. Based on the comment "I'm new at coding and having a problem troubleshooting" I wanted to walk through my observations and how I came to them.

Problem analysis

The desire is to split a single row into multiple output rows based on a delimited field associated to the row.

The code as it stands now is generating the appropriate number of rows so you do have the asynchronous part (split) of the script working so that's a plus. What needs to happen is we need to 1) Populate the Child ID column 2) Apply the item prefix to all subsequent row when generating the child items.

I treat most every problem like that. What am I trying to accomplish? What is working? What isn't working? What needs to be done to make it work. Decomposing problems into smaller and smaller problems will eventually result in something you can do.

Code observations

Pasting in the supplied code resulted in an error that itemList was not declared. Based on usage, it seems that it was intended to be itemName.

After fixing that, you should notice the IDE indicating you have 2 unused variables (posID, childID) and that the variable txHolder is used before it's been assigned a value. A null reference exception could result at runtime. My coworker often remarks warnings are errors that haven't grown up yet so my advice to you as a fledgling developer is to pay attention to warnings unless you explicitly expect the compiler to warn you about said scenario.

Getting started

With a choice between solving the Child ID situation versus the name prefix/suffix stuff, I'd start with an easy one, the child id

Generating a surrogate key

That's the fancy title phrase that if you searched on you'd have plenty of hits to ssistalk or sqlis or any of a number of fabulously smart bloggers. Devil of course is knowing what to search on. No where do you ever compute or assign the child id value to the stream which of course is why it isn't showing up there.

We simply need to generate a monotonically increasing number which resets each time the source id changes. I am making an assumption that the inbound ID is unique in the incoming data like a sales invoice number would be unique and we are splitting out the items purchased. However if those IDs were repeated in the dataset, perhaps instead of representing invoice numbers they are salesperson id. Sales Person 1 could have another row in the batch selling vegetables. That's a more complex scenario and we can revisit if that better describes your source data.

There are two parts to generating our surrogate key (again, break problems down into smaller pieces). The first thing to do is make a thing that counts up from 1 to N. You have defined a childId variable to serve this. Initialize this variable (1) and then increment it inside your foreach loop.

Now that we counting, we need to push that value onto the output stream. Putting those two steps together would look like

        childID = 1
        For Each item As String In inputListArray
            Output0Buffer.AddRow()
            Output0Buffer.ParentId = keyField
            Output0Buffer.ChildId = childID
            ' There might be VB shorthand for ++
            childID = childID + 1

Run the package and success! Scratch the generate surrogate key off the list.

String mashing

I don't know of a fancy term for what needs to be done in the other half of the problem but I needed some title for this section. Given the source data, this one might be harder to get right. You've supplied value of Apple01, Banana01, Spoon1, Fork1. It looks like there's a pattern there (name concatenated with a code) but what it is it? Your code indicates that if it's less than 3, it's a suffix but how do you know what the base is? The first row uses a leading 0 and is two digits long while the second row does not use a leading zero. This is where you need to understand your data. What is the rule for identifying the "code" part of the first row? Some possible algorithms

  • Force your upstream data providers to provide consistent length codes (I think this has worked once in my 13 years but it never hurts to push back against the source)
  • Assuming code is always digits, evaluate each character in reverse order testing whether it can be cast to an integer (Handles variable length codes)
  • Assume the second element in the split array will provide the length of the code. This is the approach you are taking with your code and it actually works.

I made no changes to make the generated item name work beyond fixing the local variables ItemName/itemList. Final code eliminates the warnings by removing PosID and initializing txtHolder to an empty string.

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Dim childID As Integer
    Dim delimiter As String = ","
    Dim txtHolder As String = String.Empty, suffixHolder As String
    Dim itemName As String = Row.ItemName
    Dim keyField As Integer = Row.ID

    If Not (String.IsNullOrEmpty(itemName)) Then

        Dim inputListArray() As String = _
        itemName.Split(New String() {delimiter}, _
        StringSplitOptions.RemoveEmptyEntries)

        ' The inputListArray (our split out field)
        ' needs to generate values from 1 to N
        childID = 1
        For Each item As String In inputListArray
            Output0Buffer.AddRow()
            Output0Buffer.ParentId = keyField
            Output0Buffer.ChildId = childID
            ' There might be VB shorthand for ++
            childID = childID + 1

            If item.Length >= 3 Then
                txtHolder = Trim(item)
                Output0Buffer.ItemName = txtHolder
            Else
                'when item length is less than 3, it's suffix
                suffixHolder = Trim(item)
                txtHolder = Left(txtHolder.ToString(), Len(txtHolder) _
                    - Len(suffixHolder)) & suffixHolder.ToString()
                Output0Buffer.ItemName = txtHolder
            End If
        Next
    End If
End Sub

这篇关于SSIS-脚本组件,将单行拆分为多行(父子变量)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆