SSIS 根据 A 列中的值是否存在于 B 列中添加新的 C 列 [英] SSIS add new column C based on if value in column A exists in column B

查看:35
本文介绍了SSIS 根据 A 列中的值是否存在于 B 列中添加新的 C 列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据集,从网络服务获取如下,

I have a dataset, getting from a web service as following,

ColumnA ....  ColumnB...
userA         NULL
userB         userA
userC         NULL
userD         userC

我想要的是

ColumnA ....  ColumnB...   ColumnC (if columnA's value exists in column B, position doesn't matter)
userA         NULL         YES 
userB         userA        NO
userC         NULL         YES
userD         userC        NO

作为 SSIS 的新手,我不确定这是否可能,谷歌帮助不大,我可以在不使用 C# 创建任务的情况下做到这一点吗?

Being new to SSIS, I am not sure if it's even possible, google is not helping much, can i do this without getting create a task using C# ?

编辑

进一步解释,

ColumnC = Value of ColumnA exists in ***ANY*** of ColumnB row. 

例如userA 在 columnB 中的值为 NULL,但它在 ColumnB 的一行或多行中使用,因此 ColumnC 值为 YES.

E.g. userA has value NULL in columnB but it's used in one or more rows of ColumnB hence ColumnC value is YES.

所以就逻辑而言,

  • 从 ColumnB 中获取不同的值并将它们放入一个 listA 中
  • 如果值存在于 ColumnA 中,则 Foreach 值在 listA 中,则 columnC 为 YES,否则为 NO

推荐答案

您需要在数据流中执行此操作.添加派生列转换并使用逻辑检查列 B 中的 ISNULL,然后检查列 A 和 B 是否匹配.将 NULL 与任何内容进行比较可能会导致错误.该表达式使用 TERNARY 运算符,遵循以下形式:

You will need to do this in the data flow. Add a Derived Column transformation and use logic to check for ISNULL in column B and to then check to see if column A and B match. Comparing NULL to anything may result in an error. The expression uses TERNARY operators, which adhere to the following form:

  • 布尔条件 (If) ? 真表达式结果 (Then): 错误的表达结果 (Else)
  • Boolean Condition (If) ? True Expression Result (Then) : False Expression Result (Else)

因此,您的表达式将类似于以下两种情况之一:

So, your expression will look something like one of the two following:

  • !ISNULL(ColumnB) &&列 A == 列 B ?是":否"
  • !ISNULL(ColumnB) ?(ColumnA == ColumnB ? "YES" : "NO") : "NO"

https://docs.microsoft.com/en-us/sql/integration-services/expressions/conditional-ssis-expression

https://www.sqlchick.com/entries/2011/6/8/nested-conditional-operators-in-an-ssis-derived-column.html

问题编辑的答案更新

我现在更明白你在问什么了.基本上,您正在对 B 列中的每一行执行查找,并检查它是否存在于 A 列中的任何行中. 这是更复杂且更难解决的问题,因为典型的数据流组件非常适合查看行,但不能跨行.因此,我可以立即想到两种解决方案和一种不使用的解决方案.

I understand better now what you are asking. Basically, you are performing a lookup on each row in Column B and checking to see if it exists on any rows in column A. This is more complex and challenging to solve because the typical data flow components are ideally suited for looking at data within a row, but not across rows. So, there are two solutions that I can think of off the top of my head and one solution not to use.

  1. 如果可能,将 A 列存储在数据库表中,然后在 B 列上执行查找以查看该值是否存在(反之亦然 - 您的样本数据集太小,我无法确定).这是最简单的解决方案.
  2. 通过脚本组件抽取整个文件,将每列中的数据存储到单独的数组中,然后使用 C#/LINQ 检查匹配项.这要复杂得多,我不喜欢它.
  3. 不使用的解决方案.将 A 列和 B 列拆分为单独的数据流,对它们进行排序,然后执行合并连接以查看是否存在匹配项.这里的技巧是知道是否需要维护文件中的每一行(即完全外连接),或者是否可以删除列 B 为空的行(即内连接).我更不喜欢这个,因为我高度怀疑行是否会对齐,这可能会导致不正确的结果.我至少提出这个问题是为了指出这个解决方案的潜在缺陷.
  1. Store Column A in a database table if possible and then perform a lookup on column B to see if the value exists (or vice verse - your sample data set is too small for me to be positive). This is the easiest solution.
  2. Pump the entire file through a script component, store the data in each column into separate array's and then use C#/LINQ to check for matches. This is much more complicated and I don't like it.
  3. solution not to use. Split column A and column B into separate data flows, sort them, and then perform a merge join to see if there are matches. The trick here is to know whether you need to maintain each row in the file (i.e., full outer join) or if it is OK to get rid of rows where column B is null (i.e., inner join). I like this even less because I highly doubt the rows will line up, which may result in incorrect results. I am at least raising this to point out the potential flaw in this solution.

我认为您应该查看您收到的导入文件是否可以简化.除非我没有正确理解您的数据集,否则看起来有单独的表列,其中 A 列是维度数据,B 列是事实数据,包含在一个文件中,其中存在潜在的一对多基数.

I think you should look to see if the import file you are receiving can be simplified. Unless I am not correctly understanding your data set, it looks like there are separate table columns, where column A is dimensional data and column B is fact data, being included in one file where there is a potential one-to-many cardinality.

这篇关于SSIS 根据 A 列中的值是否存在于 B 列中添加新的 C 列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆