SSIS Task for inconsistent column count import?


Problem.

I regularly receive feed files from different suppliers. Although the column names are consistent, the problem comes when some suppliers send text files with more or fewer columns in their feed file.

Furthermore, the arrangement of the columns in these files is inconsistent.

Other than the Dynamic Data Flow Task provided by CozyRoc, is there another way I could import these files? I am not a C# guru, but I am leaning towards using a "Script Task" in the control flow or a "Script Component" in the data flow.

Any suggestions, samples or direction would be greatly appreciated.

http://www.cozyroc.com/ssis/data-flow-task

Some forums

http://www.sqlservercentral.com/Forums/Topic525799-148-1.aspx#bm526400

http://www.bidn.com/forums/microsoft-business-intelligence/integration-services/26/dynamic-data-flow

Solution

Off the top of my head, I have a 50% solution for you.

The problem

SSIS really cares about metadata, so variations in it tend to result in exceptions. DTS was far more forgiving in this sense. That strong need for consistent metadata makes the Flat File Source troublesome to use here.

Query based solution

If the problem is the component, let's not use it. What I like about this approach is that conceptually, it's the same as querying a table: the order of the columns does not matter, nor does the presence of extra columns.

Variables

I created 3 variables, all of type string: CurrentFileName, InputFolder and Query.

  • InputFolder is hard-wired to the source folder. In my example, it's C:\ssisdata\Kipreal
  • CurrentFileName is the name of a file. During design time, it was input5columns.csv but that will change at run time.
  • Query is an expression "SELECT col1, col2, col3, col4, col5 FROM " + @[User::CurrentFileName]

Connection manager

Set up a connection to the input file using the JET OLEDB driver. After creating it as described in the linked article, I renamed it to FileOLEDB and set an expression on its ConnectionString property of "Data Source=" + @[User::InputFolder] + ";Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties=\"text;HDR=Yes;FMT=CSVDelimited;\";" (the inner double quotes have to be escaped with a backslash inside an SSIS expression).
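To make the shape of that connection string easier to see, here is a minimal Python sketch that builds the same text outside SSIS. `build_connection_string` and the folder path are hypothetical, chosen only for illustration; the point is that the Data Source is the folder (each file in it then behaves like a table) and that the Extended Properties value is wrapped in its own double quotes.

```python
def build_connection_string(input_folder: str) -> str:
    """Mirror the SSIS expression: the folder is the Data Source."""
    # The extended properties tell Jet to treat files as delimited text
    # whose first row holds the column names (HDR=Yes).
    extended = "text;HDR=Yes;FMT=CSVDelimited;"
    return (
        "Data Source=" + input_folder
        + ";Provider=Microsoft.Jet.OLEDB.4.0"
        + ';Extended Properties="' + extended + '";'
    )


# Hypothetical folder, used only to show the resulting string.
conn_str = build_connection_string(r"C:\feeds\incoming")
print(conn_str)
```

Because only the folder appears in the string, the connection never has to change when the loop moves on to the next file.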

Control Flow

My Control Flow is a Data Flow Task nested inside a Foreach Loop container that uses the Foreach File enumerator.

Foreach File Enumerator

My Foreach File enumerator is configured to operate on files. I put an expression on the Directory property that points to @[User::InputFolder]. Notice that at this point, if the value of that folder needs to change, it will be updated correctly in both the connection manager and the file enumerator. In "Retrieve file name", choose "Name and extension" instead of the default "Fully qualified".

In the Variable Mappings tab, assign the value to our @[User::CurrentFileName] variable.

At this point, each iteration of the loop will change the value of @[User::Query] to reflect the current file name.
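The effect of the loop can be sketched in Python as follows. This is only an illustration of how the Query expression is re-evaluated per file, not SSIS code; `build_query` and `queries_for_folder` are hypothetical names, and the temporary folder stands in for InputFolder.

```python
import tempfile
from pathlib import Path

COLUMNS = ["col1", "col2", "col3", "col4", "col5"]


def build_query(current_file_name: str) -> str:
    """Same text the Query expression yields for one file name."""
    return "SELECT " + ", ".join(COLUMNS) + " FROM " + current_file_name


def queries_for_folder(input_folder: Path) -> list[str]:
    """Stand-in for the Foreach File enumerator: one query per CSV file."""
    # p.name corresponds to "Name and extension" (no folder part).
    return [build_query(p.name) for p in sorted(input_folder.glob("*.csv"))]


# Demo with two empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("input5columns.csv", "input7columns.csv"):
        (folder / name).write_text("col1\n")
    queries = queries_for_folder(folder)

for q in queries:
    print(q)
```

Using only the name and extension matters because the folder is already part of the connection string; the file name alone plays the role of the table name.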

Data Flow

This is actually the easiest piece. Use an OLE DB source and wire it as indicated.

Use the FileOLEDB connection manager and change the Data Access mode to "SQL Command from variable." Use the @[User::Query] variable in there, click OK and you're ready to work.

Sample data

I created two sample files, input5columns.csv and input7columns.csv. All of the columns of 5 are in 7, but 7 has them in a different order (col2 is at zero-based ordinal position 2 in the first file and 6 in the second). I negated all the values in 7 to make it readily apparent which file is being operated on.

col1,col3,col2,col5,col4
1,3,2,5,4
1111,3333,2222,5555,4444
11,33,22,55,44
111,333,222,555,444

and

col1,col3,col7,col5,col4,col6,col2
-1111,-3333,-7777,-5555,-4444,-6666,-2222
-111,-333,-777,-555,-444,-666,-222
-1,-3,-7,-5,-4,-6,-2
-11,-33,-77,-55,-44,-666,-222
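As a rough illustration of why selecting by name makes the column order and the extra columns irrelevant, the following Python sketch reads trimmed copies of the two samples with `csv.DictReader` and picks the same five columns from each. It is an analogy for what the SELECT does, not part of the package; `select_by_name` is a hypothetical helper.

```python
import csv
import io

FIVE = """col1,col3,col2,col5,col4
1,3,2,5,4
1111,3333,2222,5555,4444
"""

SEVEN = """col1,col3,col7,col5,col4,col6,col2
-1111,-3333,-7777,-5555,-4444,-6666,-2222
-1,-3,-7,-5,-4,-6,-2
"""

WANTED = ["col1", "col2", "col3", "col4", "col5"]


def select_by_name(text: str, wanted: list[str]) -> list[list[str]]:
    """Return the wanted columns, in the wanted order, for every row."""
    reader = csv.DictReader(io.StringIO(text))
    # Looking values up by header name ignores physical column position
    # and silently skips columns that were not asked for (col6, col7).
    return [[row[name] for name in wanted] for row in reader]


rows_five = select_by_name(FIVE, WANTED)
rows_seven = select_by_name(SEVEN, WANTED)
print(rows_five)
print(rows_seven)
```

Both results come back as col1 through col5 in that order, which is the behavior the query-based source relies on.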

Running the package results in these two screen shots

What's missing

I don't know of a way to tell the query-based approach that it's OK if a column doesn't exist. If there's a unique key, I suppose you could define your query with only the columns that must be there, and then perform lookups against the file to try to obtain the columns that ought to be there, without failing the lookup if a column doesn't exist. Pretty kludgey, though.
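One direction that might close part of that gap, offered only as an untested idea: read the header row of each file before the data flow runs (for example in a Script Task) and build the query from it, substituting a NULL placeholder for any optional column the file lacks. The Python sketch below shows just the string-building logic. The split into REQUIRED and OPTIONAL columns, the function names, and the assumption that the Jet text driver accepts `NULL AS colN` are all hypothetical and would need verifying, as would the data types SSIS infers for the placeholder columns.

```python
import csv
import io

REQUIRED = ["col1", "col2"]          # assumed: must exist in every file
OPTIONAL = ["col3", "col4", "col5"]  # assumed: may be absent


def read_header(text: str) -> list[str]:
    """Return the column names from the first line of a CSV file."""
    return next(csv.reader(io.StringIO(text)))


def build_tolerant_query(file_name: str, header: list[str]) -> str:
    """Build a SELECT that keeps the same output columns for every file."""
    missing = [c for c in REQUIRED if c not in header]
    if missing:
        raise ValueError(f"{file_name} lacks required columns: {missing}")
    parts = list(REQUIRED)
    for col in OPTIONAL:
        # Keep the output shape stable: absent columns become NULLs.
        parts.append(col if col in header else "NULL AS " + col)
    return "SELECT " + ", ".join(parts) + " FROM " + file_name


# Hypothetical three-column file that lacks col3 and col5.
header = read_header("col1,col4,col2\n1,4,2\n")
query = build_tolerant_query("input3columns.csv", header)
print(query)
```

The output column list stays identical from file to file, which is what the data flow's metadata needs, while the per-file differences are absorbed into the query text.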
