SSIS Flat Files with Variable Column Numbers


Question

SSIS does two things when handling flat files that are particularly frustrating. It seems there should be a way around them, but I can't figure it out. If you define a flat file with 10 columns, tab-delimited with CRLF as the end-of-row marker, it works perfectly for files where every row has exactly 10 columns. The two painful scenarios are these:

  1. If someone supplies a file with an 11th column anywhere, it would be nice if SSIS simply ignored it, since you haven't defined it. It should just read the 10 columns you have defined and then skip to the end-of-row marker. What it does instead is concatenate any additional data with the data in the 10th column and put all of that into the 10th column, which is not much use. I realise this happens because the delimiter for the 10th column is not tab like all the others but CRLF, so it just grabs everything up to the CRLF, replacing extra tabs with nothing as it does so. This is not smart, in my opinion.

  2. If someone supplies a file with only 9 columns, something even worse happens. SSIS temporarily disregards the CRLF it has unexpectedly found and pads the missing columns with columns from the start of the next row! "Not smart" is an understatement here. Who would EVER want that to happen? The remainder of the file is garbage at that point.
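The two scenarios above can be reproduced with a toy model of a parser that uses tab as the delimiter for every column except the last, and CRLF for the last. This is only a sketch of the behaviour described in the question, not SSIS's actual parser; the function name `parse_fixed_column_model` is hypothetical, and the model uses 3 columns instead of 10 to keep the data short. Unlike the behaviour reported above, the model leaves the extra tabs in place rather than stripping them.

```python
def parse_fixed_column_model(text, n_cols, col_delim="\t", row_delim="\r\n"):
    """Toy model: every column ends at col_delim except the last, which ends at row_delim."""
    rows, pos = [], 0
    while pos < len(text):
        row = []
        for i in range(n_cols):
            delim = row_delim if i == n_cols - 1 else col_delim
            end = text.find(delim, pos)
            if end == -1:
                # No further delimiter: take whatever input is left and stop.
                row.append(text[pos:])
                pos = len(text)
                break
            row.append(text[pos:end])
            pos = end + len(delim)
        rows.append(row)
    return rows


# Scenario 1: an extra column is folded into the last defined column.
extra = parse_fixed_column_model("a\tb\tc\td\r\ne\tf\tg\r\n", 3)

# Scenario 2: a short row swallows the CRLF and the start of the next row.
short = parse_fixed_column_model("a\tb\r\nc\td\te\r\n", 3)
```

In the first case the last column of the first row comes back as `"c\td"`. In the second, the two input rows collapse into a single row whose second column contains the CRLF and the first field of the following row.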

It doesn't seem unreasonable to have variations in file width for whatever reason. Of course, only variations at the end of a row (x fewer or extra columns) can reasonably be handled. But it looks like this is simply not handled well, unless I'm missing something.

So far our only solution is to load each row as one giant column (column0) and then use a script task to split it dynamically on however many delimiters it finds. This works well, except that it limits row widths to 4000 characters (the maximum width of one Unicode column). If you need to import a wider row (say, with multiple 4000-character columns for text import), then you need to define multiple columns as above, but you are then stuck with requiring a strict number of columns per row.
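The splitting step of that workaround can be sketched as follows. This is a minimal illustration of the logic only, written in Python for brevity; inside SSIS it would live in a script written in C# or VB.NET. The function name `split_row` and the choice to pad missing columns with `None` and silently drop extra ones are assumptions, not something the question specifies.

```python
def split_row(line, expected_cols, delimiter="\t"):
    """Split one raw line and force it to exactly expected_cols fields."""
    fields = line.rstrip("\r\n").split(delimiter)
    # Drop anything past the columns we have defined.
    fields = fields[:expected_cols]
    # Pad short rows so downstream columns are empty instead of shifted.
    fields += [None] * (expected_cols - len(fields))
    return fields


wide = split_row("a\tb\tc\td\r\n", 3)
narrow = split_row("a\tb\r\n", 3)
```

Because each line is handled on its own, a short or long row cannot disturb the rows that follow it.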

Is there any way around these limitations?

Recommended Answer

Glenn, I feel your pain :) SSIS cannot make the columns dynamic, because it needs to store metadata for each column as it comes through. And since we're working with flat files, which can contain any kind of data, it can't assume that a CRLF found in a column that is not the last column is indeed the end of the data line it's supposed to read.

Unlike DTS in SQL Server 2000, you can't change the properties of an SSIS package at runtime.

What you could do is create a parent package that opens the flat file in a Script Task and reads only its first line, to get the number of columns and the column names. This information can be stored in a variable.
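The header-reading step might look like the sketch below. It assumes the first line of the file is a tab-delimited header row, which is what the answer implies; the function name `read_header` is hypothetical, and in an actual Script Task the same logic would be written in C# or VB.NET.

```python
import io


def read_header(stream, delimiter="\t"):
    """Read only the first line and return (column_count, column_names)."""
    first_line = stream.readline().rstrip("\r\n")
    names = first_line.split(delimiter) if first_line else []
    return len(names), names


# An in-memory stream stands in for the flat file here.
sample = io.StringIO("id\tname\tcity\r\n1\tAnn\tOslo\r\n")
column_count, column_names = read_header(sample)
```

Only the first line is consumed, so the cost does not depend on the size of the file.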

Then the parent package loads the child package programmatically (a Script Task again) and updates the metadata of the child package's source connection. This is where you would:

  1. Add or remove columns to match the flat file.
  2. Set the column delimiter for each column. The last column's delimiter has to be CRLF, matching the row delimiter.
  3. Reinitialise the metadata of the source component in the Data Flow task (ComponentMetadata.ReinitializeMetadata()) so that it recognises the recent changes in the source connection.
  4. Save the child SSIS package.
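The delimiter rule in step 2 can be expressed as a small planning function. This sketch only builds a plain list describing what each column should look like; it does not call the SSIS object model, and the function name `build_column_specs` and the dictionary keys are hypothetical. Applying the plan to the child package's connection manager would still have to be done through the SSIS API from the Script Task.

```python
def build_column_specs(names, col_delim="\t", row_delim="\r\n"):
    """Pair every column name with its delimiter; only the last one gets the row delimiter."""
    last = len(names) - 1
    return [
        {"name": name, "delimiter": row_delim if i == last else col_delim}
        for i, name in enumerate(names)
    ]


specs = build_column_specs(["id", "name", "city"])
```

Feeding it the column names read from the header yields one entry per column, in file order.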

Details on programmatically modifying a package are readily available online.

Then your parent package simply executes the child package (Execute Package Task), and it will run with your new mappings.
