How can I load in a pipe (|) delimited text file that has columns that sometimes contain line breaks?

Problem Description

I have built an SSIS package that loads several delimited text files into a SQL database. One of the files often contains line breaks inside a field. This breaks the standard data flow task (a Flat File Source mapped to an ADO.NET Destination), because the source treats every line break as the start of a new row. The vendor sending the files is not willing to change them and can't provide XML at this time. Is there any way to fix this? I was thinking of writing a small VB.NET program that would correct the files so they work in the SSIS package, but I'm not sure how to write that logic. The file has 5 columns. The first 2 are big integers and always contain a long integer ID, then there is a small text column that contains one short word, then a date, and then a long comments field that is causing the problem. The comments field is sometimes blank (which is OK); the problem is the rows that have line breaks. I never know how many line breaks a comment contains: some have none, some have several, sometimes several in a row. Is this even possible?

Sample file format:

5787626|6547599|Approved|1/10/2017|Applicant request for fee waiver approved
5443221|7742812|Active|11/5/2013|
3430962|7643957|Re-Scheduled|5/25/2016|REVISED TERMS AND CONDITIONS REJECTED
Applicant has 30 DAYS To submit paperwork for extension.
34433624|7673715|Denied|1/24/2017|
34113575|7653748|Active|1/8/2014|New terms have been granted.

Recommended Answer

As long as there is logic that you can program/predict, it will be possible.
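For the five-column file in the question, one rule that looks programmable is that every record begins with two integer IDs, each followed by a pipe, while comment continuation lines do not. The sketch below applies that assumed rule in a standalone program that rewrites the file so each record sits on a single line, which is the pre-processing approach the question considers. `PipeFileFixer`, `MergeRecords` and `FixFile` are hypothetical names, not part of SSIS or any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical pre-processor: joins comment continuation lines back onto their
// record so that every record occupies exactly one physical line.
public static class PipeFileFixer
{
    // Assumed rule: a record starts with two integer IDs, each followed by '|'.
    private static readonly Regex RecordStart = new Regex(@"^\d+\|\d+\|");

    public static List<string> MergeRecords(IEnumerable<string> lines)
    {
        var records = new List<string>();
        foreach (string line in lines)
        {
            if (RecordStart.IsMatch(line) || records.Count == 0)
            {
                // Start of a new record (a stray leading line is kept as-is).
                records.Add(line);
            }
            else if (line.Length > 0)
            {
                // Continuation of the previous record's comment, joined with a space.
                // Blank lines inside a comment are dropped.
                records[records.Count - 1] += " " + line;
            }
        }
        return records;
    }

    // Reads inputPath and writes the merged records to outputPath, one per line.
    // outputPath must differ from inputPath.
    public static void FixFile(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, MergeRecords(File.ReadLines(inputPath)));
    }
}
```

The cleaned file could then be read by an ordinary Flat File Source. The rule breaks down if a comment line itself happens to begin with `digits|digits|`, so it is worth checking against real vendor data before relying on it.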

I would do it using a Script Component as a source, which means you don't need to rewrite the file before processing it. It also provides a lot of flexibility; for example, you can hold values in variables while iterating over multiple lines in the file.

I posted another answer recently that gives a lot of detail on how to go about this: SSIS import a Flat File to SQL with the first row as header and last row as a total.

An example of holding the values in variables until the row is ready to be written:

For this example I am writing three columns, ID1, ID2 and Comments. The file looks like this:

1|2|Comment1
Comment2
4|5|Comment3
Comment4
Comment5
6|7|Comment6

The Script Component contains the following method.

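public override void CreateNewOutputRows()
{
    bool haveRecord = false; // becomes true once the first record line has been read
    int id1 = 0;             // for the question's file, whose IDs are big integers, use long,
    int id2 = 0;             // Convert.ToInt64 and DT_I8 output columns instead
    string comments = null;

    // Variables.FilePath is a package variable containing the file path; it must be listed
    // in the component's ReadOnlyVariables. "using" closes the file even if an error occurs.
    using (System.IO.StreamReader reader = new System.IO.StreamReader(Variables.FilePath))
    {
        while (!reader.EndOfStream)
        {
            string line = reader.ReadLine();

            if (line.Contains("|"))
            {
                // A line containing the delimiter starts a new record, so the
                // previous record (if any) is complete and can be written out.
                if (haveRecord)
                {
                    Output0Buffer.AddRow();

                    Output0Buffer.ID1 = id1;
                    Output0Buffer.ID2 = id2;
                    Output0Buffer.Comments = comments;
                }

                // Split into at most 3 parts so a "|" inside the comment text is kept.
                string[] fields = line.Split(new[] { '|' }, 3);

                id1 = Convert.ToInt32(fields[0]);
                id2 = Convert.ToInt32(fields[1]);
                comments = fields[2];
                haveRecord = true;
            }
            else
            {
                // No delimiter: this line continues the previous record's comment.
                comments += " " + line;
            }
        }
    }

    // The last record has no following record line to trigger it, so write it here.
    if (haveRecord)
    {
        Output0Buffer.AddRow();

        Output0Buffer.ID1 = id1;
        Output0Buffer.ID2 = id2;
        Output0Buffer.Comments = comments;
    }
}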

The result set is:

ID1    ID2    Comments
===    ===    ========
1      2      Comment1 Comment2
4      5      Comment3 Comment4 Comment5
6      7      Comment6
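The example above uses three columns and 32-bit IDs. For the five-column layout described in the question, each merged record still has to be split into typed values. The following is a minimal sketch, assuming the dates always use the M/d/yyyy form seen in the sample and that the IDs fit in a 64-bit `long` (the range of SQL Server `bigint`). `VendorRecord` is a hypothetical class name. In a Script Component, the same values would go to output columns of matching types, for example DT_I8 for the IDs.

```csharp
using System;
using System.Globalization;

// Hypothetical typed representation of one record in the question's 5-column file.
public sealed class VendorRecord
{
    public long Id1 { get; set; }          // first big-integer ID
    public long Id2 { get; set; }          // second big-integer ID
    public string Status { get; set; }     // short word, e.g. "Approved"
    public DateTime Date { get; set; }     // assumed M/d/yyyy, e.g. 1/10/2017
    public string Comments { get; set; }   // may be empty

    // Parses one record whose continuation lines have already been joined.
    public static VendorRecord Parse(string record)
    {
        // At most 5 parts, so a '|' inside the comment stays in the comment.
        string[] f = record.Split(new[] { '|' }, 5);
        if (f.Length != 5)
            throw new FormatException("Expected 5 fields but found " + f.Length + ": " + record);

        return new VendorRecord
        {
            Id1 = long.Parse(f[0], CultureInfo.InvariantCulture),
            Id2 = long.Parse(f[1], CultureInfo.InvariantCulture),
            Status = f[2],
            Date = DateTime.ParseExact(f[3], "M/d/yyyy", CultureInfo.InvariantCulture),
            Comments = f[4].Trim()
        };
    }
}
```

Using `ParseExact` with the invariant culture keeps the date interpretation independent of the server's regional settings, so `11/5/2013` is always read as 5 November.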
