Changing a pipe delimited file to comma delimited in VB.net


Question

I have a set of pipe-delimited inputs that look something like this:



"787291 | 3224325523" | 37826427 | 2482472 | "46284729|46246" | 24682 | 82524 | 6846419 | 68247

and I am converting them to comma-delimited output using the code below:

    ' Requires: Imports System.IO (for StreamWriter)
    Dim line As String
    Dim fields As String()
    Using sw As New StreamWriter("c:\test\output.txt")
        Using tfp As New FileIO.TextFieldParser("c:\test\test.txt")
            tfp.TextFieldType = FileIO.FieldType.Delimited
            tfp.Delimiters = New String() {"|"}
            ' Pipes inside a quoted field are treated as data, not as delimiters
            tfp.HasFieldsEnclosedInQuotes = True
            While Not tfp.EndOfData
                fields = tfp.ReadFields()
                line = String.Join(",", fields)
                sw.WriteLine(line)
            End While
        End Using
    End Using

So far so good. The parser only treats the pipes outside the quotes as delimiters and replaces them with commas. The trouble starts when the input contains a stray quotation mark, as below:



"787291 | 3224325523" | 37826427 | 2482472 | "46284729|46246" | 24682 | "82524 | 6846419 | 68247

Here the code throws a MalformedLineException, which I realize is due to the stray quotation mark in my input. I am a complete beginner with Regex, so I have not been able to use it here. If anyone has any idea, it would be much appreciated.

Recommended answer

Here is the coded procedure described in the comments:


  • Read all the lines of the original input file,
  • fix the faulty lines (with Regex or anything else that fits),
  • use TextFieldParser to parse the corrected input,
  • Join() the fields returned by TextFieldParser using a comma (,) as the separator,
  • save the fixed, reconstructed lines to the final output file.

I'm using Wiktor Stribiżew's Regex pattern. Given the description of the problem, it looks like it should work.
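To make the pattern's effect concrete, here is a minimal sketch that applies it to a shortened version of the faulty line from the question. The module name `StrayQuoteDemo` and the helper `FixStrayQuotes` are hypothetical, introduced only for this illustration. The first alternative captures a quoted run that starts and ends on a word character and is written back unchanged via `$1`; any other double quote matches the second alternative, where `$1` is empty, so the quote is dropped.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module StrayQuoteDemo
    ' Hypothetical helper: keeps balanced "word ... word" quoted runs and
    ' removes any double quote that is not part of such a run.
    Function FixStrayQuotes(line As String) As String
        Return Regex.Replace(line, "(""\b.*?\b"")|""", "$1")
    End Function

    Sub Main()
        ' Shortened faulty line: the quote before 82524 is never closed.
        Dim faulty As String =
            """787291 | 3224325523"" | ""46284729|46246"" | 24682 | ""82524 | 6846419 | 68247"
        Console.WriteLine(FixStrayQuotes(faulty))
        ' prints: "787291 | 3224325523" | "46284729|46246" | 24682 | 82524 | 6846419 | 68247
    End Sub
End Module
```

The pattern is a heuristic: it assumes the unmatched quote comes after the properly quoted fields on the line, as it does in the sample input. A stray quote that precedes a valid quoted field can pair up with that field's closing quote instead of being removed.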

Note:
Of course, I don't know whether a specific Encoding should be used.
Here, the Encoding is the default, UTF-8 without a BOM, for both input and output.
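If the files do need a specific Encoding, it can be passed explicitly to both the parser and the writer. The sketch below is an illustration under assumptions: the helper `ConvertFile`, the sample file names, and the choice of `Encoding.Unicode` (UTF-16) are hypothetical and are not taken from the original answer. It relies on the `TextFieldParser(String, Encoding)` constructor and the `File.WriteAllLines(String, IEnumerable(Of String), Encoding)` overload.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic.FileIO

Module EncodingDemo
    ' Hypothetical helper: converts a pipe-delimited file to comma-delimited,
    ' reading and writing with the same explicitly chosen encoding.
    Sub ConvertFile(inputPath As String, outputPath As String, enc As Encoding)
        Dim output As New List(Of String)
        Using tfp As New TextFieldParser(inputPath, enc)
            tfp.TextFieldType = FieldType.Delimited
            tfp.SetDelimiters("|")
            tfp.HasFieldsEnclosedInQuotes = True
            While Not tfp.EndOfData
                output.Add(String.Join(",", tfp.ReadFields()))
            End While
        End Using
        File.WriteAllLines(outputPath, output, enc)
    End Sub

    Sub Main()
        ' Assumed encoding for illustration only; use whatever the real data requires.
        Dim enc As Encoding = Encoding.Unicode
        File.WriteAllLines("SampleInput.txt", {"""1|2""|3|4"}, enc)
        ConvertFile("SampleInput.txt", "SampleOutput.txt", enc)
        Console.WriteLine(File.ReadAllText("SampleOutput.txt", enc))
    End Sub
End Module
```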

"FaultyInput.txt" is the corrupted source file.
"FixedInput.txt" is the file containing the input lines fixed (hopefully) by the Regex. You could also use a MemoryStream instead of this intermediate file.
"FixedOutput.txt" is the final CSV file, containing comma-separated fields and the correct values.

These files are all read from and written to the executable's startup path.
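The answer notes that the intermediate file could be replaced by an in-memory buffer. As a sketch of that idea, the example below feeds the repaired lines to the parser through a `StringReader` rather than a `MemoryStream`, since `TextFieldParser` also has a constructor that accepts a `TextReader`. The module name `InMemoryDemo` and the helper `ConvertLines` are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic.FileIO

Module InMemoryDemo
    ' Hypothetical helper: repairs stray quotes, then parses the repaired text
    ' from memory instead of writing an intermediate FixedInput.txt file.
    Function ConvertLines(lines As IEnumerable(Of String)) As List(Of String)
        Dim repaired As New List(Of String)
        For Each line As String In lines
            repaired.Add(Regex.Replace(line, "(""\b.*?\b"")|""", "$1"))
        Next

        Dim output As New List(Of String)
        Using reader As New StringReader(String.Join(Environment.NewLine, repaired))
            Using tfp As New TextFieldParser(reader)
                tfp.TextFieldType = FieldType.Delimited
                tfp.SetDelimiters("|")
                tfp.HasFieldsEnclosedInQuotes = True
                While Not tfp.EndOfData
                    output.Add(String.Join(",", tfp.ReadFields()))
                End While
            End Using
        End Using
        Return output
    End Function

    Sub Main()
        ' The last field has a stray opening quote that the Regex removes.
        For Each row As String In ConvertLines({"""1|2""|3|""4"})
            Console.WriteLine(row)   ' prints: 1|2,3,4
        Next
    End Sub
End Module
```

This keeps the same three steps (fix, parse, join) but leaves only the source and the final output on disk.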

' Requires: Imports System.IO, System.Linq, System.Text.RegularExpressions

' Read the corrupted source and remove the stray quotes from each line
Dim input As List(Of String) = File.ReadAllLines("FaultyInput.txt").ToList()
For line As Integer = 0 To input.Count - 1
    input(line) = Regex.Replace(input(line), "(""\b.*?\b"")|""", "$1")
Next

' Save the repaired lines so TextFieldParser can read them
File.WriteAllLines("FixedInput.txt", input)

' Parse the repaired file and rebuild each line with commas
Dim output As New List(Of String)
Using tfp As New FileIO.TextFieldParser("FixedInput.txt")
    tfp.TextFieldType = FileIO.FieldType.Delimited
    tfp.Delimiters = New String() {"|"}
    tfp.HasFieldsEnclosedInQuotes = True
    While Not tfp.EndOfData
        Dim fields As String() = tfp.ReadFields()
        output.Add(String.Join(",", fields))
    End While
End Using

File.WriteAllLines("FixedOutput.txt", output)
' Optionally, delete the intermediate file once it is no longer needed:
'File.Delete("FixedInput.txt")
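Because the Regex fix is only hoped to repair every line, the parser can still throw a MalformedLineException on input it did not anticipate. The following sketch, which is an addition and not part of the original answer, catches the exception and records the offending line instead of aborting the whole conversion. The helper `ParseLenient` and its parameters are hypothetical; `ErrorLine` and `ErrorLineNumber` are properties of `TextFieldParser`.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module LenientParseDemo
    ' Hypothetical helper: converts what it can and collects the lines that
    ' TextFieldParser rejects, so they can be inspected or fixed by hand.
    Function ParseLenient(reader As TextReader, badLines As List(Of String)) As List(Of String)
        Dim output As New List(Of String)
        Using tfp As New TextFieldParser(reader)
            tfp.TextFieldType = FieldType.Delimited
            tfp.SetDelimiters("|")
            tfp.HasFieldsEnclosedInQuotes = True
            While Not tfp.EndOfData
                Try
                    output.Add(String.Join(",", tfp.ReadFields()))
                Catch ex As MalformedLineException
                    ' Record the rejected line with its number and keep going.
                    badLines.Add(tfp.ErrorLineNumber & ": " & tfp.ErrorLine)
                End Try
            End While
        End Using
        Return output
    End Function

    Sub Main()
        Dim bad As New List(Of String)
        Dim rows = ParseLenient(New StringReader("""1|2""|3|4" & Environment.NewLine & "5|6|7"), bad)
        For Each row As String In rows
            Console.WriteLine(row)
        Next
        Console.WriteLine("Rejected lines: " & bad.Count)
    End Sub
End Module
```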
