Changing a pipe delimited file to comma delimited in VB.net
Question
So I have a set of pipe delimited inputs which are something like this:
787291 | 3224325523 | 37826427 | 2482472 | 46284729 | 46246 | 24682 | 82524 | 6846419 | 68247
"787291 | 3224325523" | 37826427 | 2482472 | "46284729|46246" | 24682 | 82524 | 6846419 | 68247
and I am converting them to comma delimited using the code given below:
' Requires: Imports System.IO (for StreamWriter)
Dim line As String
Dim fields As String()
Using sw As New StreamWriter("c:\test\output.txt")
    Using tfp As New FileIO.TextFieldParser("c:\test\test.txt")
        tfp.TextFieldType = FileIO.FieldType.Delimited
        ' Split on pipes, but keep pipes inside "..." as part of the field
        tfp.Delimiters = New String() {"|"}
        tfp.HasFieldsEnclosedInQuotes = True
        While Not tfp.EndOfData
            fields = tfp.ReadFields
            ' Rebuild the record with commas as the delimiter
            line = String.Join(",", fields)
            sw.WriteLine(line)
        End While
    End Using
End Using
So far so good. It only considers the delimiters outside the quotes and changes them to commas. But trouble starts when my input contains a stray quotation mark, like below:
787291 | 3224325523 | 37826427 | 2482472 | 46284729 | 46246 | 24682 | 82524 | 6846419 | 68247
"787291 | 3224325523" | 37826427 | 2482472 | "46284729|46246" | 24682 | "82524 | 6846419 | 68247
Here the code throws a MalformedLineException, which I realize is due to the stray quotation mark in my input. Since I am a total noob in RegEx, I am not able to use it here (or I am incapable of it). If anyone has any idea, it would be much appreciated.
Answer
Here is the coded procedure described in the comments:
- Read all the lines of the original input file,
- fix the faulty lines (with Regex or anything else that fits),
- use TextFieldParser to parse the corrected input,
- Join() the fields returned by TextFieldParser, using "," as the separator,
- save the fixed, reconstructed lines to the final output file.
I'm using Wiktor Stribiżew's Regex pattern: it looks like it should work, given the description of the problem.
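To see what the pattern does on its own, here is a small console sketch that applies the same Regex.Replace call used in the answer's code to the faulty sample line from the question. The module name RegexFixDemo and the helper FixLine are hypothetical, for illustration only. A quoted segment that opens and closes next to a word character, such as "787291 | 3224325523", is captured in group 1 and written back unchanged. Any other quote character can only match the second alternative; group 1 does not participate in that match, so "$1" substitutes an empty string and the stray quote is dropped.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexFixDemo
    ' Same pattern as in the answer: ("\b.*?\b")|"
    ' Group 1 keeps a balanced quoted segment; a lone quote matches the second alternative.
    Private Const QuotePattern As String = "(""\b.*?\b"")|"""

    ' Hypothetical helper: removes stray quotes from one line, keeping balanced quoted segments.
    Function FixLine(line As String) As String
        ' When only the lone-quote alternative matches, $1 is empty, so the quote is deleted.
        Return Regex.Replace(line, QuotePattern, "$1")
    End Function

    Sub Main()
        Dim faulty As String = """787291 | 3224325523"" | 37826427 | 2482472 | ""46284729|46246"" | 24682 | ""82524 | 6846419 | 68247"
        Console.WriteLine(FixLine(faulty))
    End Sub
End Module
```

Running it prints the line with only the quote in front of 82524 removed:
"787291 | 3224325523" | 37826427 | 2482472 | "46284729|46246" | 24682 | 82524 | 6846419 | 68247
A well-formed line passes through unchanged, so the fix can be applied to every line without checking first.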
Note:
Of course, I don't know whether a specific Encoding should be used.
Here, the Encoding is the default, UTF-8 without BOM, both in and out.
"FaultyInput.txt" is the corrupted source file.
"FixedInput.txt" is the file containing the input lines fixed (hopefully) by the Regex. You could also use a MemoryStream.
"FixedOutput.txt" is the final CSV file, containing comma-separated fields and the correct values.
All these files are read from and written to the current directory, which is normally the executable's startup path.
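' Requires: Imports System.IO, Imports System.Text.RegularExpressions
' (System.Linq and Microsoft.VisualBasic are imported by default in VB.NET projects)

' 1. Read the original lines and remove the stray quotes with the Regex
Dim input As List(Of String) = File.ReadAllLines("FaultyInput.txt").ToList()
For i As Integer = 0 To input.Count - 1
    ' Group 1 keeps a balanced "..." segment; a lone quote is replaced with an empty group 1
    input(i) = Regex.Replace(input(i), "(""\b.*?\b"")|""", "$1")
Next
File.WriteAllLines("FixedInput.txt", input)

' 2. Parse the fixed lines and join the fields with commas
Dim output As List(Of String) = New List(Of String)
Using tfp As New FileIO.TextFieldParser("FixedInput.txt")
    tfp.TextFieldType = FileIO.FieldType.Delimited
    tfp.Delimiters = New String() {"|"}
    tfp.HasFieldsEnclosedInQuotes = True
    While Not tfp.EndOfData
        Dim fields As String() = tfp.ReadFields
        output.Add(String.Join(",", fields))
    End While
End Using
File.WriteAllLines("FixedOutput.txt", output)

' Optionally, delete the intermediate file:
'File.Delete("FixedInput.txt")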
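The answer mentions that a MemoryStream could replace the intermediate FixedInput.txt file. A minimal sketch of that variant, assuming the same file names and the same Regex as above, passes the fixed text to TextFieldParser through its Stream constructor. The module name PipeToCsvInMemory and the function ConvertLines are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Text
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic.FileIO

Module PipeToCsvInMemory
    ' Hypothetical helper: converts pipe-delimited lines to comma-delimited lines without a temp file.
    Function ConvertLines(lines As IEnumerable(Of String)) As List(Of String)
        ' Step 1: remove stray quotes, keeping balanced quoted segments (same pattern as the answer).
        Dim fixedLines = lines.Select(Function(l) Regex.Replace(l, "(""\b.*?\b"")|""", "$1"))
        Dim fixedText As String = String.Join(Environment.NewLine, fixedLines)

        Dim output As New List(Of String)
        ' Step 2: parse the fixed text from memory (UTF-8 bytes, no BOM).
        Using ms As New MemoryStream(Encoding.UTF8.GetBytes(fixedText))
            Using tfp As New TextFieldParser(ms)
                tfp.TextFieldType = FieldType.Delimited
                tfp.Delimiters = New String() {"|"}
                tfp.HasFieldsEnclosedInQuotes = True
                While Not tfp.EndOfData
                    ' Step 3: join the parsed fields with commas.
                    output.Add(String.Join(",", tfp.ReadFields()))
                End While
            End Using
        End Using
        Return output
    End Function

    Sub Main()
        Dim result As List(Of String) = ConvertLines(File.ReadAllLines("FaultyInput.txt"))
        File.WriteAllLines("FixedOutput.txt", result)
    End Sub
End Module
```

Keeping everything in memory is fine for files that fit comfortably in RAM; for very large inputs, the file-based version above avoids holding the whole text twice.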