分割CSV行时处理连续的引号 [英] Handling consecutive quotes when splitting CSV lines
问题描述
由于两个连续的双引号""
,我无法解析CSV文件中的值.
I am struggling with parsing values from a CSV file because of two consecutive double quotes ""
.
这是我从维基百科提取的CSV字段的示例:1997,Ford,E350,"Super, ""luxurious"" truck"
Here's an example of a CSV field I pulled from wikipedia: 1997,Ford,E350,"Super, ""luxurious"" truck"
我试图找到其他解决方法.
I have tried to find different ways to account for it.
我不断得到的结果是:
"1997"
福特"
"E350"
超级",
超级"
"豪华"卡车""
"1997"
"Ford"
"E350"
"Super,"
""Super"
" ""luxurious"" truck""
这是我的VB.Net函数.
This is my VB.Net function.
Private Function splitCSV(ByVal sLine As String) As List(Of String)
Dim comA As Integer = -1, comB = -1, quotA = -1, quotB = -1, pos = 0
Dim parsed As New List(Of String)
Dim quote As String = """"
Dim comma As String = ","
Dim len As Integer = sLine.Length
Dim first As Boolean = True
comA = sLine.IndexOf(comma, pos) ' Find the next comma.
quotA = sLine.IndexOf(quote, pos) ' Find the next quotation mark.
' This if function works if there is only one field in the given row.
If comA < 0 Then
parsed.Add(False)
Return parsed
End If
While pos < len ' While not at end of the string
comB = sLine.IndexOf(comma, comA + 1) ' Find the second comma
quotB = sLine.IndexOf(quote, quotA + 1) ' Find the second quotation mark
' Looking for the actual second quote mark
' Skips over the double quotation marks.
If quotA > -1 And quotA < comB Then ' If the quotation mark is before the first comma
If Math.Abs(quotA - quotB).Equals(1) Then
Dim tempA As Integer = quotA
Dim tempB As Integer = quotB
' Looking for the actual second quote mark
' Skips over the double quotation marks.
While (Math.Abs(tempA - tempB).Equals(1))
tempA = tempB
If Not tempA.Equals(sLine.LastIndexOf(quote)) Then
tempB = sLine.IndexOf(quote, tempA + 1)
Else
tempA = tempB - 2
End If
End While
quotB = tempB
End If
If quotB < 0 Then ' If second quotation mark does not exist
parsed.Add(False) ' End the function and Return False
Return parsed
End If
parsed.Add(sLine.Substring(quotA + 1, quotB - quotA - 1)) ' Otherwise, add the substring of initial and end quotation marks.
quotA = quotB ' Give quotA the position of quotB
pos = quotB ' Mark the current position
ElseIf comA < comB Then
If first Then ' If it is the first comma in the line,
parsed.Add(sLine.Substring(pos, comA)) ' Parse the first field
first = False ' The future commas will not be considered as the first one.
End If
comB = sLine.IndexOf(comma, comA + 1) ' Find the second comma
If comB > comA Then ' If the second comma exists
parsed.Add(sLine.Substring(comA + 1, comB - comA - 1)) ' Add the substring of the first and second comma.
comA = comB ' Give comA the position of comB
pos = comB ' Mark the current position
End If
ElseIf len > 0 Then ' If the first comma does not exist, as long as sLine has something,
parsed.Add(sLine.Substring(pos + 1, len - pos - 1)) ' Return the substing of position to end of string.
pos = len ' Mark the position at the end to exit out of while loop
End If
End While
Return parsed ' Return parsed list of string
End Function
推荐答案
TextFieldParser
在这种事情上确实非常不错,当然比自己动手还容易.测试起来很容易:我将样本复制到文件中,然后:
The TextFieldParser
is really pretty good with this sort of thing, certainly easier than rolling your own. It was easy to test this: I copied your sample to a file, then:
Imports Microsoft.VisualBasic.FileIO
...
Using parser = New TextFieldParser("C:\Temp\CSVPARSER.TXT")
parser.Delimiters = New String() {","}
parser.TextFieldType = FieldType.Delimited
parser.HasFieldsEnclosedInQuotes = True
While parser.EndOfData = False
data = parser.ReadFields
' use pipe to show column breaks:
Dim s = String.Join("|", data)
Console.WriteLine(s)
End While
End Using
HasFieldsEnclosedInQuotes = True
在这种情况下很重要.结果:
HasFieldsEnclosedInQuotes = True
would be important in this case. Result:
1997 |福特| E350 |超级豪华"卡车
1997|Ford|E350|Super, "luxurious" truck
super之后的逗号看起来不合适-可能很合适-但它在原引号内是引号:1997,Ford,E350,"Super, ""luxurious"" truck"
The comma after super looks out of place - and may well be - but it is inside quotes in the original: 1997,Ford,E350,"Super, ""luxurious"" truck"
还有其他库/程序包也可以很好地处理各种CSV布局和格式.
There are other libraries/packages which also do well with various CSV layouts and formats.
这篇关于分割CSV行时处理连续的引号的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!