Handling consecutive quotes when splitting CSV lines

Problem description

I am struggling with parsing values from a CSV file because of two consecutive double quotes "".

Here's an example of a CSV line I pulled from Wikipedia: 1997,Ford,E350,"Super, ""luxurious"" truck"
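The doubled quotes follow the usual CSV convention described in RFC 4180: a field that contains a comma, a double quote or a line break is wrapped in double quotes, and every literal double quote inside it is written twice. The sketch below shows that rule in the encoding direction (field values to one line). `CsvQuotingSketch` and `EncodeCsvField` are hypothetical names used only for illustration, not part of any library.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module CsvQuotingSketch
    ' Hypothetical helper: encodes one field following the RFC 4180 rule.
    ' Fields containing a comma, a double quote or a line break are wrapped
    ' in double quotes, and every double quote inside them is doubled.
    Function EncodeCsvField(value As String) As String
        Dim special As Char() = {","c, """"c, ControlChars.Cr, ControlChars.Lf}
        If value.IndexOfAny(special) < 0 Then
            Return value                        ' nothing to escape
        End If
        Return """" & value.Replace("""", """""") & """"
    End Function

    Sub Main()
        ' VB.NET string literals use the same doubling rule, so the last
        ' value below is: Super, "luxurious" truck
        Dim fields As String() = {"1997", "Ford", "E350", "Super, ""luxurious"" truck"}
        Dim encoded As String() = Array.ConvertAll(Of String, String)(fields, AddressOf EncodeCsvField)
        ' prints: 1997,Ford,E350,"Super, ""luxurious"" truck"
        Console.WriteLine(String.Join(",", encoded))
    End Sub
End Module
```

Reading the line back means undoing exactly this: a quote that opens a field starts a quoted section, `""` inside it becomes one `"`, and a single `"` closes it.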

I have tried to find different ways to account for it.

The result I keep getting is:

"1997"
"Ford"
"E350"
"Super,"
""Super"
" ""luxurious"" truck""

This is my VB.Net function.

Private Function splitCSV(ByVal sLine As String) As List(Of String)
    Dim comA As Integer = -1, comB = -1, quotA = -1, quotB = -1, pos = 0
    Dim parsed As New List(Of String)
    Dim quote As String = """"
    Dim comma As String = ","
    Dim len As Integer = sLine.Length
    Dim first As Boolean = True

    comA = sLine.IndexOf(comma, pos)                        ' Find the next comma.
    quotA = sLine.IndexOf(quote, pos)                       ' Find the next quotation mark.

    ' This if function works if there is only one field in the given row.
    If comA < 0 Then
        parsed.Add(False)
        Return parsed
    End If

    While pos < len                                                     ' While not at end of the string

        comB = sLine.IndexOf(comma, comA + 1)                               ' Find the second comma
        quotB = sLine.IndexOf(quote, quotA + 1)                             ' Find the second quotation mark

        ' Looking for the actual second quote mark
        '     Skips over the double quotation marks.

        If quotA > -1 And quotA < comB Then                                 ' If the quotation mark is before the first comma

            If Math.Abs(quotA - quotB).Equals(1) Then
                Dim tempA As Integer = quotA
                Dim tempB As Integer = quotB

                ' Looking for the actual second quote mark
                '     Skips over the double quotation marks.
                While (Math.Abs(tempA - tempB).Equals(1))
                    tempA = tempB

                    If Not tempA.Equals(sLine.LastIndexOf(quote)) Then
                        tempB = sLine.IndexOf(quote, tempA + 1)

                    Else
                        tempA = tempB - 2
                    End If

                End While

                quotB = tempB
            End If

            If quotB < 0 Then                                                   ' If second quotation mark does not exist
                parsed.Add(False)                                                   ' End the function and Return False

                Return parsed
            End If

            parsed.Add(sLine.Substring(quotA + 1, quotB - quotA - 1))       ' Otherwise, add the substring of initial and end quotation marks.
            quotA = quotB                                                       ' Give quotA the position of quotB
            pos = quotB                                                         ' Mark the current position

        ElseIf comA < comB Then
            If first Then                                                   ' If it is the first comma in the line,
                parsed.Add(sLine.Substring(pos, comA))                          ' Parse the first field
                first = False                                                   ' The future commas will not be considered as the first one.
            End If

            comB = sLine.IndexOf(comma, comA + 1)                           ' Find the second comma

            If comB > comA Then                                             ' If the second comma exists
                parsed.Add(sLine.Substring(comA + 1, comB - comA - 1))          ' Add the substring of the first and second comma.
                comA = comB                                                     ' Give comA the position of comB
                pos = comB                                                      ' Mark the current position

            End If

        ElseIf len > 0 Then                                                 ' If the first comma does not exist, as long as sLine has something,
            parsed.Add(sLine.Substring(pos + 1, len - pos - 1))                         ' Return the substring of position to end of string.
            pos = len                                                           ' Mark the position at the end to exit out of while loop


        End If

    End While

    Return parsed                                                           ' Return parsed list of string
End Function
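For comparison: the function above tracks comma and quote positions by index, and the doubled quotes are what throw those positions off. A character-by-character state machine avoids the problem. It remembers whether it is inside a quoted field, and when it sees a quote there it peeks at the next character to tell an escaped `""` from the closing quote. This is a minimal sketch assuming RFC 4180-style quoting; `CsvSplitSketch` and `SplitCsvLine` are hypothetical names. It does not handle quoted fields that span several physical lines, and it does not reject malformed input such as a stray quote inside an unquoted field.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module CsvSplitSketch
    ' Hypothetical helper: splits ONE CSV line into fields, assuming
    ' RFC 4180-style quoting ("" inside a quoted field = one literal quote).
    Function SplitCsvLine(line As String) As List(Of String)
        Dim fields As New List(Of String)
        Dim current As New StringBuilder()
        Dim inQuotes As Boolean = False
        Dim i As Integer = 0

        While i < line.Length
            Dim c As Char = line(i)
            If inQuotes Then
                If c = """"c Then
                    If i + 1 < line.Length AndAlso line(i + 1) = """"c Then
                        current.Append(""""c)   ' "" inside quotes -> one literal quote
                        i += 1                  ' skip the second quote of the pair
                    Else
                        inQuotes = False        ' closing quote of the field
                    End If
                Else
                    current.Append(c)           ' commas are plain text inside quotes
                End If
            ElseIf c = """"c Then
                inQuotes = True                 ' opening quote
            ElseIf c = ","c Then
                fields.Add(current.ToString())  ' field boundary
                current.Clear()
            Else
                current.Append(c)
            End If
            i += 1
        End While

        fields.Add(current.ToString())          ' last field (may be empty)
        Return fields
    End Function

    Sub Main()
        Dim line As String = "1997,Ford,E350,""Super, """"luxurious"""" truck"""
        ' Join with a pipe to make the field boundaries visible.
        Console.WriteLine(String.Join("|", SplitCsvLine(line).ToArray()))
    End Sub
End Module
```

Run on the sample line, this prints `1997|Ford|E350|Super, "luxurious" truck`, the same four fields the answer below gets from TextFieldParser.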

Recommended answer

TextFieldParser handles this sort of thing really well, and it is certainly easier than rolling your own. It was easy to test: I copied your sample to a file, then:

Imports Microsoft.VisualBasic.FileIO
...
Using parser As New TextFieldParser("C:\Temp\CSVPARSER.TXT")
    parser.Delimiters = New String() {","}
    parser.TextFieldType = FieldType.Delimited
    parser.HasFieldsEnclosedInQuotes = True

    While Not parser.EndOfData
        Dim data As String() = parser.ReadFields()

        ' use pipe to show column breaks:
        Dim s As String = String.Join("|", data)
        Console.WriteLine(s)

    End While
End Using

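Setting HasFieldsEnclosedInQuotes = True is what matters in this case: the quoted field is then read as a single value, so the comma inside it is not treated as a delimiter and each doubled quote ("") comes back as one literal quote. Result: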

1997|Ford|E350|Super, "luxurious" truck

The comma after "Super" looks out of place (and may well be a mistake in the data), but it is inside the quotes in the original, so it belongs to the field: 1997,Ford,E350,"Super, ""luxurious"" truck"
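The question's function works on one line already held in a String (sLine) rather than on a file. TextFieldParser also has a constructor that takes a TextReader, so wrapping the text in a StringReader lets it parse in-memory data with the same settings used above. A minimal sketch; `ParseOneLineSketch` and `ParseCsvText` are hypothetical names:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module ParseOneLineSketch
    ' Hypothetical helper: parses CSV text held in a string instead of a file
    ' by giving TextFieldParser a StringReader (its TextReader constructor).
    Function ParseCsvText(text As String) As List(Of String())
        Dim rows As New List(Of String())
        Using parser As New TextFieldParser(New StringReader(text))
            parser.TextFieldType = FieldType.Delimited
            parser.Delimiters = New String() {","}
            parser.HasFieldsEnclosedInQuotes = True
            While Not parser.EndOfData
                rows.Add(parser.ReadFields())   ' one String() per record
            End While
        End Using
        Return rows
    End Function

    Sub Main()
        Dim line As String = "1997,Ford,E350,""Super, """"luxurious"""" truck"""
        For Each fields As String() In ParseCsvText(line)
            ' use pipe to show column breaks, as in the file-based version
            Console.WriteLine(String.Join("|", fields))
        Next
    End Sub
End Module
```

Disposing the parser at the end of the Using block also closes the StringReader it wraps.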

There are other libraries/packages which also do well with various CSV layouts and formats.