Regular expressions to parse a CSV line


Problem description

Dear All

Does anyone have a regular expression to parse a comma-delimited line in
which some fields optionally have string delimiters (text qualifiers)?
I am currently testing with this regular expression, and it works in almost
all my test cases. I found it on the internet in a C# solution.

,(?=([^\"]*"[^"]*")*(?![^"]*"))

However in some of my test cases it fails and I am having difficulty
interpreting it.

The VB.NET function I used is

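' Required at the top of the file:
Imports System.Collections
Imports System.Text.RegularExpressions

Public Function parseCSVLine(ByVal sInputString As String) As ArrayList
    ' Matches each comma that is followed by an even number of double quotes.
    Dim r As New Regex(",(?=([^\" & Chr(34) & "]*" & Chr(34) & "[^" & _
        Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))")
    Dim iStart As Integer, m As Match
    Dim oArrayList As New ArrayList()

    ' Copy the text between consecutive matched commas into the list.
    For Each m In r.Matches(sInputString)
        oArrayList.Add(sInputString.Substring(iStart, m.Index - iStart))
        iStart = m.Index + 1
    Next
    ' Add the final field after the last matched comma.
    oArrayList.Add(sInputString.Substring(iStart, sInputString.Length - iStart))

    Return oArrayList
End Function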
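
For anyone who wants to try the pattern outside VB.NET, here is a minimal Python sketch of the same loop. It is a port for experimentation only, not part of the original solution; the name parse_csv_line is made up for this example, and the pattern string is copied unchanged from the post. Note that it only splits on the matched commas, so quoted fields keep their surrounding quotes.

```python
import re

# Pattern copied from the post: a comma followed by an even number of
# double quotes (i.e. a comma that is not inside a quoted field).
SPLIT_RE = re.compile(r',(?=([^\"]*"[^"]*")*(?![^"]*"))')


def parse_csv_line(line):
    """Split a line at every comma matched by SPLIT_RE (hypothetical helper)."""
    fields = []
    start = 0
    for m in SPLIT_RE.finditer(line):
        # Text between the previous split point and this comma.
        fields.append(line[start:m.start()])
        start = m.start() + 1
    # Whatever follows the last matched comma is the final field.
    fields.append(line[start:])
    return fields


print(parse_csv_line('a,b,c'))      # ['a', 'b', 'c']
print(parse_csv_line('"a,b",c'))    # ['"a,b"', 'c']  (quotes are kept)
print(parse_csv_line('a,b",c'))     # ['a,b"', 'c']   (case 20: only one split)
```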

My test cases are as follows:
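#  | CSV                     | Value 1 | Value 2         | Value 3 | Value 4 | Results
1  | a,b,c                   | a       | b               | c       |         | P
2  | "a",b,c                 | a       | b               | c       |         | P
3  | ''a'',b,c               | ''a''   | b               | c       |         | P
4  | a , b , c               | a       | b               | c       |         | P
5  | aa,bb;cc                | aa      | bb;cc           |         |         | P
6  |                         |         |                 |         |         | P
7  | a                       | a       |                 |         |         | P
8  | ,b,                     |         | b               |         |         | P
9  | ,,c                     |         |                 | c       |         | P
10 | ,,                      |         |                 |         |         | P
11 | "",b                    |         | b               |         |         | P
12 | " ",b                   | [SPACE] | b               |         |         | P
13 | "a,b"                   | a,b     |                 |         |         | P
14 | "a,b",c                 | a,b     | c               |         |         | P
15 | " a , b ", c            | a , b   | c               |         |         | P
16 | a b,c                   | a b     | c               |         |         | P
17 | a"b,c                   | a"b     | c               |         |         | P
18 | "a""b",c                | a"b     | c               |         |         | P
19 | a""b,c                  | a""b    | c               |         |         | P
20 | a,b",c                  | a       | b"              | c       |         | O
21 | a,b"",c                 | a       | b""             | c       |         | P
22 | a,"B: ""Hi, I''m B""",c | a       | B: "Hi, I''m B" | c       |         | P
23 | a,"b,c                  | a       | "b              | c       |         | O
24 | a,bc"d,e                | a       | bc"d            | e       |         | O
25 | a,bc"d",e               | a       | bc"d"           | e       |         | O
26 | a,"bc"d,e               | a       | "bc"d           | e       |         | O
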

Many thanks,
Wazir

Recommended answer

Apologies for the formatting of the test cases; I didn't realise I was posting in
plain text.

Here they are again; I hope they are more readable this time.

Some of them, like case 20, don't work with the regular expression.

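#  | CSV                     | Value 1 | Value 2         | Value 3 | Value 4
1  | a,b,c                   | a       | b               | c       |
2  | "a",b,c                 | a       | b               | c       |
3  | ''a'',b,c               | ''a''   | b               | c       |
4  | a , b , c               | a       | b               | c       |
5  | aa,bb;cc                | aa      | bb;cc           |         |
6  |                         |         |                 |         |
7  | a                       | a       |                 |         |
8  | ,b,                     |         | b               |         |
9  | ,,c                     |         |                 | c       |
10 | ,,                      |         |                 |         |
11 | "",b                    |         | b               |         |
12 | " ",b                   | [SPACE] | b               |         |
13 | "a,b"                   | a,b     |                 |         |
14 | "a,b",c                 | a,b     | c               |         |
15 | " a , b ", c            | a , b   | c               |         |
16 | a b,c                   | a b     | c               |         |
17 | a"b,c                   | a"b     | c               |         |
18 | "a""b",c                | a"b     | c               |         |
19 | a""b,c                  | a""b    | c               |         |
20 | a,b",c                  | a       | b"              | c       |
21 | a,b"",c                 | a       | b""             | c       |
22 | a,"B: ""Hi, I''m B""",c | a       | B: "Hi, I''m B" | c       |
23 | a,"b,c                  | a       | "b              | c       |
24 | a,bc"d,e                | a       | bc"d            | e       |
25 | a,bc"d",e               | a       | bc"d"           | e       |
26 | a,"bc"d,e               | a       | "bc"d           | e       |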
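
One way to read the pattern, offered as an interpretation rather than something stated in the thread: the lookahead accepts a comma only when an even number of double quotes follows it on the line. The short Python sketch below makes that count visible without using the regex at all; both function names are hypothetical. In case 20 a single stray quote follows the first comma, so that comma is not treated as a separator.

```python
def quotes_after_commas(line):
    """Return (comma_index, double_quotes_to_the_right) for every comma."""
    return [(i, line.count('"', i + 1))
            for i, ch in enumerate(line) if ch == ","]


def split_points(line):
    """Indexes of commas followed by an even number of double quotes."""
    return [i for i, n in quotes_after_commas(line) if n % 2 == 0]


# Case 20: the first comma has one quote to its right (odd), so it is skipped.
print(quotes_after_commas('a,b",c'))   # [(1, 1), (4, 0)]
print(split_points('a,b",c'))          # [4]

# Case 14: the comma inside the quoted field is skipped for the same reason,
# which is the behaviour the pattern was designed for.
print(split_points('"a,b",c'))         # [5]
```

Under this reading, a lone quote inside an unquoted field (cases 20 and 23 to 26) shifts the odd/even count for every comma to its left, which matches the rows marked O in the first table.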


You can use the OleDbCommand class to read a CSV file.
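
The suggestion here is to let an existing reader handle the quoting rules instead of a hand-written regex. OleDbCommand is specific to .NET and is not shown; as a stand-in for the same idea, the sketch below uses Python's standard csv module, which is a different library and may not agree with the OleDb text driver on malformed input. The helper name read_fields is made up for this example.

```python
import csv


def read_fields(line):
    """Parse one CSV line with the standard library reader (default dialect)."""
    # csv.reader accepts any iterable of lines; next() returns the first row.
    return next(csv.reader([line]))


print(read_fields('"a,b",c'))                     # ['a,b', 'c']
print(read_fields('"a""b",c'))                    # ['a"b', 'c']
print(read_fields('a,b",c'))                      # ['a', 'b"', 'c']
print(read_fields('a,"B: ""Hi, I\'\'m B""",c'))   # ['a', 'B: "Hi, I\'\'m B"', 'c']
```

Unlike the regex split, the reader strips the text qualifiers and collapses doubled quotes, and in this default, non-strict mode it treats a quote in the middle of an unquoted field as an ordinary character.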

Tu-Thac
www.ongtech.co


