Regex to split a CSV
Problem description
I know this (or something similar) has been asked many times, but having tried numerous possibilities I haven't been able to find a regex that works 100%.
I've got a CSV file and I'm trying to split it into an array, but I'm running into two problems: quoted commas and empty elements.
The CSV looks like:
123,2.99,AMO024,Title,"Description, more info",,123987564
The regex I've tried to use is:
thisLine.split(/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/)
The only problem is that in my output array the element at index 5 comes out as 123987564 instead of an empty string.
Description
Instead of using a split, I think it would be easier to simply execute a match and process all the found matches.
This expression will:
- divide your sample text on the comma delimiters
- process empty values
- ignore double-quoted commas, provided double quotes are not nested
- trim the delimiting comma from the returned value
- trim surrounding quotes from the returned value
Regex: (?:^|,)(?=[^"]|(")?)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)
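To try the match-instead-of-split approach outside ASP, here is a minimal Python 3 sketch. It uses the answer's pattern unchanged, since Python's `re` module supports the `(?(1)yes|no)` conditional; the function name `split_csv_line` is hypothetical. Each match's group 2 is taken as the field value.

```python
import re

# The answer's pattern, unchanged. Python's re module accepts the
# (?(1)...|...) conditional, which Java and JavaScript regexes do not.
CSV_FIELD = re.compile(r'(?:^|,)(?=[^"]|(")?)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)')


def split_csv_line(line):
    """Return the field values of one CSV line (group 2 of every match)."""
    return [m.group(2) for m in CSV_FIELD.finditer(line)]


sample = '123,2.99,AMO024,Title,"Description, more info",,123987564'
print(split_csv_line(sample))
# ['123', '2.99', 'AMO024', 'Title', 'Description, more info', '', '123987564']
```

The empty field between the two commas comes back as `''`, which is exactly what the `split` attempt lost. As the answer warns, quotes must not be nested: for a line such as `a,"say ""hi""",b` the quoted field cannot match at all, so the result is `['a', 'b']` and that field is silently dropped.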
Example
Sample Text
123,2.99,AMO024,Title,"Description, more info",,123987564
Classic ASP (VBScript) example using the non-Java expression
' Build a global regex that returns one CSV field per match
Set regEx = New RegExp
regEx.Global = True
regEx.IgnoreCase = True
regEx.MultiLine = True
sourcestring = "your source string"
' Inside a VBScript string literal, "" stands for a single double quote
regEx.Pattern = "(?:^|,)(?=[^""]|("")?)""?((?(1)[^""]*|[^,""]*))""?(?=,|$)"
Set Matches = regEx.Execute(sourcestring)
results = ""
' List each match, then each of its submatches (capture groups 1 and 2)
For z = 0 To Matches.Count - 1
    results = results & "Matches(" & z & ") = " & chr(34) & Server.HTMLEncode(Matches(z)) & chr(34) & chr(13)
    For zz = 0 To Matches(z).SubMatches.Count - 1
        results = results & "Matches(" & z & ").SubMatches(" & zz & ") = " & chr(34) & Server.HTMLEncode(Matches(z).SubMatches(zz)) & chr(34) & chr(13)
    Next
    results = Left(results, Len(results) - 1) & chr(13)
Next
Response.Write "<pre>" & results
Matches using the non-Java expression
Group 0 gets the entire matched substring, including the leading comma
Group 1 gets the opening quote, if the value is quoted
Group 2 gets the value, excluding the comma and any surrounding quotes
[0][0] = 123
[0][1] =
[0][2] = 123
[1][0] = ,2.99
[1][1] =
[1][2] = 2.99
[2][0] = ,AMO024
[2][1] =
[2][2] = AMO024
[3][0] = ,Title
[3][1] =
[3][2] = Title
[4][0] = ,"Description, more info"
[4][1] = "
[4][2] = Description, more info
[5][0] = ,
[5][1] =
[5][2] =
[6][0] = ,123987564
[6][1] =
[6][2] = 123987564
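The group table above can be regenerated with a short Python 3 sketch that walks every match the way the ASP loop does, but prints all three groups in the table's `[match][group]` layout. The function name `dump_groups` is hypothetical, and an unused group 1 (which Python returns as `None`) is printed as an empty value.

```python
import re

# Same pattern as in the answer; group 1 = opening quote, group 2 = value.
CSV_FIELD = re.compile(r'(?:^|,)(?=[^"]|(")?)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)')


def dump_groups(line):
    """Return one '[match][group] = value' row per group of every match."""
    rows = []
    for z, m in enumerate(CSV_FIELD.finditer(line)):
        for g in range(3):
            # Group 1 is None when the field is unquoted; show it as empty.
            value = m.group(g) or ""
            rows.append(f"[{z}][{g}] = {value}")
    return rows


sample = '123,2.99,AMO024,Title,"Description, more info",,123987564'
for row in dump_groups(sample):
    print(row)
```

Only the quoted field fills group 1; every other row for group 1 stays empty, matching the table.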