Regex to split a CSV

Question

I know this (or something similar) has been asked many times, but after trying numerous possibilities I have not been able to find a regex that works 100%.

I've got a CSV file and I'm trying to split it into an array, but I'm running into two problems: quoted commas and empty elements.

The CSV looks like:

123,2.99,AMO024,Title,"Description, more info",,123987564

The regex I've tried to use is:

thisLine.split(/,(?=(?:[^"]*"[^"]*")*(?![^"]*"))/)

The only problem is that in my output array the 5th element comes out as 123987564 and not an empty string.
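
For comparison, here is a minimal Python sketch of the same lookahead-based split. Python is used only for illustration (the question itself uses JavaScript), and the name `split_csv_line` is hypothetical. With Python's `re.split` the empty field is preserved, although the quoted field keeps its surrounding quotes, so the result seen in the asker's environment may be engine-specific.

```python
import re

# A comma followed by an even number of double quotes up to the end of the line,
# i.e. a comma that is not inside a quoted field (same pattern as in the question).
SPLIT_PATTERN = re.compile(r',(?=(?:[^"]*"[^"]*")*(?![^"]*"))')

def split_csv_line(line):
    """Split one CSV line on commas that sit outside double quotes."""
    return SPLIT_PATTERN.split(line)

fields = split_csv_line('123,2.99,AMO024,Title,"Description, more info",,123987564')
print(fields)
```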

Solution

Description

Instead of using a split, I think it would be easier to simply execute a match and process all the found matches.

This expression will:

  • split your sample text on the delimiting commas
  • handle empty values
  • ignore commas inside double quotes, provided the double quotes are not nested
  • strip the delimiting comma from the returned value
  • strip the surrounding quotes from the returned value

Regex: (?:^|,)(?=[^"]|(")?)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)
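
A minimal sketch of the match-based approach, written in Python because its `re` module accepts the conditional `(?(1)...|...)` construct used in the expression above (JavaScript regular expressions, for one, do not). The function name `parse_csv_line` is hypothetical, and the sketch assumes a single line with no escaped or nested quotes.

```python
import re

# The expression from the answer, unchanged. Group 1 is set only when the
# field starts with a double quote; group 2 holds the field value.
FIELD_PATTERN = re.compile(r'(?:^|,)(?=[^"]|(")?)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)')

def parse_csv_line(line):
    """Return the field values of one CSV line, without delimiters or quotes."""
    return [match.group(2) for match in FIELD_PATTERN.finditer(line)]

fields = parse_csv_line('123,2.99,AMO024,Title,"Description, more info",,123987564')
print(fields)
```

Taking only group 2 of each match is what drops both the leading comma and the surrounding quotes.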

Example

Sample Text

123,2.99,AMO024,Title,"Description, more info",,123987564

ASP example using the non-java expression

' Configure the regular expression object
Set regEx = New RegExp
regEx.Global = True
regEx.IgnoreCase = True
regEx.MultiLine = True
sourcestring = "your source string"
regEx.Pattern = "(?:^|,)(?=[^""]|("")?)""?((?(1)[^""]*|[^,""]*))""?(?=,|$)"
Set Matches = regEx.Execute(sourcestring)

' List every match, followed by its submatches (capture groups)
results = ""
For z = 0 To Matches.Count - 1
  results = results & "Matches(" & z & ") = " & Chr(34) & Server.HTMLEncode(Matches(z)) & Chr(34) & Chr(13)
  For zz = 0 To Matches(z).SubMatches.Count - 1
    results = results & "Matches(" & z & ").SubMatches(" & zz & ") = " & Chr(34) & Server.HTMLEncode(Matches(z).SubMatches(zz)) & Chr(34) & Chr(13)
  Next
  results = Left(results, Len(results) - 1) & Chr(13)
Next
Response.Write "<pre>" & results & "</pre>"

Matches using the non-java expression

Group 0 gets the entire substring which includes the comma
Group 1 gets the quote if it's used
Group 2 gets the value not including the comma

[0][0] = 123
[0][1] = 
[0][2] = 123

[1][0] = ,2.99
[1][1] = 
[1][2] = 2.99

[2][0] = ,AMO024
[2][1] = 
[2][2] = AMO024

[3][0] = ,Title
[3][1] = 
[3][2] = Title

[4][0] = ,"Description, more info"
[4][1] = "
[4][2] = Description, more info

[5][0] = ,
[5][1] = 
[5][2] = 

[6][0] = ,123987564
[6][1] = 
[6][2] = 123987564
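
The group layout listed above can be inspected with a short Python sketch. This is an illustration only, and the helper name `describe_matches` is hypothetical. Note that Python reports an unset group 1 as `None` rather than as an empty string, so the loop prints an empty string in that case.

```python
import re

# Same expression as in the answer: group 1 is the opening quote (if any),
# group 2 is the field value.
FIELD_PATTERN = re.compile(r'(?:^|,)(?=[^"]|(")?)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)')

def describe_matches(line):
    """Return (group 0, group 1, group 2) for every field match in the line."""
    return [(m.group(0), m.group(1), m.group(2)) for m in FIELD_PATTERN.finditer(line)]

rows = describe_matches('123,2.99,AMO024,Title,"Description, more info",,123987564')
for index, (whole, quote, value) in enumerate(rows):
    print(f"[{index}][0] = {whole}")
    print(f"[{index}][1] = {quote or ''}")
    print(f"[{index}][2] = {value}")
```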
