Regex to split a CSV


Question


I know this (or something similar) has been asked many times, but having tried out numerous possibilities I've not been able to find a regex that works 100%.

I've got a CSV file and I'm trying to split it into an array, but encountering two problems: quoted commas and empty elements.

The CSV looks like:

123,2.99,AMO024,Title,"Description, more info",,123987564

The regex I've tried to use is:

thisLine.split(/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/)

The only problem is that in my output array the 5th element comes out as 123987564 and not an empty string.

Solution

Description

Instead of using a split, I think it would be easier to simply execute a match and process all the found matches.

This expression will:

  • divide your sample text on the comma delimiters
  • process empty values
  • ignore double-quoted commas, provided the double quotes are not nested
  • trim the delimiting comma from the returned value
  • trim the surrounding quotes from the returned value

Regex: (?:^|,)(?=[^"]|(")?)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)
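
The question uses JavaScript's `split`, and JavaScript's RegExp does not support the `(?(1)…)` conditional used above. As a rough sketch of the same match-instead-of-split idea, the version below swaps the conditional for a plain alternation. The names `FIELD` and `splitCsvLine` are made up for illustration and are not part of the original answer. It assumes one record per line, no escaped (`""`) or nested quotes, and no line breaks inside a field.

```javascript
// Hypothetical pattern: a delimiter followed by either a quoted or an unquoted field.
// Group 1 captures the content of a quoted field, group 2 an unquoted (possibly empty) one.
const FIELD = /,(?:"([^"]*)"|([^",]*))(?=,|$)/g;

// Hypothetical helper: returns the fields of a single CSV line as an array of strings.
function splitCsvLine(line) {
  const fields = [];
  // Prefixing a comma lets every field, including the first, start with a delimiter,
  // so no match is ever empty and leading empty fields are not skipped.
  for (const m of (',' + line).matchAll(FIELD)) {
    fields.push(m[1] !== undefined ? m[1] : m[2]);
  }
  return fields;
}

const sample = '123,2.99,AMO024,Title,"Description, more info",,123987564';
console.log(splitCsvLine(sample));
// → ["123", "2.99", "AMO024", "Title", "Description, more info", "", "123987564"]
```

The empty sixth field comes back as an empty string, which is the behaviour the question asks for. Fields with stray or unbalanced quotes are not handled by this sketch; for such input a dedicated CSV parser is likely the safer choice.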

Example

Sample Text

123,2.99,AMO024,Title,"Description, more info",,123987564

ASP example using the non-java expression

' Build the RegExp object and configure it
Set regEx = New RegExp
regEx.Global = True      ' find every match, not just the first
regEx.IgnoreCase = True
regEx.MultiLine = True   ' let ^ and $ match at line boundaries
sourcestring = "your source string"
regEx.Pattern = "(?:^|,)(?=[^""]|("")?)""?((?(1)[^""]*|[^,""]*))""?(?=,|$)"
Set Matches = regEx.Execute(sourcestring)

' List each match followed by its submatches (the capture groups)
For z = 0 To Matches.Count - 1
  results = results & "Matches(" & z & ") = " & Chr(34) & Server.HTMLEncode(Matches(z)) & Chr(34) & Chr(13)
  For zz = 0 To Matches(z).SubMatches.Count - 1
    results = results & "Matches(" & z & ").SubMatches(" & zz & ") = " & Chr(34) & Server.HTMLEncode(Matches(z).SubMatches(zz)) & Chr(34) & Chr(13)
  Next
  results = Left(results, Len(results) - 1) & Chr(13)
Next
Response.Write "<pre>" & results

Matches using the non-java expression

Group 0 gets the entire substring which includes the comma
Group 1 gets the quote if it's used
Group 2 gets the value not including the comma

[0][0] = 123
[0][1] = 
[0][2] = 123

[1][0] = ,2.99
[1][1] = 
[1][2] = 2.99

[2][0] = ,AMO024
[2][1] = 
[2][2] = AMO024

[3][0] = ,Title
[3][1] = 
[3][2] = Title

[4][0] = ,"Description, more info"
[4][1] = "
[4][2] = Description, more info

[5][0] = ,
[5][1] = 
[5][2] = 

[6][0] = ,123987564
[6][1] = 
[6][2] = 123987564
