CSV Regular Expression
Question
I have inherited some code that uses regular expressions to parse CSV-formatted data. Until now it didn't need to cope with empty string fields, but the requirements have changed and empty string fields are now possible.
I have changed the regular expression from this:
new Regex("((?<field>[^\",\\r\\n]+)|\"(?<field>([^\"]|\"\")+)\")(,|(?<rowbreak>\\r\\n|\\n|$))");
to this:
new Regex("((?<field>[^\",\\r\\n]*)|\"(?<field>([^\"]|\"\")*)\")(,|(?<rowbreak>\\r\\n|\\n|$))");
(i.e. I have changed the + to *)
The problem is that I now get an extra empty field at the end, e.g. "ID,Name,Description" returns four fields: "ID", "Name", "Description" and "".
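The extra field can be reproduced in isolation. The following is a minimal sketch, not the original parsing code: the class name ExtraFieldRepro and the helper Run are hypothetical, and only the pattern is copied from the question. It prints the position and length of every match, which should show that the last match is zero-length and sits at the very end of the input: once + becomes *, the unquoted alternative can match an empty string, and $ still succeeds there.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; only the pattern comes from the question.
public static class ExtraFieldRepro
{
    // The question's modified pattern (+ changed to *), copied unchanged.
    public static readonly Regex StarPattern = new Regex(
        "((?<field>[^\",\\r\\n]*)|\"(?<field>([^\"]|\"\")*)\")(,|(?<rowbreak>\\r\\n|\\n|$))");

    // Returns every match the pattern finds in a single line of input.
    public static MatchCollection Run(string line)
    {
        return StarPattern.Matches(line);
    }

    public static void Main()
    {
        // Print where each match starts, how long it is, and what it captured.
        foreach (Match m in Run("ID,Name,Description"))
        {
            Console.WriteLine(
                "index={0} length={1} field='{2}'",
                m.Index, m.Length, m.Groups["field"].Value);
        }
    }
}
```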
Recommended answer
Try this one:
var rx = new Regex("((?<=^|,)(?<field>)(?=,|$)|(?<field>[^\",\\r\\n]+)|\"(?<field>([^\"]|\"\")*)\")(,|(?<rowbreak>\\r\\n|\\n|$))");
I moved the handling of "blank" fields into a third "or" (alternative). The handling of "" (an empty quoted field) already worked (you didn't need to modify it; it was the second (?<field>) block of your code), so what you need to handle are four cases:
,
,Id
Id,
Id,,Name
This should do it:
(?<=^|,)(?<field>)(?=,|$)
An empty field must be preceded by the beginning of the row (^) or by a comma (,), must be of length zero (there isn't anything in the (?<field>) capture), and must be followed by a comma (,) or by the end of the line ($).
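To check the four cases listed above, the answer's pattern can be run against each of them. This is a sketch under stated assumptions: the class CsvRegexDemo and the helper Fields are hypothetical names introduced for illustration, each input is a single line without a trailing newline, and only the pattern itself is taken from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class; only the pattern comes from the answer.
public static class CsvRegexDemo
{
    // The answer's pattern: empty fields are matched by the first
    // alternative, which requires a preceding ^ or comma and a
    // following comma or end of line.
    public static readonly Regex FixedPattern = new Regex(
        "((?<=^|,)(?<field>)(?=,|$)|(?<field>[^\",\\r\\n]+)|\"(?<field>([^\"]|\"\")*)\")(,|(?<rowbreak>\\r\\n|\\n|$))");

    // Hypothetical helper: collects the "field" capture of every match.
    public static List<string> Fields(string line)
    {
        var fields = new List<string>();
        foreach (Match m in FixedPattern.Matches(line))
        {
            fields.Add(m.Groups["field"].Value);
        }
        return fields;
    }

    public static void Main()
    {
        // The header row from the question plus the four cases from the answer.
        var lines = new[] { "ID,Name,Description", ",", ",Id", "Id,", "Id,,Name" };
        foreach (var line in lines)
        {
            var fields = Fields(line);
            Console.WriteLine(
                "'{0}' -> {1} field(s): [{2}]",
                line, fields.Count, string.Join("|", fields));
        }
    }
}
```

With this pattern, "ID,Name,Description" should yield three fields rather than four, because at the end of the input the lookbehind finds neither the start of the row nor a comma, so no empty field is produced there.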