How to parse comma separated line correctly with regex
Problem description
Trying to parse a comma-separated line with a regex, but getting inconsistent results:
Regex: ([^,]*),?
The actual value is in match group 1 (excludes the comma).
Expected results:
a,,b -> 3 matches
a,,b, -> 4 matches
a,,,b -> 4 matches
The number of matches should be the number of commas + 1.
The problem is that the regex also matches at the end even when there is no trailing comma, so the actual results are:
a,,b -> 4 matches
a,,b, -> 4 matches
Both return 4 matches, even though the lines have a different number of values.
Is it possible to fix the regex so that the number of matches equals the number of values (commas + 1), without correcting the results in code?
Recommended answer
Brief
It seems your regex matches no characters at some positions and works as a sort of assertion there: the group matches the empty string and no comma follows, which is perfectly valid according to your regex.
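The behaviour described above can be reproduced with a short Python sketch. It assumes the question's pattern is ([^,]*),? and uses Python's re module purely for illustration (the question does not name a language); the helper name original_matches is hypothetical.

```python
import re

# Pattern assumed from the question: zero or more non-comma characters
# captured in group 1, optionally followed by a comma.
ORIGINAL = re.compile(r"([^,]*),?")


def original_matches(line):
    """Return the group-1 value of every match the pattern finds in line."""
    return ORIGINAL.findall(line)


if __name__ == "__main__":
    for line in ("a,,b", "a,,b,"):
        values = original_matches(line)
        print(f"{line!r}: {len(values)} matches {values}")
```

On Python 3.7 or later both lines yield four values. For a,,b the fourth one is the empty match at the very end of the line, where neither the group nor the optional comma consumes anything.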
This answer is a fix that matches either one or more non-comma characters, or a zero-width position that is preceded by a comma (,).
The best way to go about this would be to split the string on commas using a string function, but the following regex also works:
([^,\v]+|(?<=,))(?=,|$)
Explanation
- ([^,\v]+|(?<=,)) Capture either of the following into capture group 1:
  - [^,\v]+ Match one or more characters not present in the set ,\v. The set contains the literal comma , and the vertical whitespace characters \v (such as newline characters), so the match stops at either of them.
  - (?<=,) Match the position where the previous token is a comma , (using a positive lookbehind).
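A minimal Python sketch of the fixed pattern, next to the split approach the answer recommends, may help when trying this out. Two assumptions are worth stating: in Python's re module \v denotes only the vertical tab character, so \r\n is swapped in here for the vertical-whitespace class; and the helper names regex_values and split_values are hypothetical. The trailing (?=,|$) is a positive lookahead that requires each match to be followed by a comma or the end of the line.

```python
import re

# Answer's pattern with \v swapped for \r\n (assumption: Python's \v is
# only the vertical tab, not the whole vertical-whitespace class).
FIXED = re.compile(r"([^,\r\n]+|(?<=,))(?=,|$)")


def regex_values(line):
    """Return group 1 of every match of the fixed pattern in line."""
    return FIXED.findall(line)


def split_values(line):
    """Return the fields obtained by splitting line on commas."""
    return line.split(",")


if __name__ == "__main__":
    for line in ("a,,b", "a,,b,", "a,,,b"):
        print(f"{line!r}: regex={regex_values(line)} split={split_values(line)}")
```

For the three lines from the question both functions return the same values (3, 4 and 4 of them). They differ when a line starts with a comma: for ,a the pattern finds only a, because the empty field at position 0 is not preceded by a comma, while split returns an empty string followed by a.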