A regex to match a comma that isn't surrounded by quotes
Problem description
I'm using Clojure, so this is in the context of Java regexes.
Here is an example string:
{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}
The important bits are the commas after each string. I'd like to be able to replace them with newline characters with Java's replaceAll method. A regex that will match any comma that is not surrounded by quotes will do.
If I'm not coming across well, please ask and I'll happily clarify anything.
edit: sorry for the confusion in the title. I haven't been awake very long.
String:
{:a "ab, cd efg",}
<-- In this example, the comma at the end would be matched, but the ones inside the quotes would not.
String:
{:a 3, :b 3,}
<-- Every single comma matches.
String:
{:a "abcd,efg" :b "abcedg,e"}
<-- None of the commas match.
Solution
The regex:
,\s*(?=([^"]*"[^"]*")*[^"]*$)
Matches:
{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}
                ^                  ^
and:
{:a "ab, cd efg",}
                ^
and does not match a comma in:
{:a "abcd,efg" :b "abcedg,e"}
But when escaped quotes can appear, like so:
{:a "ab,\" cd efg",} // only the last comma should match
then this regex solution won't work, because it counts the escaped quote like any other quote.
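For input that can contain escaped quotes, one alternative is a small hand-written scanner instead of a regex. The sketch below is only an illustration, not part of the original answer: the class name EscapeAwareCommas and the method commasToNewlines are hypothetical, and it assumes that a backslash escapes the character that follows it and that quotes are otherwise balanced.

```java
// Hypothetical helper, not from the original answer.
class EscapeAwareCommas {
    // Replaces every comma outside double quotes (plus any whitespace
    // directly after it) with a newline. Assumes a backslash escapes
    // the next character, so \" does not open or close a quoted string.
    static String commasToNewlines(String input) {
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // Copy the backslash and the escaped character unchanged.
                out.append(c).append(input.charAt(++i));
            } else if (c == '"') {
                // An unescaped quote toggles the quoted state.
                inQuotes = !inQuotes;
                out.append(c);
            } else if (c == ',' && !inQuotes) {
                out.append('\n');
                // Skip whitespace after the comma, roughly like ,\s* in the regex.
                while (i + 1 < input.length()
                        && Character.isWhitespace(input.charAt(i + 1))) {
                    i++;
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Called on the escaped-quote example above, only the final comma is replaced; the comma inside the quoted string is left alone.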
A brief explanation of the regex:
,          # match the character ','
\s*        # match a whitespace character: [ \t\n\x0B\f\r] and repeat it zero or more times
(?=        # start positive look ahead
  (        #   start capture group 1
    [^"]*  #     match any character other than '"' and repeat it zero or more times
    "      #     match the character '"'
    [^"]*  #     match any character other than '"' and repeat it zero or more times
    "      #     match the character '"'
  )*       #   end capture group 1 and repeat it zero or more times
  [^"]*    #   match any character other than '"' and repeat it zero or more times
  $        #   match the end of the input
)          # end positive look ahead
In other words: match any comma that has zero, or an even number of quotes ahead of it (until the end of the string).
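As a rough sketch of how this could be used from Java, the regex can be compiled once and applied with replaceAll. The class name UnquotedCommas and the method commasToNewlines are made up for this illustration; the only thing taken from the answer is the pattern itself, with backslashes and quotes escaped for a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not from the original answer.
class UnquotedCommas {
    // The regex from the answer, escaped for a Java string literal.
    static final Pattern UNQUOTED_COMMA =
            Pattern.compile(",\\s*(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    // Replaces each comma outside quotes (and the whitespace after it)
    // with a newline. Assumes the input has no escaped quotes.
    static String commasToNewlines(String input) {
        return UNQUOTED_COMMA.matcher(input).replaceAll("\n");
    }
}
```

From Clojure the same pattern should work with Java interop, for example by calling .replaceAll on the string, since Clojure regexes are java.util.regex patterns.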