A regex to match a comma that isn't surrounded by quotes

Question

I'm using Clojure, so this is in the context of Java regexes.

Here is an example string:

{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}

The important bits are the commas after each string. I'd like to replace them with newline characters using Java's replaceAll method. A regex that matches any comma that is not surrounded by quotes will do.

If I'm not coming across well, please ask and I'll be happy to clarify anything.

Sorry for the confusion in the title. I haven't slept in a while.

String: {:a "ab, cd efg",} <-- In this example, the comma at the end would be matched, but the one inside the quotes would not.

String: {:a 3, :b 3,} <-- Every single comma matches.

String: {:a "abcd,efg" :b "abcedg,e"} <-- None of the commas match.

Recommended answer

The regex:

,\s*(?=([^"]*"[^"]*")*[^"]*$)

matches:

{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}
                ^                  ^

and:

{:a "ab, cd efg",}
                ^

and does not match any comma in:

{:a "abcd,efg" :b "abcedg,e"}
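As a minimal sketch of how the regex above might be used from Java, the snippet below compiles it once and replaces every match with a newline. The class name `CommaOutsideQuotes` and the method `splitOnOuterCommas` are made up for this illustration; only `Pattern`, `Matcher.replaceAll`, and the regex itself come from the discussion. Note that in a Java string literal the backslash in `\s` and every double quote have to be escaped.

```java
import java.util.regex.Pattern;

final class CommaOutsideQuotes {
    // The regex from the answer, written as a Java string literal.
    static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",\\s*(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    // Hypothetical helper: replaces each comma that lies outside quotes
    // (together with any whitespace right after it) with a newline.
    static String splitOnOuterCommas(String input) {
        return COMMA_OUTSIDE_QUOTES.matcher(input).replaceAll("\n");
    }

    public static void main(String[] args) {
        System.out.println(splitOnOuterCommas(
                "{:a \"ab,cd, efg\", :b \"ab,def, egf,\", :c \"Conjecture\"}"));
        System.out.println(splitOnOuterCommas("{:a 3, :b 3,}"));
        System.out.println(splitOnOuterCommas("{:a \"abcd,efg\" :b \"abcedg,e\"}"));
    }
}
```

From Clojure, the same pattern string can be passed to `String.replaceAll` through Java interop.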

But when escaped quotes can appear, like so:

{:a "ab,\" cd efg",} // only the last comma should match

then this regex solution won't work, because it counts the escaped quote like any other quote.
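One possible way around the escaped-quote case is to drop the regex and scan the string by hand. The sketch below is not part of the answer. It is a swapped-in technique that assumes a backslash inside a quoted string escapes the next character, and the names `EscapeAwareCommas` and `replaceOuterCommas` are invented for the example.

```java
final class EscapeAwareCommas {
    // Replaces every comma outside a double-quoted string with a newline.
    // Assumption: inside quotes, a backslash escapes the following character,
    // so \" does not end the string.
    static String replaceOuterCommas(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean inQuotes = false;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (inQuotes && c == '\\' && i + 1 < input.length()) {
                // Copy the backslash and the escaped character unchanged.
                out.append(c).append(input.charAt(i + 1));
                i += 2;
                continue;
            }
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (c == ',' && !inQuotes) {
                out.append('\n');
                i++;
                // Skip whitespace after the comma, roughly like \s* in the regex.
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```

With the input shown above, this scanner replaces only the final comma, whereas the lookahead regex would also match the comma inside the string.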

A brief explanation of the regex:

,            # match the character ','
\s*          # match a whitespace character: [ \t\n\x0B\f\r] and repeat it zero or more times
(?=          # start positive look ahead
  (          #   start capture group 1
    [^"]*    #     match any character other than '"' and repeat it zero or more times
    "        #     match the character '"'
    [^"]*    #     match any character other than '"' and repeat it zero or more times
    "        #     match the character '"'
  )*         #   end capture group 1 and repeat it zero or more times
  [^"]*      #   match any character other than '"' and repeat it zero or more times
  $          #   match the end of the input
)            # end positive look ahead

In other words: match any comma that has zero or an even number of quotes between it and the end of the string.
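The parity rule can also be spelled out without a regex, which may make it easier to see why it works. The following is only an illustrative sketch. `QuoteParity` and `commasOutsideQuotes` are hypothetical names, and the nested loop is quadratic in the input length, so it is meant for explanation rather than for large inputs.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteParity {
    // Returns the indices of all commas that are followed by an even number
    // (including zero) of '"' characters up to the end of the input.
    static List<Integer> commasOutsideQuotes(String input) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) != ',') {
                continue;
            }
            int quotesAhead = 0;
            for (int j = i + 1; j < input.length(); j++) {
                if (input.charAt(j) == '"') {
                    quotesAhead++;
                }
            }
            if (quotesAhead % 2 == 0) {
                result.add(i);
            }
        }
        return result;
    }
}
```

For the first sample string, this reports the two commas marked with carets earlier, at zero-based indices 16 and 35.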
