Regex Match Optional Group Surrounded by Any Character Grouping


Problem description

I am trying to match an optional group that can be preceded and followed by any number of characters. The entire pattern also has a required beginning and ending match, but the middle match is optional.

I started with this, which works when the middle group is required:

string text = @"blah blah foo This is a test blah.  the test does not work. bar";
string requiredBlah = @"(foo).*?(blah).*?(bar)";
Match m = Regex.Match(text, requiredBlah);

Results are "foo", "blah", "bar".

However, when the middle group is optional, I guess the mechanisms of the regex engine prefer to not match the middle group.

string optionalBlah = @"(foo).*?(blah)?.*?(bar)";

Results: "foo", "", "bar".

This SO answer says that I can capture the middle optional group if there are delimiters before and after the optional group, but that is not my situation.

I could skip the optional group entirely and use string.Contains("blah"), but I'm wondering if there is a purely regex solution to this kind of problem. My goal is to design regular expressions that match a generic pattern, with multiple optional parts, so that I can determine which parts of the pattern are missing.

Solution

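The problem is quite common. In (foo).*?(blah)?.*?(bar), the first .*? is lazy and initially matches nothing, so (blah)? is tried right after foo, finds no blah at that position, and matches an empty string. The second .*? then expands up to bar, grabbing blah on the way, and it never has to yield blah back because (blah)? is optional. Adding capture groups around the dot patterns in the original regex shows which group matches blah.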
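To see this for yourself, here is a minimal sketch that wraps both lazy dot patterns of the original regex in capture groups. The variable name probe and the extra groups are my additions for illustration only; the text is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

string text = @"blah blah foo This is a test blah.  the test does not work. bar";

// Original optional pattern, with both lazy dots captured (groups 2 and 4)
// so we can inspect which part of the pattern consumed "blah".
string probe = @"(foo)(.*?)(blah)?(.*?)(bar)";
Match m = Regex.Match(text, probe);

Console.WriteLine($"first dot:  [{m.Groups[2].Value}]");           // empty
Console.WriteLine($"blah group: success={m.Groups[3].Success}");   // False
Console.WriteLine($"second dot: [{m.Groups[4].Value}]");           // contains "blah"
```

Group 3, the optional (blah)?, does not participate in the match, while group 4 holds everything between foo and bar, including blah.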

The simplest solution is to enclose the lazy .*? pattern and the (blah) capturing group in an optional non-capturing group, (?:.*?(blah))?, so that the regex engine tries to match the group pattern at least once (the ? quantifier is greedy):

(foo)(?:.*?(blah))?.*?(bar)

Here, (foo) captures foo in Group 1; (?:.*?(blah))? optionally matches 0 or more chars other than line break chars, as few as possible, and then captures blah into Group 2; and .*?(bar) matches 0 or more chars other than line break chars, as few as possible, and then captures bar into Group 3.
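As a rough sketch of how this pattern could be used to tell whether the middle part is present, the snippet below checks Group.Success on Group 2. The Describe helper and the second sample string (the question's text with the middle blah removed) are hypothetical, added only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// The lazy dot and (blah) sit together inside one optional non-capturing group.
var pattern = new Regex(@"(foo)(?:.*?(blah))?.*?(bar)");

// Hypothetical helper: reports the three parts, marking the middle one if it is absent.
string Describe(string input)
{
    Match m = pattern.Match(input);
    if (!m.Success) return "no match";
    // An optional group that did not participate has Success == false.
    string middle = m.Groups[2].Success ? m.Groups[2].Value : "<missing>";
    return $"{m.Groups[1].Value}, {middle}, {m.Groups[3].Value}";
}

Console.WriteLine(Describe(@"blah blah foo This is a test blah.  the test does not work. bar"));
// foo, blah, bar
Console.WriteLine(Describe("blah blah foo This is a test. bar"));
// foo, <missing>, bar
```

Checking Group.Success rather than comparing Value to an empty string distinguishes a group that did not participate from one that matched an empty string.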

Another solution is to restrict the dot matching with a lookahead (using a so-called tempered greedy token):

(foo)(?:(?!blah).)*(blah)?.*?(bar)
     ^^^^^^^^^^^^^^

The (?:(?!blah).)* pattern matches any text, other than line break chars, up to the first blah. (If it is at the end of the pattern, it may also match up to the end of string.)
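A minimal sketch of the tempered greedy token variant, run against the question's text and against a hypothetical variant without the middle blah (the second string and the variable names are assumptions made for this example):

```csharp
using System;
using System.Text.RegularExpressions;

// (?:(?!blah).)* consumes one char at a time, but only while "blah" does not start there.
var tempered = new Regex(@"(foo)(?:(?!blah).)*(blah)?.*?(bar)");

Match withBlah = tempered.Match(@"blah blah foo This is a test blah.  the test does not work. bar");
Match withoutBlah = tempered.Match("blah blah foo This is a test. bar");

Console.WriteLine(withBlah.Groups[2].Value);        // blah
Console.WriteLine(withoutBlah.Groups[2].Success);   // False
Console.WriteLine(withoutBlah.Groups[3].Value);     // bar
```

When there is no blah after foo, the tempered token first runs to the end of the line and then backtracks until (bar) can match, so the overall match still succeeds with Group 2 unset.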
