Regex not matching due to repeated capturing group rather than capturing a repeated group


Question

I have the following regex:

/(?:[\[\{]*)(?:([A-G\-][^A-G\]\}]*)+)(?:[\]\}]*)/

which I run against the following expression:

{A''BsCb}

I expect 3 matched results:

A''
Bs
Cb

But testing at https://regex101.com/ only gives me the last match, Cb, and tells me that a repeated capturing group will only capture the last iteration, and that I should put a capturing group around the repeated group.

I thought that was what I had done! I thought I'd understood the problem as described here: http://www.regular-expressions.info/captureall.html. Hence the brackets outside my + with the capturing group inside.

But either it's getting too late or I need someone whose head doesn't implode at the mention of regexp to show me where I've gone wrong.

Recommended answer

You are trying to repeat a capturing group and retrieve the capture from every iteration. That is not possible with PHP PCRE regex: a repeated capturing group keeps only the text of its last iteration.
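To see this behaviour concretely, here is a minimal sketch that runs the question's pattern through preg_match. The variable names are illustrative, and the second pattern is only an assumption about what the regex101 hint asks for (a capturing group wrapped around the whole repetition); it does not come from the question.

```php
<?php
// The pattern from the question, unchanged: the capturing group sits INSIDE the repetition.
$repeatedGroup = '/(?:[\[\{]*)(?:([A-G\-][^A-G\]\}]*)+)(?:[\]\}]*)/';
preg_match($repeatedGroup, "{A''BsCb}", $inner);
// $inner[1] holds only the last iteration of the group.
echo $inner[1], "\n";   // Cb

// Illustrative variant: the capturing group wraps the whole repetition.
$groupAroundRepeat = '/[\[{]*((?:[A-G-][^A-G\]}]*)+)[\]}]*/';
preg_match($groupAroundRepeat, "{A''BsCb}", $outer);
// $outer[1] holds the entire repeated run as one string, still not three items.
echo $outer[1], "\n";   // A''BsCb
```

Either way a single match yields one value per group, which is why the solutions here rely on several separate matches instead.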

What you can do is either extract all {...} / [...] substrings, trim the brackets from them, and apply a simple [A-G-][^A-G]* regex to the contents, or add a \G operator, which makes the regex harder to maintain but lets it work in a single pass like the original one.

Solution 1 (the \G approach) is

/(?:[[{]*|(?!\A)\G)\K[A-G-][^A-G\]}]*/

See the regex demo. Note: this regex does not check for the closing ] or }, but that check can be added with a positive lookahead.
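The lookahead mentioned in the note could look like the sketch below. The exact lookahead is an assumption rather than part of the original answer: it requires that only non-bracket characters separate each match from a closing ] or }. The variable names are illustrative.

```php
<?php
// Solution 1 pattern plus a lookahead: after each note, only non-bracket
// characters may appear before a closing ] or }.
$withClose = '/(?:[[{]*|(?!\A)\G)\K[A-G-][^A-G\]}]*(?=[^\][{}]*[\]}])/';

preg_match_all($withClose, "{A''BsCb}", $closed);
print_r($closed[0]);    // the three notes: A'', Bs, Cb

preg_match_all($withClose, "{A''BsCb", $unclosed);
print_r($unclosed[0]);  // empty array: no closing bracket follows
```

Like the original pattern, this does not check that the closing bracket is of the same kind as the opening one.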

  • (?:[[{]*|(?!\A)\G) - matches zero or more [ or { characters, or the end position of the previous successful match
  • \K - omits the text matched so far from the reported match
  • [A-G-] - one letter from A to G, or a -
  • [^A-G\]}]* - zero or more chars other than A to G, ] and }.

See the PHP demo.
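As a minimal sketch of how Solution 1 might be used from PHP (the variable names are illustrative), preg_match_all collects every overall match, so no repeated capturing group is needed:

```php
<?php
$notePattern = '/(?:[[{]*|(?!\A)\G)\K[A-G-][^A-G\]}]*/';
preg_match_all($notePattern, "{A''BsCb}", $found);
// $found[0] lists every overall match; \K has dropped the leading bracket.
print_r($found[0]);   // A'', Bs, Cb
```

One caveat with the pattern as written: because [[{]* can match the empty string, the first alternative succeeds at any position, so notes that are not preceded by a bracket are accepted as well.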

Solution 2 (extract the bracketed substrings, then split them) is

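// Step 1: capture the contents of each {...} or [...] into group 1.
$re = '/(?|{([^}]*)}|\[([^]]*)])/';
$str = "{A''BsCb}";
$res = array();
preg_match_all($re, $str, $m);
foreach ($m[1] as $match) {
    // Step 2: split the contents into notes; $tmp[0] holds the full matches.
    preg_match_all('~[A-G-][^A-G]*~', $match, $tmp);
    // Append in input order so $res ends up as one flat list of notes.
    $res = array_merge($res, $tmp[0]);
}
print_r($res);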

See the PHP demo.

The (?|{([^}]*)}|\[([^]]*)]) regex only matches strings like {...} or [...] (but not {...] or [...}) and captures the contents between the brackets into Group 1, since the branch reset group (?|...) restarts the group numbering in each branch. Then all we need is to grab the notes with the simpler '~[A-G-][^A-G]*~' regex.
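The effect of the branch reset group can be checked with a short sketch. The sample string and variable names are hypothetical, and the second pattern is the same regex with (?|...) swapped for a plain non-capturing group, for comparison only.

```php
<?php
$sample = "{A''BsCb} [DmEb] {oops]";   // hypothetical input; the last chunk is mismatched

// Branch reset: both alternatives store their contents in group 1.
preg_match_all('/(?|{([^}]*)}|\[([^]]*)])/', $sample, $reset);
print_r($reset[1]);   // A''BsCb and DmEb; {oops] is skipped

// Plain non-capturing group: the second alternative becomes group 2.
preg_match_all('/(?:{([^}]*)}|\[([^]]*)])/', $sample, $plain);
echo count($reset) - 1, " vs ", count($plain) - 1, " groups\n";   // 1 vs 2 groups
```

With a single group index to read, the loop in Solution 2 only has to iterate over $m[1], whichever bracket type was matched.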
