PCRE regular expressions using named pattern subroutines

Question

I am experimenting with the named subpattern/'subroutine' regex features in PHP's PCRE and I'm hoping someone can explain the following strange output:

$re = "/
(?(DEFINE)
    (?<a> a )
)

^(?&a)$

/x";

var_dump(preg_match($re, 'a', $match)); // (int) 1 as expected
var_dump($match); // Array( [0] => 'a' ) <-- Why?

I can't understand why the named group "a" is not in the result (with the contents "a"). Changing preg_match to preg_match_all puts "a" and "1" in the match data but both contain only an empty string.
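
A minimal PHP script that reproduces both observations might look like this. It reuses the question's pattern unchanged. The variable names `$one` and `$all` are only for illustration. As I understand PHP's reporting rules, `preg_match` drops unset groups at the end of the match data, while `preg_match_all` pads every group with an empty string:

```php
<?php
// Pattern from the question: group "a" exists only inside (?(DEFINE) ...)
// and is reached through the subroutine call (?&a).
$re = '/
    (?(DEFINE)
        (?<a> a )
    )
    ^(?&a)$
/x';

// preg_match: group "a" is unset and is the last group, so it is omitted.
preg_match($re, 'a', $one);
var_dump($one);

// preg_match_all: every group gets an entry, padded with an empty string.
preg_match_all($re, 'a', $all);
var_dump($all['a'], $all[1]);
```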

I really like the idea of writing regular expressions this way, as you can make them incredibly powerful whilst keeping them very maintainable (see this answer for a good example of this). However, if the subpatterns are not available in the match data, it's not much use.

Am I missing something here or should I just mourn what could have been and move on?

Recommended answer

It makes perfect sense that these subpatterns would not capture a group: their main purpose is to be used more than once, so you can't really capture them all. In addition, if the default were to capture all subpatterns, there would be no way to avoid capturing a group you don't want, which is not the best default behavior. The opposite is trivial: you can capture by adding another group around the (?&a) statement.
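
A sketch of that workaround in PHP. It keeps the question's DEFINE block and wraps the subroutine call in an ordinary named group. The outer group name `val` is made up for this example:

```php
<?php
// Same DEFINE block as in the question. The call (?&a) is wrapped in a
// normal named capturing group "val" (hypothetical name), and that outer
// group is what ends up in the match data.
$re = '/
    (?(DEFINE)
        (?<a> a )
    )
    ^(?<val>(?&a))$
/x';

preg_match($re, 'a', $match);
var_dump($match['val']); // string(1) "a"
```

The DEFINE group itself still reports an empty value. Only the wrapper captures.
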
I couldn't find a reference to this on PCRE.org. The closest is this, which is relevant because you don't match (?<a>...) directly (though you might expect an empty group):

Any capturing parentheses that are set during the subroutine call revert to their previous values afterwards.
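
To see this rule in action, here is an illustration of my own (not taken from the PCRE documentation). The subroutine gets its own capturing group, hypothetically named `inner`. That group matches during the call, but after the overall match succeeds it is not reported as set:

```php
<?php
// Subroutine "a" contains a capturing group "inner" that matches during the call.
$re = '/(?(DEFINE)(?<a>(?<inner>a)))^(?&a)$/';

var_dump(preg_match($re, 'a', $match)); // int(1): the overall match succeeds
// The capture made inside the subroutine call was reverted, so "inner"
// is unset; preg_match omits trailing unset groups, hence the ?? fallback.
var_dump($match['inner'] ?? null);      // NULL
```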

It is clearer in the Perl manual, where the relevant part is the note at the end:

An example of how this might be used is as follows:

/(?<NAME>(?&NAME_PAT))(?<ADDR>(?&ADDRESS_PAT))
(?(DEFINE)
(?<NAME_PAT>....)
(?<ADDRESS_PAT>....)
)/x

Note that capture buffers matched inside of recursion are not accessible after the recursion returns, so the extra layer of capturing buffers is necessary.
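
The same two-layer structure carries over to PHP's PCRE. In the sketch below, the sample input and the bodies of NAME_PAT and ADDRESS_PAT are invented stand-ins for the `....` placeholders in the Perl example. Only the outer NAME and ADDR groups end up with values:

```php
<?php
// Outer groups NAME and ADDR capture; the DEFINE groups only supply patterns.
// The sub-pattern bodies below are hypothetical placeholders.
$re = '/
    ^(?<NAME>(?&NAME_PAT)) ,\s* (?<ADDR>(?&ADDRESS_PAT))$
    (?(DEFINE)
        (?<NAME_PAT>    [A-Z][a-z]+ (?:\ [A-Z][a-z]+)* )  # capitalised words
        (?<ADDRESS_PAT> \d+ (?:\ [A-Za-z]+)+ )            # number, then words
    )
/x';

if (preg_match($re, 'Ada Lovelace, 12 St James Square', $match)) {
    echo $match['NAME'], "\n"; // Ada Lovelace
    echo $match['ADDR'], "\n"; // 12 St James Square
}
```

Each DEFINE pattern can be reused as many times as needed. A wrapper group is added only where a value is actually wanted.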
