PCRE模式如何检测回文? [英] How does this PCRE pattern detect palindromes?

查看:96
本文介绍了PCRE模式如何检测回文?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

该问题是对PCRE模式中先行,嵌套引用和条件的使用以匹配所有回文(包括那些不能由递归模式提供的回文)的教育性演示. PCRE手册页.

This question is an educational demonstration of the usage of lookahead, nested reference, and conditionals in a PCRE pattern to match ALL palindromes, including the ones that can't be matched by the recursive pattern given in the PCRE man page.

在PHP代码段中检查此PCRE模式:

Examine this PCRE pattern in PHP snippet:

$palindrome = '/(?x)
^
  (?:
      (.) (?=
              .*
              (
                \1
                (?(2) \2 | )
              )
              $
          )
  )*
  .?
  \2?
$


/';

这种模式似乎可以检测回文,如本测试案例所示(另请参见ideone.com ) :

This pattern seems to detect palindromes, as seen in this test cases (see also on ideone.com):

$tests = array(
  # palindromes
  '',
  'a',
  'aa',
  'aaa',
  'aba',
  'aaaa',
  'abba',
  'aaaaa',
  'abcba',
  'ababa',

  # non-palindromes
  'aab',
  'abab',
  'xyz',
);

foreach ($tests as $test) {
  echo sprintf("%s '%s'\n", preg_match($palindrome, $test), $test);  
}

那么这个模式如何工作?

So how does this pattern work?

此模式使用嵌套引用,这是有条件的).

This pattern uses a nested reference, which is a similar technique used in How does this Java regex detect palindromes?, but unlike that Java pattern, there's no lookbehind (but it does use a conditional).

此外,请注意,PCRE 手册页提供了一种递归模式来匹配某些回文:

Also, note that the PCRE man page presents a recursive pattern to match some palindromes:

# the recursive pattern to detect some palindromes from PCRE man page
^(?:((.)(?1)\2|)|((.)(?3)\4|.))$

手册页警告该递归模式无法检测到所有回文(请参阅: 也在ideone.com上),但是此问题中呈现的嵌套参考/正向超前模式可以实现.

The man page warns that this recursive pattern can NOT detect all palindromes (see: Why will this recursive regex only match when a character repeats 2n - 1 times? and also on ideone.com), but the nested reference/positive lookahead pattern presented in this question can.

推荐答案

让我们尝试通过构造正则表达式来理解它.首先,回文必须在相反的方向上以相同的字符序列开始和结束:

Let's try to understand the regex by constructing it. Firstly, a palindrome must start and end with the same sequence of character in the opposite direction:

^(.)(.)(.) ... \3\2\1$

我们要重写此代码,以使...仅跟随有限长度的模式,以便我们有可能将其转换为*.提前做到这一点是可能的:

we want to rewrite this such that the ... is only followed by a finite length of patterns, so that it could be possible for us to convert it into a *. This is possible with a lookahead:

^(.)(?=.*\1$)
 (.)(?=.*\2\1$)
 (.)(?=.*\3\2\1$) ...

但是仍然有一些不常见的部分.如果我们可以记录"先前捕获的组怎么办?如果可能的话,我们可以将其重写为:

but there are still uncommon parts. What if we can "record" the previously captured groups? If it is possible we could rewrite it as:

^(.)(?=.*(?<record>\1\k<record>)$)   # \1     = \1 + (empty)
 (.)(?=.*(?<record>\2\k<record>)$)   # \2\1   = \2 + \1
 (.)(?=.*(?<record>\3\k<record>)$)   # \3\2\1 = \3 + \2\1
 ...

可以转换为

^(?: 
    (.)(?=.*(\1\2)$)
 )*

几乎很好,除了\2(记录的捕获)最初不为空.它将无法匹配任何内容.如果记录的捕获不存在,我们需要将其匹配为空.这就是条件表达式的蔓延方式.

Almost good, except that \2 (the recorded capture) is not empty initially. It will just fail to match anything. We need it to match empty if the recorded capture doesn't exist. This is how the conditional expression creeps in.

(?(2)\2|)   # matches \2 if it exist, empty otherwise.

所以我们的表达式变为

^(?: 
    (.)(?=.*(\1(?(2)\2|))$)
 )*

现在,它与回文的上半部相匹配.下半场怎么样?好吧,在匹配了第一部分后,记录的捕获\2将包含第二部分.所以让我们把它放在最后.

Now it matches the first half of the palindrome. How about the 2nd half? Well, after the 1st half is matched, the recorded capture \2 will contain the 2nd half. So let's just put it in the end.

^(?: 
    (.)(?=.*(\1(?(2)\2|))$)
 )*\2$

我们也想照顾奇长回文.在第一半和第二半之间会有一个自由字符.

We want to take care of odd-length palindrome as well. There would be a free character between the 1st and 2nd half.

^(?: 
    (.)(?=.*(\1(?(2)\2|))$)
 )*.?\2$

在一种情况下(只有1个字符时),除了 以外,这种方法都很好.再次是由于\2没有匹配项.所以

This works good except in one case — when there is only 1 character. This is again due to \2 matches nothing. So

^(?: 
    (.)(?=.*(\1(?(2)\2|))$)
 )*.?\2?$
#      ^ since \2 must be at the end in the look-ahead anyway.

这篇关于PCRE模式如何检测回文?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆