使用正则表达式(PCRE)匹配a ^ n b ^ n c ^ n(例如"aaabbbccc") [英] Match a^n b^n c^n (e.g. "aaabbbccc") using regular expressions (PCRE)
问题描述
众所周知,现代正则表达式实现(最著名的是PCRE)与无上下文语法 {a n b n ; n> 0}(例如aaabbb
)使用此正则表达式(演示):
It is a well known fact that modern regular expression implementations (most notably PCRE) have little in common with the original notion of regular grammars. For example you can parse the classical example of a context-free grammar {anbn; n>0} (e.g. aaabbb
) using this regex (demo):
~^(a(?1)?b)$~
我的问题是:你能走多远?是否还可以解析上下文相关语法 {a n b n c n ; n> 0}(例如,aaabbbccc
)使用PCRE吗?
My question is: How far can you go? Is it also possible to parse the context-sensitive grammar {anbncn;n>0} (e.g. aaabbbccc
) using PCRE?
推荐答案
受到NullUserExceptions答案的启发(他已经删除了,因为它在一种情况下失败了),我想我自己也找到了解决方案:
Inspired by NullUserExceptions answer (which he already deleted as it failed for one case) I think I have found a solution myself:
$regex = '~^
(?=(a(?-1)?b)c)
a+(b(?-1)?c)
$~x';
var_dump(preg_match($regex, 'aabbcc')); // 1
var_dump(preg_match($regex, 'aaabbbccc')); // 1
var_dump(preg_match($regex, 'aaabbbcc')); // 0
var_dump(preg_match($regex, 'aaaccc')); // 0
var_dump(preg_match($regex, 'aabcc')); // 0
var_dump(preg_match($regex, 'abbcc')); // 0
亲自尝试: http://codepad.viper-7.com/1erq9v
如果考虑不带正向超前断言((?=...)
部分)的正则表达式,则具有以下条件:
If you consider the regex without the positive lookahead assertion (the (?=...)
part), you have this:
~^a+(b(?-1)?c)$~
这无非是检查是否有任意数量的a
,然后是相等数量的b
s和c
s.
This does nothing more than check that there's an arbitrary number of a
s, followed by an equal number of b
s and c
s.
这还不能满足我们的语法要求,因为a
的数量也必须相同.我们可以通过检查a
s的数量等于b
s的数量来确保.这就是前瞻断言中的表达式的作用:(a(?-1)?b)c
. c
是必需的,因此我们不仅要匹配b
的一部分.
This doesn't yet satisfy our grammar, because the number of a
s must be the same, too. We can ensure that by checking that the number of a
s equals the number of b
s. And this is what the expression in the lookahead assertion does: (a(?-1)?b)c
. The c
is necessary so we don't only match a part of the b
s.
我认为这令人印象深刻,这表明现代正则表达式不仅能够解析非常规语法,而且甚至可以解析非上下文无关的语法.希望这将消除您不能用正则表达式来执行X,因为X不规则"的无休止的模仿.
I think this impressively shows that modern regex is not only capable of parsing non-regular grammars, but can even parse non-context-free grammars. Hopefully this will lay to rest the endless parroting of "you can't do X with regex because X isn't regular"
这篇关于使用正则表达式(PCRE)匹配a ^ n b ^ n c ^ n(例如"aaabbbccc")的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!