Regex that matches xa?b?c? but not x alone

Problem description

I'm trying to write a regex that matches xa?b?c? but not x. In reality, 'x', 'a', 'b', and 'c' are not single characters, they are moderately complex sub-expressions, so I'm trying to avoid something like x(abc|ab|ac|bc|a|b|c). Is there a simple way to match "at least one of a, b, and c, in that order" in a regex, or am I out of luck?

Recommended answer

Here is the shortest version:

(a)?(b)?(c)?(?(1)|(?(2)|(?(3)|(*FAIL))))

If you need to keep around the match in a separate group, write this:

((a)?(b)?(c)?)(?(2)|(?(3)|(?(4)|(*FAIL))))

But that isn’t very robust in case a, b, or c contain capture groups. So instead write this:

(?<A>a)?(?<B>b)?(?<C>c)?(?(<A>)|(?(<B>)|(?(<C>)|(*FAIL))))
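
A minimal sketch of why the numbered form is fragile. The sub-expressions foo, bar, and baz are hypothetical stand-ins for a, b, and c; the only thing that matters is that the first one contains a capture group of its own. That extra group shifts the numbering, so the conditionals on groups 1, 2, and 3 never test the c part, while the named form is unaffected.

```perl
#!/usr/bin/perl
use 5.010_000;
use strict;
use warnings;

# Hypothetical stand-ins: "a" is (f)oo, "b" is bar, "c" is baz.
# The inner (f) becomes group 2, pushing bar to group 3 and baz to group 4,
# so the conditionals on 1, 2, and 3 never look at baz.
my $numbered = qr/((f)oo)?(bar)?(baz)?(?(1)|(?(2)|(?(3)|(*FAIL))))/;

# Named groups keep testing the intended parts regardless of inner groups.
my $named = qr/(?<A>(f)oo)?(?<B>bar)?(?<C>baz)?(?(<A>)|(?(<B>)|(?(<C>)|(*FAIL))))/;

for my $text (qw(foo bar baz)) {
    my $n = $text =~ $numbered ? "match" : "no match";
    my $m = $text =~ $named    ? "match" : "no match";
    say "$text: numbered => $n, named => $m";
}

# Output:
# foo: numbered => match, named => match
# bar: numbered => match, named => match
# baz: numbered => no match, named => match
```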

And if you need a group for the whole match, then write this:

(?<M>(?<A>a)?(?<B>b)?(?<C>c)?(?(<A>)|(?(<B>)|(?(<C>)|(*FAIL)))))

And if like me you prefer multi-lettered identifiers and also think this sort of thing is insane without being in /x mode, write this:

(?x)
(?<Whole_Match>
    (?<Group_A> a) ?
    (?<Group_B> b) ?  
    (?<Group_C> c) ?

    (?(<Group_A>)           # Succeed 
      | (?(<Group_B>)       # Succeed
          | (?(<Group_C>)   # Succeed
              |             (*FAIL)
            )
        )
    )
 )

And here is the full testing program to prove that those all work:

#!/usr/bin/perl
use 5.010_000;

my @pats = (
    qr/(a)?(b)?(c)?(?(1)|(?(2)|(?(3)|(*FAIL))))/,
    qr/((a)?(b)?(c)?)(?(2)|(?(3)|(?(4)|(*FAIL))))/,
    qr/(?<A>a)?(?<B>b)?(?<C>c)?(?(<A>)|(?(<B>)|(?(<C>)|(*FAIL))))/,
    qr/(?<M>(?<A>a)?(?<B>b)?(?<C>c)?(?(<A>)|(?(<B>)|(?(<C>)|(*FAIL)))))/,
    qr{
        (?<Whole_Match>

            (?<Group_A> a) ?
            (?<Group_B> b) ?
            (?<Group_C> c) ?

            (?(<Group_A>)               # Succeed
              | (?(<Group_B>)           # Succeed
                  | (?(<Group_C>)       # Succeed
                      |                 (*FAIL)
                    )
                )
            )

        )
    }x,
);

for my $pat (@pats) {
    say "\nTESTING $pat";
    $_ = "i can match bad crabcatchers from 34 bc and call a cab";
    while (/$pat/g) {
        say "$`<$&>$'";
    }
}

All five versions produce this output:

i <c>an match bad crabcatchers from 34 bc and call a cab
i c<a>n match bad crabcatchers from 34 bc and call a cab
i can m<a>tch bad crabcatchers from 34 bc and call a cab
i can mat<c>h bad crabcatchers from 34 bc and call a cab
i can match <b>ad crabcatchers from 34 bc and call a cab
i can match b<a>d crabcatchers from 34 bc and call a cab
i can match bad <c>rabcatchers from 34 bc and call a cab
i can match bad cr<abc>atchers from 34 bc and call a cab
i can match bad crabc<a>tchers from 34 bc and call a cab
i can match bad crabcat<c>hers from 34 bc and call a cab
i can match bad crabcatchers from 34 <bc> and call a cab
i can match bad crabcatchers from 34 bc <a>nd call a cab
i can match bad crabcatchers from 34 bc and <c>all a cab
i can match bad crabcatchers from 34 bc and c<a>ll a cab
i can match bad crabcatchers from 34 bc and call <a> cab
i can match bad crabcatchers from 34 bc and call a <c>ab
i can match bad crabcatchers from 34 bc and call a c<ab>

Sweet, eh?

For the x in the beginning part, just put whatever x you want at the start of the match, before the very first optional capture group for the a part, so like this:

x(a)?(b)?(c)?(?(1)|(?(2)|(?(3)|(*FAIL))))

Or like this:

(?x)                        # enable non-insane mode

(?<Whole_Match>
    x                       # first match some leader string

    # now match a, b, and c, in that order, and each optional
    (?<Group_A> a ) ?
    (?<Group_B> b ) ?  
    (?<Group_C> c ) ?

    # now make sure we got at least one of a, b, or c
    (?(<Group_A>)           # SUCCEED!
      | (?(<Group_B>)       # SUCCEED!
          | (?(<Group_C>)   # SUCCEED!
              |             (*FAIL)
            )
        )
    )
)

The test sentence was constructed without the x part, so it won’t work for that, but I think I’ve shown how I mean to go at this. Note that all of x, a, b, and c can be arbitrarily complex patterns (yes, even recursive), not merely single letters, and it doesn’t matter if they use numbered capture groups of their own, even.
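
Since the test sentence has no x in it, here is a small sketch that exercises the version with the leader. The sample words are made up for illustration, and x, a, b, and c are still single letters standing in for the real sub-expressions.

```perl
#!/usr/bin/perl
use 5.010_000;
use strict;
use warnings;

# Leader x, then optional a, b, c in that order, at least one of them required.
my $with_leader = qr/x(a)?(b)?(c)?(?(1)|(?(2)|(?(3)|(*FAIL))))/;

# Made-up sample inputs, not taken from the original test sentence.
for my $text (qw(x xa xbc xabc xd xcb)) {
    if ($text =~ $with_leader) {
        say "$text: matched <$&>";
    }
    else {
        say "$text: no match";
    }
}

# Output:
# x: no match
# xa: matched <xa>
# xbc: matched <xbc>
# xabc: matched <xabc>
# xd: no match
# xcb: matched <xc>
```

The last case shows the ordering constraint: in "xcb" only "xc" is matched, because b may not follow c.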

If you want to go at this with lookaheads, you can do this:

(?x)

(?(DEFINE)
       (?<Group_A> a)
       (?<Group_B> b)
       (?<Group_C> c)
)

x

(?= (?&Group_A)
  | (?&Group_B)
  | (?&Group_C)
)

(?&Group_A) ?
(?&Group_B) ?
(?&Group_C) ?

And here is what to add to the @pats array in the test program to show that this approach also works:

qr{
    (?(DEFINE)
        (?<Group_A> a)
        (?<Group_B> b)
        (?<Group_C> c)
    )

    (?= (?&Group_A)
      | (?&Group_B)
      | (?&Group_C)
    )

    (?&Group_A) ?
    (?&Group_B) ?
    (?&Group_C) ?
}x

You’ll notice please that I still manage never to repeat any of a, b, or c, even with the lookahead technique.
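
The lookahead pattern added to @pats leaves out the x leader, so the following sketch puts it back and runs it over a made-up string. The input and the variable names are assumptions for illustration only.

```perl
#!/usr/bin/perl
use 5.010_000;
use strict;
use warnings;

my $lookahead = qr{
    (?(DEFINE)
        (?<Group_A> a)
        (?<Group_B> b)
        (?<Group_C> c)
    )

    x                           # the leader

    (?= (?&Group_A)             # at least one of a, b, c must come next
      | (?&Group_B)
      | (?&Group_C)
    )

    (?&Group_A) ?               # then each one is optional, in order
    (?&Group_B) ?
    (?&Group_C) ?
}x;

# Made-up input: a bare x first, then several valid combinations.
my $text = "x xa xb xc xab xbc xabc xca";

my @found;
while ($text =~ /$lookahead/g) {
    push @found, $&;
}
say "@found";

# Output:
# xa xb xc xab xbc xabc xc
```

The bare x at the start is skipped because the lookahead fails on the space after it, and "xca" yields only "xc" because a may not follow c.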

Do I win? ☺
