Simplifying the regex "ab|a|b"


Question

(How) could the following regex be simplified?

ab|a|b

I'm looking for a less redundant version, i.e. one that contains only one a and one b. Is that possible?

Some attempts:

a?b?       # matches the empty string, which it shouldn't
ab?|b      # still contains b twice
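
The behaviour of these attempts can be checked quickly in Java. This is a minimal sketch, assuming that "match" means a full-string match via `Pattern.matches`; the class name `AttemptCheck` and the helper `accepts` are made up for illustration.

```java
import java.util.regex.Pattern;

public class AttemptCheck {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean accepts(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] inputs = {"", "a", "b", "ab", "ba"};
        // The original regex followed by the two attempts from the question.
        for (String regex : new String[] {"ab|a|b", "a?b?", "ab?|b"}) {
            StringBuilder row = new StringBuilder(regex + " accepts:");
            for (String input : inputs) {
                if (accepts(regex, input)) {
                    row.append(" \"").append(input).append('"');
                }
            }
            System.out.println(row);
        }
    }
}
```

On this small input set, only `a?b?` accepts the empty string, while `ab?|b` accepts the same inputs as `ab|a|b`.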


Note that the real regex has more complicated a and b parts: they are not single characters but, let's say, inner subregexes.

Recommended answer

If you are using Perl or a PCRE-based engine (such as PHP's preg_ functions), you can reuse the subpatterns of earlier groups in the pattern, like this:

/(a)(b)|(?1)|(?2)/

The main purpose of this feature is to support recursion, but it can be used for pattern reuse as well.

Note that in this case you cannot get around capturing a and b in the first alternative, which incurs some (possibly) unnecessary overhead. To avoid this, you can define the groups inside a conditional that is never executed. The canonical way to do this is a (?(DEFINE)...) group, whose condition is always false, so its contents are never matched in place and serve only as definitions:

/(?(DEFINE)(a)(b))(?1)(?2)|(?1)|(?2)/

If your engine doesn't support that feature (and since you are using Java: no, it is not supported there), the best you can get in a single pattern is indeed

ab?|b

Alternatively, you can build the ab|a|b version manually by string concatenation or formatting, like this:

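// Each subpattern is written once; wrap it in (?:...) if it contains a top-level alternation.
String a = "a";
String b = "b";
String pattern = a + b + "|" + a + "|" + b;  // yields "ab|a|b"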
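
Since the real a and b are subregexes rather than single characters, plain concatenation can change the meaning if either part contains its own alternation. The following is a minimal sketch of the same idea with each part wrapped in a non-capturing group; the class `PatternBuilder`, the method `abOrAOrB`, and the example subregexes are assumptions for illustration, not part of the question.

```java
import java.util.regex.Pattern;

public class PatternBuilder {
    // Hypothetical helper: builds "(?:a)(?:b)|(?:a)|(?:b)" from two subregexes.
    // The non-capturing groups keep an alternation inside a or b from
    // leaking into the outer alternation.
    static Pattern abOrAOrB(String a, String b) {
        String ga = "(?:" + a + ")";
        String gb = "(?:" + b + ")";
        return Pattern.compile(ga + gb + "|" + ga + "|" + gb);
    }

    public static void main(String[] args) {
        // Assumed example subregexes, chosen only to show the grouping issue.
        Pattern p = abOrAOrB("x|y", "\\d+");
        for (String s : new String[] {"x12", "y", "345", "", "12x"}) {
            System.out.println("\"" + s + "\" -> " + p.matcher(s).matches());
        }
    }
}
```

Each subregex is still written only once in the source, even though the compiled pattern contains it twice.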

This avoids the duplication as well. Or you can use three separate patterns, ab, a and b, against the subject string (where the first one is again a concatenation of the latter two).
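
The three-pattern variant might look like the sketch below. It assumes full-string matching, and the class `SeparatePatterns`, the method `matchesAny`, and the two example subregexes are hypothetical.

```java
import java.util.regex.Pattern;

public class SeparatePatterns {
    // Assumed example subregexes. Neither contains a top-level alternation,
    // so they can be concatenated without extra grouping.
    static final String A = "[a-z]+";
    static final String B = "\\d+";

    // Three patterns; the first is the concatenation of the other two.
    static final Pattern AB = Pattern.compile(A + B);
    static final Pattern ONLY_A = Pattern.compile(A);
    static final Pattern ONLY_B = Pattern.compile(B);

    // True if the whole input matches ab, a, or b.
    static boolean matchesAny(String input) {
        return AB.matcher(input).matches()
            || ONLY_A.matcher(input).matches()
            || ONLY_B.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc123", "abc", "123", "", "123abc"}) {
            System.out.println("\"" + s + "\" -> " + matchesAny(s));
        }
    }
}
```

Keeping the patterns separate also makes it easy to tell which of the three cases matched, at the cost of up to three match attempts per input.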
