Matching optional capture groups in any order

Question

When parsing user input, there are many situations where the user can add several optional flags, which should be accepted in any order. How can this be parsed with a regex so that each flag, if present, ends up in its own capture group?

For example:

There is a required token a, and then 3 optional tokens which can come in any order b, c, and d.

Some acceptable inputs would be:

a
a b
a c
a b c
a c b
a b c d
a d b c
a c d b

The capture groups should always look like this:

0 => (anything, this is ignored)
1 => a
2 => b or null
3 => c or null
4 => d or null

There are several parts to this problem that have already been answered:

  1. Using the (...)? form to make a capture group optional
  2. Using lookaheads (?=.*b)(?=.*c)(?=.*d) to allow things to be in any order
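
As a minimal sketch of these two building blocks taken separately (the variable names and sample strings are illustrative, not from the original post), the following JavaScript shows that an unmatched optional group comes back as undefined, and that plain lookaheads accept any order but still require every token:

```javascript
// Building block 1: an optional capture group. When the group does not
// participate in the match, JavaScript reports it as undefined.
const optionalGroup = /(a)( b)?/;
const withB = optionalGroup.exec("a b");
const withoutB = optionalGroup.exec("a");
console.log(withB[2]);    // " b"
console.log(withoutB[2]); // undefined

// Building block 2: lookaheads accept the tokens in any order,
// but each lookahead is mandatory, so every token must be present.
const anyOrder = /a(?=.*b)(?=.*c)(?=.*d)/;
const allPresent = anyOrder.test("a d b c");
const oneMissing = anyOrder.test("a c b");
console.log(allPresent); // true
console.log(oneMissing); // false, because "d" is missing
```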

But the combination of these strategies doesn't work: (a)(?=.*(b)?)(?=.*(c)?)(?=.*(d)?)

Regex101 test

What regex would allow optional tokens to be found in any order?

(The answer can use any flavor of regex)

Recommended answer

A regex that works in many flavors is:

(a)(?=(?:.*(b))?)(?=(?:.*(c))?)(?=(?:.*(d))?)

This form is modular: supporting another token only requires appending another (?=(?:.*(xxx))?) to the pattern. It works because the token inside each optional group is required, which forces a greedy .* to backtrack until the token is found, and likewise keeps a lazy .*? from stopping immediately (which it would do if the next item were optional and could match right away).
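
Because each optional token contributes one identical lookahead, the pattern can be generated from a list. The sketch below is one way to do that in JavaScript; escapeRegex and buildFlagRegex are hypothetical helper names introduced here for illustration, not part of the original answer.

```javascript
// Hypothetical helper: escape regex metacharacters in a literal token.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: one capturing group for the required token, then one
// optional lookahead of the form (?=(?:.*(token))?) per optional token.
function buildFlagRegex(required, optionalTokens) {
  const lookaheads = optionalTokens
    .map((token) => "(?=(?:.*(" + escapeRegex(token) + "))?)")
    .join("");
  return new RegExp("(" + escapeRegex(required) + ")" + lookaheads);
}

const flagRegex = buildFlagRegex("a", ["b", "c", "d"]);
console.log(flagRegex.source); // (a)(?=(?:.*(b))?)(?=(?:.*(c))?)(?=(?:.*(d))?)

// Groups 2..4 always correspond to b, c, d, regardless of input order.
const flagMatch = flagRegex.exec("a d b");
console.log(flagMatch.slice(1)); // [ 'a', 'b', undefined, 'd' ]
```

The group numbers follow the order of the list passed to the helper, so the order in which the user types the flags does not change which group a flag lands in.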

Tested on Regex101 (works in PCRE, JavaScript, and Python)

JavaScript Example: JSFiddle

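// Grab the input box and the output area defined in the HTML below.
var cmd = document.getElementById("cmd"),
    pre = document.getElementById("output"),
    reg = /(a)(?=(?:.*(b))?)(?=(?:.*(c))?)(?=(?:.*(d))?)/;

// Re-run the regex on every keystroke and list capture groups 1 to 4.
cmd.onkeyup = function() {
  var m = reg.exec(cmd.value) || [],
      output = "Match\n";
  // Groups that did not participate are undefined; print them as "null".
  for (var i = 1; i < m.length; i++)
    output += "[" + i + "] => " + (m[i] || "null") + "\n";
  pre.textContent = m.length ? output : "No Match";
};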

Enter command: <input id="cmd" type="text" />
<pre id="output">No Match</pre>

The combination of the two strategies in the question doesn't work because the form .*(x)? is too greedy: .* consumes the rest of the input and the optional group then matches nothing, so the capture is skipped. On the other hand, .*?(x)? is too lazy: it stops at the first index because the next item is optional.
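
The difference between the three forms can be seen side by side. This is a small sketch in JavaScript; the groups helper and the sample input are assumptions made for illustration.

```javascript
const input = "a b c d";

// Greedy: .* consumes the rest of the input, then (b)? matches nothing.
const greedy = /(a)(?=.*(b)?)(?=.*(c)?)(?=.*(d)?)/;
// Lazy: .*? consumes nothing, and (b)? matches nothing at that position.
const lazy = /(a)(?=.*?(b)?)(?=.*?(c)?)(?=.*?(d)?)/;
// Working form: the token is required inside an optional non-capturing group.
const working = /(a)(?=(?:.*(b))?)(?=(?:.*(c))?)(?=(?:.*(d))?)/;

// Hypothetical helper: return groups 1..n, with unset groups shown as null
// to mirror the table in the question.
function groups(regex, text) {
  const match = regex.exec(text);
  return match ? match.slice(1).map((g) => (g === undefined ? null : g)) : null;
}

console.log(groups(greedy, input));  // [ 'a', null, null, null ]
console.log(groups(lazy, input));    // [ 'a', null, null, null ]
console.log(groups(working, input)); // [ 'a', 'b', 'c', 'd' ]
```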
