Capturing <thisPartOnly> and (thisPartOnly) with the same group


Problem description

Let's say we have the following input:

<amy>
(bob)
<carol)
(dean>

We also have the following regex:

<(\w+)>|\((\w+)\)

Now we get two matches (as seen on rubular.com):


  • <amy> is a match, \1 captures amy, \2 fails
  • (bob) is a match, \2 captures bob, \1 fails
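As a rough illustration of what "querying different groups" means in java.util.regex, here is a minimal sketch that runs the regex above over the sample input. The class name `AlternationDemo` and the helper `names` are made up for this example; only `Pattern` and `Matcher` come from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // The regex from the question: one alternative per bracket type.
    static final Pattern P = Pattern.compile("<(\\w+)>|\\((\\w+)\\)");

    // Hypothetical helper: collects the captured name of every match.
    static List<String> names(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            // Only one of the two groups takes part in a given match;
            // the other one is null, so both have to be checked.
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // <carol) and (dean> mix bracket types and are skipped.
        System.out.println(names("<amy>\n(bob)\n<carol)\n(dean>")); // [amy, bob]
    }
}
```

The null check inside the loop is exactly the kind of per-alternative bookkeeping the question is trying to get rid of.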

This regex does most of what we want, namely:

  • It correctly matches the opening and closing brackets (i.e. no mixing)
  • It captures the part we're interested in

However, it does have a few drawbacks:


  • The capturing pattern (i.e. the "main" part) is repeated
    • It's only \w+ in this case, but generally speaking this can be quite complex
      • If it involves backreferences, then they must be renumbered for each alternate!
      • Repetition makes maintenance a nightmare! (What if it changes?)
  • Depending on which alternate matches, we must query different groups
    • It's only \1 or \2 in this case, but generally the "main" part can have capturing groups of its own!

So the question is obvious: how can we do this without repeating the "main" pattern?

Note: for the most part I'm interested in the java.util.regex flavor, but other flavors are welcome.






Appendix

There's nothing new in this section; it only illustrates the problem mentioned above with an example.

Let's take the above example to the next step: we now want to match these:

              <amy=amy>
              (bob=bob)
              [carol=carol]
              

But not these:

              <amy=amy)   # non-matching bracket
              <amy=bob>   # left hand side not equal to right hand side
              

Using the alternation technique, the following works (as seen on rubular.com):

              <((\w+)=\2)>|\(((\w+)=\4)\)|\[((\w+)=\6)\]
              

As explained above:

  • The main pattern can't simply be repeated; backreferences must be renumbered
  • Repetition also means a maintenance nightmare if it ever changes
  • Depending on which alternate matches, we must query either \1 \2, \3 \4, or \5 \6
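To make the bookkeeping concrete, the following sketch runs the three-way alternation in Java and pulls out the inner name. The class `AppendixAlternationDemo` and its `names` helper are invented for illustration; the regex is the one shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendixAlternationDemo {
    // One copy of the main pattern per bracket type, with the
    // backreference renumbered by hand in each copy (\2, \4, \6).
    static final Pattern P = Pattern.compile(
        "<((\\w+)=\\2)>|\\(((\\w+)=\\4)\\)|\\[((\\w+)=\\6)\\]");

    // Hypothetical helper: returns the inner name of every match.
    static List<String> names(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            // The name is in group 2, 4, or 6, depending on which
            // alternative matched; the other groups are null.
            for (int g = 2; g <= 6; g += 2) {
                if (m.group(g) != null) {
                    out.add(m.group(g));
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "<amy=amy>\n(bob=bob)\n[carol=carol]\n<amy=amy)\n<amy=bob>";
        System.out.println(names(input)); // [amy, bob, carol]
    }
}
```

Adding a fourth bracket type here would mean another copy of the main pattern, a backreference to group 8, and a wider loop bound.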

Recommended answer

You can use a lookahead to "lock in" the group number before doing the real match.

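              import java.util.regex.Matcher;
              import java.util.regex.Pattern;

              String s = "<amy=amy>(bob=bob)[carol=carol]";
              // The lookahead captures the main pattern as group 1 (inner name: group 2);
              // the alternation then only has to check the brackets around \1.
              Pattern p = Pattern.compile(
                "(?=[<(\\[]((\\w+)=\\2))(?:<\\1>|\\(\\1\\)|\\[\\1\\])");
              Matcher m = p.matcher(s);

              while (m.find())
              {
                System.out.printf("found %s in %s%n", m.group(2), m.group());
              }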
              

Output:

              found amy in <amy=amy>
              found bob in (bob=bob)
              found carol in [carol=carol]
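The same lookahead idea should also cover the simpler input from the question (`<amy>`, `(bob)`, `<carol)`, `(dean>`). The sketch below is an adaptation rather than something taken from the answer itself: the class `LookaheadSimpleDemo` and the helper `names` are hypothetical, and the pattern is the answer's pattern with the main part reduced to `\w+`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadSimpleDemo {
    // Lookahead: after an opening bracket, capture the name as group 1.
    // Alternation: check that a matching pair of brackets surrounds \1.
    static final Pattern P = Pattern.compile(
        "(?=[<(](\\w+))(?:<\\1>|\\(\\1\\))");

    // Hypothetical helper: the name is always in group 1,
    // whichever bracket type matched.
    static List<String> names(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(names("<amy>\n(bob)\n<carol)\n(dean>")); // [amy, bob]
    }
}
```

Because the capture happens once, inside the lookahead, no per-alternative null check is needed.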
              

It's still ugly as hell, but you don't have to recalculate all the group numbers every time you make a change. For example, to add support for curly brackets, it's just:

              "(?=[<(\\[{]((\\w+)=\\2))(?:<\\1>|\\(\\1\\)|\\[\\1\\]|\\{\\1\\})"
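Since only the bracket list changes from one version to the next, the pattern string could also be assembled from a table of bracket pairs. The following is a sketch under assumptions, not part of the answer: `BracketPatternBuilder`, `build`, and `innerNames` are hypothetical names, and the openers are joined with `|` inside the lookahead instead of a character class so that each one can be escaped with `Pattern.quote`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketPatternBuilder {
    // Hypothetical helper: builds the lookahead-based pattern for any set
    // of bracket pairs. The wrapper group added here is group 1, so the
    // main pattern's own groups are numbered from 2.
    static Pattern build(String main, String[][] pairs) {
        StringBuilder openers = new StringBuilder();
        StringBuilder wrapped = new StringBuilder();
        for (String[] pair : pairs) {
            if (openers.length() > 0) {
                openers.append('|');
                wrapped.append('|');
            }
            String open = Pattern.quote(pair[0]);
            String close = Pattern.quote(pair[1]);
            openers.append(open);
            // Each alternative only checks the brackets around \1.
            wrapped.append(open).append("\\1").append(close);
        }
        return Pattern.compile(
            "(?=(?:" + openers + ")(" + main + "))(?:" + wrapped + ")");
    }

    // Hypothetical helper: returns group 2 (the inner name) of every match.
    static List<String> innerNames(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] pairs = {{"<", ">"}, {"(", ")"}, {"[", "]"}, {"{", "}"}};
        Pattern p = build("(\\w+)=\\2", pairs);
        System.out.println(innerNames(p, "<amy=amy>(bob=bob)[carol=carol]{dave=dave}"));
        // [amy, bob, carol, dave]
    }
}
```

With this shape, supporting another bracket type is one more row in `pairs`, and the main pattern, including its `\2` backreference, is written once.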
              
