Capture group multiple times


Problem description

Lately I have been playing around with regex in Java, and I have run into a problem that is (theoretically) easy to solve, but I was wondering whether there is an easier way to do it (yes, yes, I am lazy). The problem is capturing a group multiple times, like this:

public static void main(String[] args) {
    Pattern p = Pattern.compile("A (IvI(.*?)IvI)*? A");
    Matcher m = p.matcher("A IvI asd IvI IvI qwe IvI A"); //ANY NUMBER of IvI x IvI
    //Matcher m = p.matcher("A  A");
    int loi = 0; //last Occurrence Index
    String storage;
    while (loi >= 0 && m.find(loi)) {
        System.out.println(m.group(1));
        if ((storage = m.group(2)) != null) {
            System.out.println(storage);
        }
        //System.out.println(m.group(1));
        loi = m.end(1);
    }
    m.find();
    System.out.println("2 opt");
    Pattern p2 = Pattern.compile("IvI(.*?)IvI");
    Matcher m2 = p2.matcher(m.group(1)); //m.group(1) = "IvI asd IvI IvI qwe IvI"
    loi = 0;
    while (loi >= 0 && m2.find(loi)) {
        if ((storage = m2.group(1)) != null) {
            System.out.println(storage);
        }
        loi = m2.end(0);
    }
}

Using ONLY Pattern p, is there any way to get what is inside the IvI's (in the test string, that would be "asd" and "qwe"), considering that there could be any number of IvI sections? I am looking for something like what I am trying to do in the first while loop: find the first occurrence of the group, then move the index and search for the next occurrence, and so on.

With the code I wrote, that while loop returns asd IvI IvI qwe as group 2, not asd and then qwe. I suppose this could partly be because of the (.*?) part: it is not supposed to be greedy, yet it still runs up to qwe, consuming two of the IvI's. I mention this because otherwise I might be able to use the end index of each capture with the matcher.find(anInt) method, but that does not work either. I don't think there is anything wrong with the regex, since the following code works without consuming the IvI.

public static void main(String[] args) {
    Pattern p = Pattern.compile("(.*?)IvI");
    Matcher m = p.matcher("bla bla blaIvI");
    m.find();
    System.out.println(m.group(1));
}

Prints: bla bla bla
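A possible explanation for the first snippet, sketched here under stated assumptions: in the test string the two IvI ... IvI sections are separated by a space, but Pattern p contains nothing that can match that space between repetitions. The only way for the whole pattern to succeed is therefore for the lazy (.*?) to keep expanding up to the last IvI. The class name SeparatorDemo, the helper group2, and the variant pattern with an explicit separator are hypothetical and were added only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeparatorDemo {
    // Returns group 2 of the first match of regex in input, or null if nothing matches.
    static String group2(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String input = "A IvI asd IvI IvI qwe IvI A";

        // Original pattern: nothing can match the space between the two sections,
        // so the lazy (.*?) is forced to grow across both of them.
        System.out.println("[" + group2("A (IvI(.*?)IvI)*? A", input) + "]");

        // Hypothetical variant: every section is followed by an explicit space.
        // The lazy part now stops at the nearest IvI, but group 2 still holds
        // only the value from the last repetition.
        System.out.println("[" + group2("A (?:(IvI(.*?)IvI) )*A", input) + "]");
    }
}
```

With the variant pattern the lazy quantifier behaves as expected, yet a single match still exposes only the last section, which is the limitation the question is really about.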

There is a solution I know of (but remember, I am lazy):

(It is also in the first code block, below the "2 opt" message.) The solution is to divide the match into sub-groups and use another regex that processes those sub-groups one at a time...

BTW: I did my homework. This page says:

Since a capture group with a quantifier holds on to its number, what value does the engine return when you inspect the group? All engines return the last value captured. For instance, if you match the string A_B_C_D_ with ([A-Z]_)+, when you inspect the match, Group 1 will be D_. With the exception of the .NET engine, all intermediate values are lost. In essence, Group 1 gets overwritten each time its pattern is matched.
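The behaviour described in the quote can be checked directly in Java. The sketch below assumes the pattern ([A-Z]_)+, with the underscore inside the group so that the repetition can consume the whole string; the class name LastCaptureDemo and the helper lastCapture are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastCaptureDemo {
    // Returns group 1 after matching the whole input against regex, or null on no match.
    static String lastCapture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The group is matched four times (A_, B_, C_, D_),
        // but only the value from the last repetition is kept.
        System.out.println(lastCapture("([A-Z]_)+", "A_B_C_D_")); // prints D_
    }
}
```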

But I am still hoping you can give me good news...

Recommended answer

No, unfortunately. As your citation already mentions, the java.util.regex implementation does not support retrieving earlier values of a repeated capturing group after a single match. The only way to get them, as your code illustrates, is to find() multiple matches of the repeated part of your regular expression.
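A minimal sketch of that two-step approach, under a few assumptions: the outer pattern "A (.*) A" is a simplification of Pattern p, the captured values are trimmed to give "asd" and "qwe", and the names RepeatedGroupDemo, OUTER, SECTION and sections are hypothetical. Calling find() with no argument continues from the end of the previous match, so no manual index bookkeeping is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // Step 1: isolate everything between the leading "A " and the trailing " A".
    private static final Pattern OUTER = Pattern.compile("A (.*) A");
    // Step 2: match a single IvI ... IvI section at a time.
    private static final Pattern SECTION = Pattern.compile("IvI(.*?)IvI");

    // Collects the trimmed content of every IvI ... IvI section in input.
    static List<String> sections(String input) {
        List<String> result = new ArrayList<>();
        Matcher outer = OUTER.matcher(input);
        if (!outer.matches()) {
            return result; // input does not have the expected "A ... A" shape
        }
        Matcher section = SECTION.matcher(outer.group(1));
        while (section.find()) { // each find() resumes after the previous match
            result.add(section.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sections("A IvI asd IvI IvI qwe IvI A")); // prints [asd, qwe]
        System.out.println(sections("A  A"));                        // prints []
    }
}
```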

I've also been looking at other implementations of regular expressions in Java, for example:

but I could not find any that supported it (only the Microsoft .NET engine does). If I understand correctly, implementations of regular expressions based on state machines cannot easily implement this feature. java.util.regex is a backtracking implementation rather than a state-machine one, though.

If anyone knows of a Java regular expression library that supports this behaviour, please share it, because it would be a powerful feature.

P.S. It took me quite a while to understand your question. The title is good, but the body left me unsure whether I had understood you correctly.
