Regex split into overlapping strings

Question

I'm exploring the power of regular expressions, so I'm just wondering if something like this is possible:

public class StringSplit {
    public static void main(String args[]) {
        System.out.println(
            java.util.Arrays.deepToString(
                "12345".split(INSERT_REGEX_HERE)
            )
        ); // prints "[12, 23, 34, 45]"
    }
}

If possible, then simply provide the regex (and preemptively some explanation on how it works).

If it's only possible in some regex flavors other than Java, then feel free to provide those as well.

If it's not possible, then please explain why.

Bonus question

Same question, but with a find() loop instead of split:

    Matcher m = Pattern.compile(BONUS_REGEX).matcher("12345");
    while (m.find()) {
        System.out.println(m.group());
    } // prints "12", "23", "34", "45"





Please note that it's not so much that I have a concrete task to accomplish one way or another, but rather I want to understand regular expressions. I don't need code that does what I want; I want regexes, if they exist, that I can use in the above code to accomplish the task (or regexes in other flavors that work with a "direct translation" of the code into another language).

And if they don't exist, I'd like a good solid explanation why.

Recommended answer

I don't think this is possible with split(), but with find() it's pretty simple. Just use a lookahead with a capturing group inside:

Matcher m = Pattern.compile("(?=(\\d\\d)).").matcher("12345");
while (m.find())
{
  System.out.println(m.group(1));
}

Many people don't realize that text captured inside a lookahead or lookbehind can be referenced after the match just like any other capture. It's especially counter-intuitive in this case, where the capture is a superset of the "whole" match.
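
To make the "superset" point concrete, here is a minimal sketch that prints the overall match next to group 1 for each iteration. The class and method names (LookaheadCaptureDemo, describeMatches) are made up for illustration; only the regex comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original answer.
class LookaheadCaptureDemo {
    // For each match, records "overall match -> group 1".
    static List<String> describeMatches(String input) {
        // The lookahead captures two digits without consuming them;
        // the trailing dot then consumes exactly one character.
        Matcher m = Pattern.compile("(?=(\\d\\d)).").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group() + " -> " + m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("12345"));
        // prints "[1 -> 12, 2 -> 23, 3 -> 34, 4 -> 45]"
    }
}
```

The overall match is a single character each time, while group 1 holds two, which is why consecutive captures can overlap.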

As a matter of fact, it works even if the regex as a whole matches nothing. Remove the dot from the regex above ("(?=(\\d\\d))") and you'll get the same result. This is because, whenever a successful match consumes no characters, the regex engine automatically bumps ahead one position before trying to match again, to prevent infinite loops.
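
The bump-along behavior can be observed directly by recording where each empty match starts and ends. This is only a sketch; the class and method names (ZeroWidthMatchDemo, overlappingPairs) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original answer.
class ZeroWidthMatchDemo {
    // Records "start-end:group1" for every match of the dot-less regex.
    static List<String> overlappingPairs(String input) {
        // Lookahead only: the overall match is always the empty string.
        Matcher m = Pattern.compile("(?=(\\d\\d))").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // start() equals end() because nothing is consumed; the next
            // find() call resumes one position further along.
            out.add(m.start() + "-" + m.end() + ":" + m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(overlappingPairs("12345"));
        // prints "[0-0:12, 1-1:23, 2-2:34, 3-3:45]"
    }
}
```

Each match has zero width, yet the start index advances by one per iteration, so the loop terminates after the last position where two digits remain.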

There's no split() equivalent for this technique, though, at least not in Java. Although you can split on lookarounds and other zero-width assertions, there's no way to get the same character to appear in more than one of the resulting substrings.
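
A small sketch of why split() cannot overlap: splitting on a zero-width position between digits only cuts the input into pieces, and the pieces concatenate back to the original, so no character can appear twice. The delimiter regex and the names used here (SplitNoOverlapDemo, splitBetweenDigits) are illustrative choices, not part of the original answer.

```java
import java.util.Arrays;

// Hypothetical demo class; the name is not from the original answer.
class SplitNoOverlapDemo {
    // Splits at every zero-width position that has a digit on both sides.
    static String[] splitBetweenDigits(String input) {
        return input.split("(?<=\\d)(?=\\d)");
    }

    public static void main(String[] args) {
        String[] parts = splitBetweenDigits("12345");
        System.out.println(Arrays.toString(parts));
        // prints "[1, 2, 3, 4, 5]"

        // The delimiter is empty, so joining the pieces restores the input.
        System.out.println(String.join("", parts).equals("12345"));
        // prints "true"
    }
}
```

The total length of the pieces never exceeds the input length, whereas "[12, 23, 34, 45]" needs eight characters from a five-character string.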
