Why do an empty regex and an empty capturing group regex return string length plus one results?


Question

How would you explain that an empty regex and an empty capturing group regex each return string length plus one results?

Code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is arbitrary; the original snippet showed only main().
public class EmptyRegexDemo {
    public static void main(String... args) {
        {
            // Case 1: the empty pattern.
            System.out.format("Pattern - empty string\n");
            String input = "abc";
            Pattern pattern = Pattern.compile("");
            Matcher matcher = pattern.matcher(input);
            while (matcher.find()) {
                String s = matcher.group();
                System.out.format("[%s]: %d / %d\n", s, matcher.start(),
                        matcher.end());
            }
        }
        {
            // Case 2: a pattern consisting only of an empty capturing group.
            System.out.format("Pattern - empty capturing group\n");
            String input = "abc";
            Pattern pattern = Pattern.compile("()");
            Matcher matcher = pattern.matcher(input);
            while (matcher.find()) {
                String s = matcher.group();
                System.out.format("[%s]: %d / %d\n", s, matcher.start(),
                        matcher.end());
            }
        }
    }
}

Output

Pattern - empty string
[]: 0 / 0
[]: 1 / 1
[]: 2 / 2
[]: 3 / 3
Pattern - empty capturing group
[]: 0 / 0
[]: 1 / 1
[]: 2 / 2
[]: 3 / 3


Recommended answer

Regex engines also consider the positions before and after characters. You can see this from the fact that they have constructs like ^ (start of string), $ (end of string) and \b (word boundary), which match at certain positions without consuming any characters (and therefore between, before, or after characters). For a string of N characters, there are N-1 positions between characters to consider, plus the first and last positions (where ^ and $ would match, respectively), which gives N+1 candidate positions. A completely unrestrictive empty pattern matches at all of them.

So these are your matches:

" a b c "
 ^ ^ ^ ^

Which is obviously N+1 for N characters.
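
To check the N+1 count on inputs of different lengths, here is a minimal sketch using java.util.regex. The class name EmptyPatternPositions and the helper countMatches are made up for illustration and are not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyPatternPositions {
    // Hypothetical helper: counts how many times the regex is found in the input.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // For each input, compare its length N with the number of empty-pattern matches.
        for (String input : new String[] {"", "a", "abc", "hello"}) {
            System.out.format("\"%s\": length %d, matches %d%n",
                    input, input.length(), countMatches("", input));
        }
    }
}
```

For these sample inputs the match count should come out as the length plus one in every case, including the empty string, which has exactly one position.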

You will get the same behavior with other patterns that allow zero-length matches and do not find anything longer in your input. For instance, try \d*. It cannot find any digits in your input string, but * will gladly return zero-length matches.
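
The \d* case can be tried the same way. The sketch below reuses the question's output format; the class name ZeroLengthDigits, the helper findAll, and the second input "a12c" are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroLengthDigits {
    // Hypothetical helper: returns one "[match]: start / end" entry per find().
    static List<String> findAll(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            results.add(String.format("[%s]: %d / %d",
                    matcher.group(), matcher.start(), matcher.end()));
        }
        return results;
    }

    public static void main(String[] args) {
        // No digits in "abc", so every match is zero-length.
        findAll("\\d*", "abc").forEach(System.out::println);
        System.out.println("---");
        // With digits present, the longer match "12" is found where it starts.
        findAll("\\d*", "a12c").forEach(System.out::println);
    }
}
```

On "abc" this gives the same four zero-length matches as the empty pattern. On "a12c" the digits are consumed as one match ([12]: 1 / 3), while zero-length matches are still reported at positions 0, 3 and 4.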
