Why adding a space after `(.+?)` can completely change the result

Question

I'm trying to find a smaller string within a larger string using the pattern String patternString1 = "(John) (.+?)";. The pattern consists of two groups, (John) and (.+?), separated by a space. However, I get a completely different result just by adding a space after (.+?).

For String patternString1 = "(John) (.+?)"; (i.e. without the trailing space), the result is

found: John w
found: John D
found: John W

For String patternString1 = "(John) (.+?) "; (i.e. with the trailing space), the result is

found: John writes
found: John Doe
found: John Wayne

How can a single space make such a big difference to the result?

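import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text
                = "John writes about this, and John Doe writes about that,"
                + " and John Wayne writes about everything.";

        // Pattern without the trailing space; use "(John) (.+?) " for the second result.
        String patternString1 = "(John) (.+?)";
        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);

        // Print both captured groups for every match found in the text.
        while (matcher.find()) {
            System.out.println("found: " + matcher.group(1) + " " + matcher.group(2));
        }
    }
}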

Recommended answer

The +? quantifier in .+? is reluctant (or "lazy"). It matches the subpattern it quantifies (here the dot, which by default matches any character except a line terminator) one or more times, but as few times as necessary to return a valid match.
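To make the reluctant behaviour concrete, here is a small sketch that contrasts .+? with its greedy counterpart .+ on the text from the question. The class name LazyVsGreedy and the helper firstGroup2 are made up for this illustration; only the standard java.util.regex classes are used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedy {
    // Returns what group 2 captured in the first match, or null if nothing matched.
    static String firstGroup2(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String text = "John writes about this, and John Doe writes about that,"
                + " and John Wayne writes about everything.";

        // Reluctant: stops after the one character it is required to match.
        System.out.println("[" + firstGroup2("(John) (.+?)", text) + "]"); // prints [w]

        // Greedy: takes as much as it can, here everything up to the end of the text.
        System.out.println("[" + firstGroup2("(John) (.+)", text) + "]");
    }
}
```

The greedy version captures the whole remainder of the input because the text contains no line terminators to stop the dot.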

You have the pattern (John) (.+?) and you try to find a match in John writes about this. The regex engine finds John and captures it into Group 1, matches the space that follows, and then reaches .+?, which must match at least one character. It matches the w in writes, so the requirement of one or more is met. Nothing follows .+? in the pattern, so the match is already valid and is returned without consuming anything more. You get John w.

Now you add a space after (.+?). John is matched and captured into Group 1 as before, and the space in the pattern matches the space after John (again, as before). Then .+? is tried: because + requires at least one character, it first consumes w, and the engine moves on to the trailing space in the pattern. The next character is r, not a space, so the engine backtracks into .+? and lets it consume one more character, r. It then checks whether i is a space; it is not. The engine extends .+? one character at a time in this way until the character that follows is a space, which happens right after writes. Thus the valid match for "(John) (.+?) " is "John writes " (with the trailing space), and Group 2 holds writes, which is why the output shows whole words.
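The two cases can be compared side by side with a short sketch. The class name TrailingSpaceDemo and the helper allMatches are hypothetical names chosen for this example; each full match (group 0) is wrapped in brackets so the trailing space is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingSpaceDemo {
    // Collects the full text of every match, bracketed so that spaces show up.
    static List<String> allMatches(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add("[" + m.group() + "]");
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "John writes about this, and John Doe writes about that,"
                + " and John Wayne writes about everything.";

        // No trailing space: .+? is satisfied by a single character.
        System.out.println(allMatches("(John) (.+?)", text));
        // prints [[John w], [John D], [John W]]

        // Trailing space: .+? must grow until the next character is a space.
        System.out.println(allMatches("(John) (.+?) ", text));
        // prints [[John writes ], [John Doe ], [John Wayne ]]
    }
}
```

Note that the full match includes the trailing space, while group(2), which the question prints, contains only the word itself.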
