Why adding a space after `(.+?)` can completely change the result
Question
I'm trying to find a smaller string within a larger string, using the pattern `String patternString1 = "(John) (.+?)";`. The pattern consists of two groups, i.e. `(John) (.+?)`. However, I get a completely different result just by adding a space after `(.+?)`.
For `String patternString1 = "(John) (.+?)";` (i.e. without the space), the result is
found: John w
found: John D
found: John W
For `String patternString1 = "(John) (.+?) ";` (i.e. with the space), the result is
found: John writes
found: John Doe
found: John Wayne
How come a single space can make such a big difference to the result?
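import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text
        = "John writes about this, and John Doe writes about that,"
        + " and John Wayne writes about everything.";
// Without a trailing space; use "(John) (.+?) " to reproduce the second result.
String patternString1 = "(John) (.+?)";
Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    // Group 1 is always "John"; Group 2 is whatever .+? captured.
    System.out.println("found: " + matcher.group(1) + " " + matcher.group(2));
}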
Recommended answer
The `+?` quantifier in `.+?` is reluctant (or "lazy"). It matches the subpattern it quantifies one or more times, but as few times as necessary to return a valid match.
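To make "as few times as necessary" concrete, here is a small sketch that runs a greedy and a reluctant quantifier against the same input. The class name `LazyVsGreedyDemo`, the helper `firstMatch`, and the sample input are made up for this illustration and are not part of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedyDemo {
    // Hypothetical helper: returns the text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "writes about";
        // Greedy: .+ takes as many characters as it can.
        System.out.println("[" + firstMatch(".+", input) + "]");   // [writes about]
        // Reluctant: .+? stops after the one character it is required to match.
        System.out.println("[" + firstMatch(".+?", input) + "]");  // [w]
        // Reluctant, followed by a space: it has to expand until a space can follow.
        System.out.println("[" + firstMatch(".+? ", input) + "]"); // [writes ]
    }
}
```

The brackets are only there to make a trailing space visible in the output.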
You have the pattern `(John) (.+?)` and try to find a match in `John writes about this`. The regex engine finds `John`, places it into the Group 1 memory buffer, finds a space, matches it, and then finds the `w` in `writes`. The `w` is matched, so the requirement of one or more is met. Since nothing else in the pattern remains to be matched, the match is already valid and is returned. You get `John w`.
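A minimal sketch of this case, assuming the same text as in the question: it collects what Group 2 captures when nothing follows `(.+?)` in the pattern. The class name `ReluctantWithoutSpaceDemo` and the helper `secondGroups` are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantWithoutSpaceDemo {
    // Hypothetical helper: collects what Group 2 captured in every match.
    static List<String> secondGroups(String regex, String text) {
        List<String> captured = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            captured.add(m.group(2));
        }
        return captured;
    }

    public static void main(String[] args) {
        String text = "John writes about this, and John Doe writes about that,"
                + " and John Wayne writes about everything.";
        // Nothing follows (.+?) in the pattern, so one character is enough each time.
        System.out.println(secondGroups("(John) (.+?)", text)); // [w, D, W]
    }
}
```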
Now you add a space after `(.+?)`. `John` is matched and captured into Group 1 as before, and the space is matched with the space in the pattern (again, as before). Then `.+?` is executed: it must match at least one character, so it consumes `w`, and the engine goes on to match the space in the pattern. There is no space at that position, since the next character is `r`. The regex engine goes back to `.+?` and lets it consume `r`, then checks whether `i` is a space. No, it is not. The engine works through the string this way up to the first matching space and finds it right after `writes`. Thus, your valid match for `(John) (.+?) ` is `John writes ` (trailing space included), with `writes` captured in Group 2.
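The following sketch, again assuming the text from the question, prints each whole match next to its Group 2 for the pattern with the trailing space. The class name `ReluctantWithSpaceDemo` and the helper `describeMatches` are invented for the illustration. The brackets show that the trailing space belongs to the whole match but not to Group 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantWithSpaceDemo {
    // Hypothetical helper: one line per match, showing the whole match and Group 2 in brackets.
    static List<String> describeMatches(String regex, String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            lines.add("match=[" + m.group() + "] group2=[" + m.group(2) + "]");
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "John writes about this, and John Doe writes about that,"
                + " and John Wayne writes about everything.";
        // The trailing space forces .+? to expand up to the first space it can stop before.
        for (String line : describeMatches("(John) (.+?) ", text)) {
            System.out.println(line);
        }
        // match=[John writes ] group2=[writes]
        // match=[John Doe ] group2=[Doe]
        // match=[John Wayne ] group2=[Wayne]
    }
}
```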