Java Regular Expression Matcher doesn't find all possible matches
Question
I was looking at some code on TutorialsPoint and something has been bothering me ever since. Take a look at this code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    public static void main(String[] args) {
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";
        // Create a Pattern object.
        Pattern r = Pattern.compile(pattern);
        // Now create a Matcher object.
        Matcher m = r.matcher(line);
        while (m.find()) {
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
            System.out.println("Found value: " + m.group(3));
        }
    }
}
This code successfully prints:
Found value: This order was placed for QT300
Found value: 0
Found value: ! OK?
But according to the regex "(.*)(\\d+)(.*)", why doesn't it return other possible outcomes, such as:
Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
or
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
And if this code isn't suited to do so, how can I write one that finds all possible matches?
Recommended answer
It's because .* is greedy, and that is where backtracking comes in.
String:
This order was placed for QT3000! OK?
Regex:
(.*)(\\d+)(.*)
We all know that .* is greedy and matches as many characters as possible. So the first .* initially matches every character up to and including the last one, ?, and then it backtracks in order to let the rest of the pattern match. The next part of the regex is \d+, so the engine gives back one character at a time until \d+ can match. The first position where that works is the final 0, and a single digit is enough because \d+ only requires one or more digits. So the first (.*) captures This order was placed for QT300 and the following (\\d+) captures the digit 0 located just before the ! symbol.
Now the last (.*) captures all the remaining characters, that is, "! OK?". m.group(1) refers to the text captured by group 1, m.group(2) to the text captured by group 2, and so on.
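To make the backtracking result visible, here is a minimal sketch that prints where each group starts and ends in the input. The class name GroupOffsets and the helper describe are made up for this illustration; Matcher.start(int), Matcher.end(int) and Matcher.groupCount() are standard java.util.regex methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupOffsets {
    // Hypothetical helper: returns one line per capturing group of the first
    // match, in the form: group N [start,end) = "text".
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        if (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                sb.append("group ").append(g)
                  .append(" [").append(m.start(g)).append(',').append(m.end(g)).append(") = \"")
                  .append(m.group(g)).append("\"\n");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // The greedy first group ends right before the last digit,
        // so group 2 is exactly one character wide.
        System.out.print(describe("(.*)(\\d+)(.*)", line));
    }
}
```

The offsets show that group 2 covers only the last digit of 3000, which is the first position at which the greedy (.*) could stop backtracking.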
To get your desired output:
String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d{2})(.*)";
// Create a Pattern object.
Pattern r = Pattern.compile(pattern);
// Now create a Matcher object.
Matcher m = r.matcher(line);
while (m.find()) {
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3));
}
Output:
Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
With (.*)(\\d{2}), the greedy (.*) backtracks until exactly two digits are available for \\d{2} to match.
Change your pattern to
String pattern = "(.*?)(\\d+)(.*)";
to get output like this:
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
The ? after the * forces the * to do a non-greedy match.
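As a rough side-by-side comparison, the sketch below runs the greedy and the non-greedy pattern against the same line and prints the captured groups. The class name GreedyVsLazy and the helper groups are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Hypothetical helper: returns the capturing groups of the first match,
    // or an empty array when the pattern does not match at all.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()];
        for (int g = 1; g <= m.groupCount(); g++) {
            out[g - 1] = m.group(g);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // Greedy: the first group takes as much as it can, leaving one digit.
        System.out.println(Arrays.toString(groups("(.*)(\\d+)(.*)", line)));
        // Non-greedy: the first group stops as early as it can,
        // so the greedy \d+ gets the whole run of digits.
        System.out.println(Arrays.toString(groups("(.*?)(\\d+)(.*)", line)));
    }
}
```

Only the quantifier on the first group changes between the two calls; \d+ stays greedy in both.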
Use extra capturing groups to get both outputs from a single program.
String line = "This order was placed for QT3000! OK?";
String pattern = "((.*?)(\\d{2}))(?:(\\d{2})(.*))";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find()) {
    // First split: prefix ending in "30", then "00", then the rest.
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(4));
    System.out.println("Found value: " + m.group(5));
    // Second split: prefix without digits, then "30" + "00", then the rest.
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3) + m.group(4));
    System.out.println("Found value: " + m.group(5));
}
Output:
Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
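The patterns above each pick out one particular split. If the goal really is to list every way the line can be divided into prefix, digits and suffix, one option is to skip the regex engine and enumerate the splits with plain loops. The following is only a sketch under that assumption: the class name AllSplits and the method allSplits are invented for this example, it treats only the ASCII characters 0 to 9 as digits (like \d in Java's default mode), and it assumes the input has no line terminators, which . would not match by default.

```java
import java.util.ArrayList;
import java.util.List;

public class AllSplits {
    // Hypothetical helper: every {prefix, digits, suffix} triple such that
    // prefix + digits + suffix equals the input and digits is a non-empty
    // run of ASCII digits.
    static List<String[]> allSplits(String input) {
        List<String[]> result = new ArrayList<>();
        for (int start = 0; start < input.length(); start++) {
            for (int end = start + 1; end <= input.length(); end++) {
                char c = input.charAt(end - 1);
                if (c < '0' || c > '9') {
                    break; // the run starting at 'start' stops being all digits
                }
                result.add(new String[] {
                    input.substring(0, start),
                    input.substring(start, end),
                    input.substring(end)
                });
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        for (String[] split : allSplits(line)) {
            System.out.println("Found value: " + split[0]);
            System.out.println("Found value: " + split[1]);
            System.out.println("Found value: " + split[2]);
        }
    }
}
```

A Matcher reports at most one match per starting position, so it never lists alternative splits of the same text by itself; that is why this sketch walks the candidate positions explicitly.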