Java Regular Expression Matcher doesn't find all possible matches


Question

I was looking at some code on TutorialsPoint, and something about it has been bothering me ever since. Take a look at this code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main( String args[] ){

      // String to be scanned to find the pattern.
      String line = "This order was placed for QT3000! OK?";
      String pattern = "(.*)(\\d+)(.*)";

      // Create a Pattern object
      Pattern r = Pattern.compile(pattern);

      // Now create matcher object.
      Matcher m = r.matcher(line);
      while(m.find( )) {
         System.out.println("Found value: " + m.group(1));
         System.out.println("Found value: " + m.group(2));
         System.out.println("Found value: " + m.group(3));
      }
   }
}

This code successfully prints:

Found value: This order was placed for QT300
Found value: 0
Found value: ! OK?

But according to the regex "(.*)(\\d+)(.*)", why doesn't it also return other possible outcomes, such as:

Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?

Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?

And if this code isn't suited to doing that, how can I write code that finds all possible matches?

Recommended answer

This happens because * is greedy, and because of how the regex engine backtracks.

String:

This order was placed for QT3000! OK?

Regex:

(.*)(\\d+)(.*)

.* is greedy: it matches as many characters as it can. So the first (.*) initially consumes the entire line, up to and including the final ?. That leaves nothing for the next element of the regex, (\\d+), so the engine backtracks, giving back one character at a time from the end of the first group until (\\d+) can match. That first happens at the last digit, the 0 located just before the ! symbol. Since \d+ needs only one digit, the condition is satisfied and backtracking stops there. At this point the first (.*) has captured This order was placed for QT300 and (\\d+) has captured the single digit 0.

The final (.*) then captures all the remaining characters, that is ! OK?. m.group(1) returns the text captured by group 1, m.group(2) the text captured by group 2, and so on. Because this single match consumes the whole line, the next call to m.find() has nothing left to scan, so the loop body runs only once. The engine reports the first way the pattern can match, not every possible way.
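To make the backtracking result concrete, here is a minimal sketch that prints where each group of the first match starts and ends. The class name GreedyGroupSpans and the helper describe are made up for this illustration and are not part of the original answer. It only uses Matcher.start(int) and Matcher.end(int) from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyGroupSpans {

    // Hypothetical helper: lists every capturing group of the first match
    // as  group N = "text" [start,end)  with one group per line.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int g = 1; g <= m.groupCount(); g++) {
            sb.append("group ").append(g)
              .append(" = \"").append(m.group(g)).append("\" [")
              .append(m.start(g)).append(',').append(m.end(g)).append(")\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // The greedy first group stops right before the last digit.
        System.out.print(describe("(.*)(\\d+)(.*)", line));
    }
}
```

The three spans are adjacent and together cover the whole line, which is why a second m.find() call has nothing left to match.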


To get the output you want, limit the digit group to exactly two digits:

String line = "This order was placed for QT3000! OK?";
String pattern = "(.*)(\\d{2})(.*)";

// Create a Pattern object
Pattern r = Pattern.compile(pattern);

// Now create matcher object.
Matcher m = r.matcher(line);
while (m.find()) {
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3));
}

Output:

Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?

With (.*)(\\d{2}), the first group backtracks until exactly two digits are available for the second group.

Change your pattern to

String pattern = "(.*?)(\\d+)(.*)";

to get output like this:

Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?

The ? after the * makes the * non-greedy (lazy), so the first group matches as few characters as possible and (\\d+) is free to take the whole run of digits.
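The effect of each quantifier choice can be compared side by side. The following is a small sketch, with a made-up class name QuantifierComparison and helper secondGroup, that returns what the second group captures for each of the three patterns discussed so far.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierComparison {

    // Hypothetical helper: text captured by group 2 of the first match,
    // or null if the regex does not match at all.
    static String secondGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // Greedy first group: leaves only the last digit.
        System.out.println(secondGroup("(.*)(\\d+)(.*)", line));    // 0
        // Greedy first group, but the digit group demands two digits.
        System.out.println(secondGroup("(.*)(\\d{2})(.*)", line));  // 00
        // Lazy first group: the digit group takes the whole run.
        System.out.println(secondGroup("(.*?)(\\d+)(.*)", line));   // 3000
    }
}
```

In each case the engine still returns just one match. Only the way the line is divided among the groups changes.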

Use extra capturing groups to get both outputs from a single program.

String line = "This order was placed for QT3000! OK?";
String pattern = "((.*?)(\\d{2}))(?:(\\d{2})(.*))";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find()) {
    System.out.println("Found value: " + m.group(1));
    System.out.println("Found value: " + m.group(4));
    System.out.println("Found value: " + m.group(5));
    System.out.println("Found value: " + m.group(2));
    System.out.println("Found value: " + m.group(3) + m.group(4));
    System.out.println("Found value: " + m.group(5));
}

Output:

Found value: This order was placed for QT30
Found value: 00
Found value: ! OK?
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
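The patterns above are tailored to this particular four-digit input. If the goal really is to list every way the three groups could divide the whole line, one option is to leave that enumeration to plain Java rather than to the regex. The sketch below is not from the original answer. The class name AllDigitSplits and the method allSplits are assumptions made for illustration. It uses \d+ only to locate each run of digits, then loops over every non-empty substring of that run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllDigitSplits {

    // Hypothetical helper: returns every {prefix, digits, suffix} split of
    // the input in which "digits" is a non-empty run of consecutive digits.
    static List<String[]> allSplits(String input) {
        List<String[]> result = new ArrayList<>();
        Matcher run = Pattern.compile("\\d+").matcher(input);
        while (run.find()) { // one iteration per maximal digit run
            for (int start = run.start(); start < run.end(); start++) {
                for (int end = start + 1; end <= run.end(); end++) {
                    result.add(new String[] {
                        input.substring(0, start),
                        input.substring(start, end),
                        input.substring(end)
                    });
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        for (String[] s : allSplits(line)) {
            System.out.println(s[0] + " | " + s[1] + " | " + s[2]);
        }
    }
}
```

For the sample line this yields ten splits, one for each substring of 3000. They include the three outcomes discussed in the question (0, 00 and 3000 as the digit group).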
