Java repetitive pattern matching (2)


Problem description


Consider the following regex:

(([^\|])*\|)*([^\|]*)

This matches repetitive string patterns of the type

("whatever except |" |) {0 to any times} ("whatever except |") {1 time}

So it should match the following String, which has 17 substrings (16 repeated, plus " z" as the last one).

"abcd  | e | fg | hijk | lmnop | |   | qrs |   t| uv| w |||||x   y|  z"

Indeed, RegexPal verifies that the given regex does match the above string.

Now I want to extract each of the substrings (e.g., "abcd |", "e |", "fg |", etc.) without any prior knowledge of their number, length, and so on.

According to a similarly-titled previous StackOverflow post and the documentation of the Matcher class find() method, I just need to do something like

Pattern pattern = Pattern.compile(regex); // regex is the above regex
Matcher matcher = pattern.matcher(input); // input is the above string

while (matcher.find())
{
   System.out.println(matcher.group(1));
}

However, when I do this I just get 2 strings printed out: the last repeated substring ("x y|") and a null value; definitely not the 16 substrings I expect.
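The behavior described above follows from how capturing groups work: a group inside a `*` repetition retains only the text of its last successful iteration, and the second `find()` call produces an empty match at the end of the input in which group 1 never participates. The sketch below reproduces this on a shortened input; the class name `RepeatedGroupDemo`, the method `groupOnePerFind`, and the sample string `"a|b|c"` are illustrative assumptions, not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // The regex from the question, escaped for a Java string literal.
    static final String REGEX = "(([^\\|])*\\|)*([^\\|]*)";

    // Collects group(1) from every successful find() call.
    static List<String> groupOnePerFind(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(REGEX).matcher(input);
        while (m.find()) {
            // group(1) holds only the LAST iteration of the repeated group,
            // or null if the group did not take part in this match.
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical shortened input with two "field|" repetitions.
        System.out.println(groupOnePerFind("a|b|c")); // prints [b|, null]
    }
}
```

The first `find()` consumes the whole string in one match, so there is never a separate match per field; that is why iterating with `find()` over this regex cannot yield the individual substrings.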

It would also be nice to check that a match has actually happened before running the find() loop. I am not sure whether matches(), groupCount() > 0, or some other condition is the right test, and since find() also performs matching, I would like to avoid doing the matching work twice.

So, questions:

  1. How can I get all the 16 repeated substrings?
  2. How can I get the last substring?
  3. How do I check that the string matched?

Solution

If you must use the regular expression...

1) How can I get all the 16 repeated substrings?

See below. When looping over the matches, the regex does not need to match the whole input at once, only the section you want on each iteration. (I get 17 matches; is this correct?)

2) How can I get the last substring?

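Move the delimiter to the start of the regex and also allow '^' there, so that every field, including the last one, is preceded by either a delimiter or the start of the input.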

3) How do I check that the string matched?

What would qualify as a non-match? Every quantifier in the regex allows zero repetitions, so any string will match.
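To make that point concrete, here is a small sketch showing that `matches()` with the regex from the question returns true even for an empty string or a string with no delimiter. The stricter pattern `AT_LEAST_ONE_DELIM` is purely a hypothetical example of what a meaningful validity check might look like, assuming "valid" means "contains at least one `|`"; the original post does not define such a rule.

```java
import java.util.regex.Pattern;

public class AlwaysMatchesDemo {
    // The regex from the question: every part is optional.
    static final Pattern ORIGINAL =
            Pattern.compile("(([^\\|])*\\|)*([^\\|]*)");

    // Hypothetical stricter rule (an assumption, not from the post):
    // at least one "field|" segment must be present.
    static final Pattern AT_LEAST_ONE_DELIM =
            Pattern.compile("(?:[^|]*\\|)+[^|]*");

    // True when the whole string matches the question's regex.
    static boolean matchesOriginal(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    // True when the whole string contains at least one '|' delimiter.
    static boolean hasDelimiter(String s) {
        return AT_LEAST_ONE_DELIM.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"", "no pipes here", "|||", "a|b"};
        for (String s : samples) {
            System.out.format("\"%s\" original=%b strict=%b%n",
                    s, matchesOriginal(s), hasDelimiter(s));
        }
    }
}
```

Because the original regex accepts everything, a pre-check with `matches()` adds no information; a check only becomes useful once the pattern can actually reject some inputs.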


Here is a solution using regular expressions:

String input = "abcd  | e | fg | hijk | lmnop | |   | qrs |   t| uv| w |||||x   y|  z";
int expectedSize = 17;
List<String> expected = new ArrayList<String>(Arrays.asList("abcd  ", " e ", " fg ", " hijk ", " lmnop ", " ", "   ", " qrs ", "   t", " uv", " w ", "",
    "", "", "", "x   y", "  z"));

List<String> matches = new ArrayList<String>();

// Pattern pattern = Pattern.compile("(?:\\||^)([^\\|]*)");
Pattern pattern = Pattern.compile("(?:_?\\||^)([^\\|]*?)(?=_?\\||$)"); // Edit: allows _| or | as delim

for (Matcher matcher = pattern.matcher(input); matcher.find();)
{
  matches.add(matcher.group(1));
}

for (int idx = 0, len = matches.size(); idx < len; idx++)
{
  System.out.format("[%-2d] \"%s\"%n", idx + 1, matches.get(idx));
}

assertEquals(expectedSize, matches.size()); // compare values; assertSame would compare boxed Integer references
assertEquals(expected, matches);

Output

[1 ] "abcd  "
[2 ] " e "
[3 ] " fg "
[4 ] " hijk "
[5 ] " lmnop "
[6 ] " "
[7 ] "   "
[8 ] " qrs "
[9 ] "   t"
[10] " uv"
[11] " w "
[12] ""
[13] ""
[14] ""
[15] ""
[16] "x   y"
[17] "  z"
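As a cross-check on the 17 fields listed above, and for cases where a regular expression is not strictly required, the same fields can be obtained with `String.split`. This is a minimal sketch rather than part of the original answer; the class name `SplitFieldsDemo` and the method `fields` are illustrative. It handles only the plain `|` delimiter, not the `_|` variant the edited regex allows.

```java
import java.util.Arrays;
import java.util.List;

public class SplitFieldsDemo {
    // Splits on a literal '|'. The limit of -1 keeps trailing empty
    // strings, which split() would otherwise discard.
    static List<String> fields(String input) {
        return Arrays.asList(input.split("\\|", -1));
    }

    public static void main(String[] args) {
        String input = "abcd  | e | fg | hijk | lmnop | |   | qrs |   t| uv| w |||||x   y|  z";
        List<String> result = fields(input);
        System.out.println(result.size()); // prints 17
        for (int i = 0; i < result.size(); i++) {
            System.out.format("[%-2d] \"%s\"%n", i + 1, result.get(i));
        }
    }
}
```

The negative limit matters for inputs that end in a delimiter: with the default limit, `"a||".split("\\|")` returns a single element, while the version above returns three.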
