Java repetitive pattern matching (2)
Problem description
Consider the following regex:
(([^\|])*\|)*([^\|]*)
This matches repetitive string patterns of the type
("whatever except |" |) {0 to any times} ("whatever except |") {1 time}
So it should match the following String, which has 17 substrings (16 repeated, plus " z" as the last one).
"abcd | e | fg | hijk | lmnop | | | qrs | t| uv| w |||||x y| z"
Indeed, RegexPal verifies that the given regex does match the above string.
Now, I want to get each of the substrings (e.g., "abcd |", "e |", "fg |", etc.) without any prior knowledge of their number, length, etc.
According to a similarly titled previous StackOverflow post and the documentation of the Matcher class's find() method, I just need to do something like
Pattern pattern = Pattern.compile(regex); // regex is the above regex
Matcher matcher = pattern.matcher(input); // input is the above string
while (matcher.find())
{
System.out.println(matcher.group(1));
}
However, when I do this I just get 2 strings printed out: the last repeated substring ("x y|") and a null value; definitely not the 16 substrings I expect.
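The result described above is consistent with how java.util.regex treats a capturing group placed inside a quantifier: after a match, the group holds only the text of its last iteration. The following is a minimal sketch on a shortened input; the class name LastCaptureDemo and the helper group1PerFind are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastCaptureDemo {
    // The regex from the question: group 1 sits inside a "*" quantifier.
    static final String REGEX = "(([^\\|])*\\|)*([^\\|]*)";

    // Hypothetical helper: collects group(1) once per successful find().
    static List<String> group1PerFind(String input) {
        List<String> seen = new ArrayList<>();
        Matcher matcher = Pattern.compile(REGEX).matcher(input);
        while (matcher.find()) {
            // group(1) is null when the group took part in zero iterations.
            seen.add(matcher.group(1));
        }
        return seen;
    }

    public static void main(String[] args) {
        // The first find() consumes the whole input; group 1 keeps only "b|".
        System.out.println(group1PerFind("a|b|c")); // prints [b|, null]
    }
}
```

The second entry is null because the second find() succeeds on the empty string at the end of the input, where group 1 does not participate in the match.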
It would also be nice to check that a match has actually happened before running the find() loop, but I am not sure whether matches(), groupCount() > 0, or some other condition should be used without doing the matching work twice, given that find() also does matching.
So, questions:
- How can I get all the 16 repeated substrings?
- How can I get the last substring?
- How do I check that the string matched?
If you must use the regular expression...
1) How can I get all the 16 repeated substrings?
See below. When looping over matches with find(), the pattern does not need to match the whole string, only the piece you want on each iteration. (I get 17 matches; is that correct?)
2) How can I get the last substring?
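Move the delimiter to the start of the regex and also allow '^' in its place, so that the last substring is preceded by a delimiter just like the others and the first one is anchored at the start of the input.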
3) How do I check that the string matched?
What would qualify as a non-match? With this regex, any string will match.
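To illustrate the third point, here is a small sketch showing that matches() with the original regex returns true for quite different inputs, including the empty string. The class name AlwaysMatchesDemo, both method names, and the STRICT pattern are assumptions made for this example; STRICT is only one possible definition of a "valid" string (at least one '|' present) and is not taken from the question.

```java
import java.util.regex.Pattern;

class AlwaysMatchesDemo {
    // The regex from the question: every part of it is optional.
    static final Pattern ORIGINAL = Pattern.compile("(([^\\|])*\\|)*([^\\|]*)");

    // Hypothetical stricter variant: requires at least one '|' delimiter.
    static final Pattern STRICT = Pattern.compile("(?:[^|]*\\|)+[^|]*");

    static boolean matchesOriginal(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean matchesStrict(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "no delimiter", "|||", "a|b|"}) {
            System.out.println("\"" + s + "\" original=" + matchesOriginal(s)
                    + " strict=" + matchesStrict(s));
        }
    }
}
```

A meaningful check therefore depends on first deciding what an invalid input looks like; the original pattern alone cannot reject anything.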
Here is a solution using regular expressions:
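// Imports needed by the snippet; the statements below belong inside a test method.
// The assertion import assumes JUnit 4 (with JUnit 5, use org.junit.jupiter.api.Assertions).
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import static org.junit.Assert.assertEquals;

String input = "abcd | e | fg | hijk | lmnop | | | qrs | t| uv| w |||||x y| z";
int expectedSize = 17;
List<String> expected = new ArrayList<String>(Arrays.asList("abcd ", " e ", " fg ", " hijk ", " lmnop ", " ", " ", " qrs ", " t", " uv", " w ", "",
        "", "", "", "x y", " z"));
List<String> matches = new ArrayList<String>();
// Simpler variant: a delimiter (or the start of input), then everything up to the next '|'.
// Pattern pattern = Pattern.compile("(?:\\||^)([^\\|]*)");
Pattern pattern = Pattern.compile("(?:_?\\||^)([^\\|]*?)(?=_?\\||$)"); // Edit: allows _| or | as delim
// Each find() consumes one delimiter plus one field; group(1) is the field alone.
for (Matcher matcher = pattern.matcher(input); matcher.find();)
{
    matches.add(matcher.group(1));
}
// Print the fields, numbered from 1.
for (int idx = 0, len = matches.size(); idx < len; idx++)
{
    System.out.format("[%-2d] \"%s\"%n", idx + 1, matches.get(idx));
}
// assertEquals compares values; assertSame would compare boxed Integer identity.
assertEquals(expectedSize, matches.size());
assertEquals(expected, matches);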
Output
[1 ] "abcd "
[2 ] " e "
[3 ] " fg "
[4 ] " hijk "
[5 ] " lmnop "
[6 ] " "
[7 ] " "
[8 ] " qrs "
[9 ] " t"
[10] " uv"
[11] " w "
[12] ""
[13] ""
[14] ""
[15] ""
[16] "x y"
[17] " z"
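For comparison, when an explicit find() loop is not strictly required, String.split with a negative limit should yield the same 17 fields. This is a swapped-in alternative and not the approach shown above; the class name SplitDemo and the method fields are hypothetical names used only for this sketch.

```java
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // Splits on a literal '|'. The pipe is escaped because split() takes a regex,
    // and the limit of -1 keeps trailing empty strings instead of dropping them.
    static List<String> fields(String input) {
        return Arrays.asList(input.split("\\|", -1));
    }

    public static void main(String[] args) {
        String input = "abcd | e | fg | hijk | lmnop | | | qrs | t| uv| w |||||x y| z";
        List<String> parts = fields(input);
        for (int idx = 0; idx < parts.size(); idx++) {
            System.out.format("[%-2d] \"%s\"%n", idx + 1, parts.get(idx));
        }
    }
}
```

Without the -1 limit, split() discards trailing empty strings, so an input ending in '|' would lose its final empty field.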