Regex not working in Java 1.5
Question
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static final String PATTERN = "(?<=(^|,))(([^\",]+)|\"([^\"]*)\")(?=($|,))";

    public static void main(String[] args) {
        String line = ",1234,ABC";
        Matcher matcher = Pattern.compile(PATTERN).matcher(line);
        while (matcher.find()) {
            // Group 3 holds an unquoted field; group 4 holds the contents of a quoted field.
            if (matcher.group(3) != null) {
                System.out.println(matcher.group(3));
            } else {
                System.out.println(matcher.group(4));
            }
        }
    }
}
I used the above program to parse the string ",1234,ABC". After parsing, I should get 3 tokens as follows:
- an empty string, i.e. ""
- 1234
- ABC
It seems to work on Java 1.6, but it does not work on Java 1.5.
Regular expressions have been part of Java since Java 1.4, so why am I facing this problem?
Recommended answer
This is a bug in the Java Class Library (Sun's implementation, since taken over by Oracle). It is present at least up to JRE 1.5 Update 18 and was fixed at some point before JRE 1.6 Update 32 (the 2 versions I tested on).
After some testing, there appear to be bugs in the implementation of positive look-behind (?<=pattern) and also negative look-behind (?<!pattern) (see appendix 1 and 2). It may have something to do with how the engine backtracks when alternatives of different widths (see appendix 3), separated by the alternation operator |, appear inside a look-behind, which is a non-capturing group.
Swapping the order of the alternatives in the look-behind sometimes works (see appendix 4), but appendix 2 shows that it may not work all the time.
For now, extracting the alternation out of the look-behind seems to be a possible solution. For example, a look-behind with alternation (?<=pat1|pat2|pat3) is converted to (?:(?<=pat1)|(?<=pat2)|(?<=pat3)). Repeat until there is no | inside any look-behind. This seems to produce correct results for the test cases I used below.
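The rewrite described above can be sketched in code. This is only an illustration: `splitLookBehind` and `matchStarts` are hypothetical helper names, and the caller is assumed to have split the alternatives by hand, since the helper does not parse regexes. The code sticks to Java 5 language features so it can be run on the affected JRE as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookBehindSplit {
    // Hypothetical helper: wraps each alternative in its own look-behind,
    // turning pat1, pat2 into (?:(?<=pat1)|(?<=pat2)).
    // The alternatives must not contain a top-level | themselves.
    static String splitLookBehind(String... alternatives) {
        StringBuilder sb = new StringBuilder("(?:");
        for (int i = 0; i < alternatives.length; i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append("(?<=").append(alternatives[i]).append(')');
        }
        return sb.append(')').toString();
    }

    // Collects the 0-based start index of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<Integer>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = ",abc,.";
        String combined = "(?<=,abc|[,.])";
        String split = splitLookBehind(",abc", "[,.]");
        System.out.println(split);
        System.out.println("combined: " + matchStarts(combined, input));
        System.out.println("split:    " + matchStarts(split, input));
    }
}
```

On a JRE without the bug, both forms should report the same positions. A difference between the two lines on an older JRE would point at the look-behind problem described here.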
So for the regex in question, this is the workaround (assuming the original one is correct):
"(?:^|(?<=,))(?:([^\",]+)|\"([^\"]*)\")(?:$|(?=,))"
In case there is also a problem with look-ahead, I replaced it with a non-capturing group as well, since the result stays the same for your use case. (Testing has yet to reveal a bug there, but just in case.) Although I am not completely sure, I guess we can trust the engine to work correctly at least for (?<=,) and (?=,). I also took the liberty of reducing the number of capturing groups, so please recount them.
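Because the capturing groups are renumbered, the loop from the question needs to read groups 1 and 2 instead of 3 and 4. A minimal sketch of that adjustment follows; the class name `WorkaroundDemo` and the method `tokens` are made up for illustration, and the second input is an extra example that is not in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WorkaroundDemo {
    // The workaround regex from this answer, unchanged.
    static final String WORKAROUND =
            "(?:^|(?<=,))(?:([^\",]+)|\"([^\"]*)\")(?:$|(?=,))";

    // Hypothetical helper: returns the fields the regex finds in one line.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<String>();
        Matcher m = Pattern.compile(WORKAROUND).matcher(line);
        while (m.find()) {
            // Group 1: unquoted field. Group 2: quoted field without its quotes.
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens(",1234,ABC"));
        System.out.println(tokens(",\"a,b\",c"));
    }
}
```

One thing to keep in mind when checking the output: both alternatives need at least one character or a pair of quotes, so an empty leading field is not returned as a token by this pattern, nor by the original one.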
Appendix
1. Tested with the input string ",abc,1234" and the regexes "(?<=^|[,.])" and "(?<!^|[,.])". The results differed between JRE 1.5u18 and JRE 1.6u32. For the positive look-behind "(?<=^|[,.])", the match at position 1 is missing from the output of JRE 1.5u18 compared to that of JRE 1.6u32. Conversely, on JRE 1.5u18, position 1 appears in the result for the negative look-behind "(?<!^|[,.])", while the output of JRE 1.6u32 does not contain it.
This complementary behavior is not much of a surprise, as positive and negative look-behind are exact opposites of each other.
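The test above can be reproduced with a small harness along these lines. It is a sketch, not the code originally used for the test: `LookBehindComplement` and `matchStarts` are hypothetical names, and positions are assumed to be the 0-based start indexes reported by `Matcher.start()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookBehindComplement {
    // Collects the 0-based start index of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<Integer>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = ",abc,1234";
        List<Integer> positive = matchStarts("(?<=^|[,.])", input);
        List<Integer> negative = matchStarts("(?<!^|[,.])", input);
        System.out.println("positive: " + positive);
        System.out.println("negative: " + negative);
        // Each position 0..input.length() should be in exactly one of the lists.
        for (int p = 0; p <= input.length(); p++) {
            if (positive.contains(p) == negative.contains(p)) {
                System.out.println("inconsistent at position " + p);
            }
        }
    }
}
```

Tracing the two patterns by hand, a JRE without the bug should list positions 0, 1 and 5 for the positive look-behind and every other position from 2 to 9 for the negative one. Position 1 is the one that moves between the lists on JRE 1.5u18.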
2. Another test, with the input string ",abc,." and the regex "(?<=,abc|[,.])": the match at position 1 does not appear in the list of results for JRE 1.5u18, compared to JRE 1.6u32.
If we swap the alternatives around, "(?<=[,.]|,abc)", the match at position 4 is missing from the result of JRE 1.5u18, compared to JRE 1.6u32.
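Since the order of alternatives should not change where a look-behind succeeds, comparing the two orders directly is a quick way to check a given JRE. The following sketch does that; the class and method names are hypothetical, and positions are again assumed to be 0-based match start indexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrder {
    // Collects the 0-based start index of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<Integer>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = ",abc,.";
        List<Integer> longFirst = matchStarts("(?<=,abc|[,.])", input);
        List<Integer> shortFirst = matchStarts("(?<=[,.]|,abc)", input);
        System.out.println("long alternative first:  " + longFirst);
        System.out.println("short alternative first: " + shortFirst);
        System.out.println("same positions: " + longFirst.equals(shortFirst));
    }
}
```

By hand, the positions preceded by ",abc", "," or "." in this input are 1, 4, 5 and 6, so both lists should contain exactly those on a JRE without the bug.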
3. The bug may not be limited to alternatives of different widths, but that is the only case I have tested.
4. I can make the regex in the question work on the input ",1234,ABC,\"sdfsdf,sdf\",sdfskhkf," by swapping ^ and , in the alternation, i.e. changing (?<=(^|,)) to (?<=(,|^)).
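To compare the original and the swapped look-behind on this input, something like the following can be used. It is a sketch with hypothetical names (`SwapAnchors`, `tokens`); the group numbers 3 and 4 are the ones used in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SwapAnchors {
    // Regex from the question, and the same regex with ^ and , swapped
    // inside the look-behind.
    static final String ORIGINAL = "(?<=(^|,))(([^\",]+)|\"([^\"]*)\")(?=($|,))";
    static final String SWAPPED = "(?<=(,|^))(([^\",]+)|\"([^\"]*)\")(?=($|,))";

    // Hypothetical helper: returns the fields found by regex in line.
    static List<String> tokens(String regex, String line) {
        List<String> out = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(line);
        while (m.find()) {
            // Group 3: unquoted field. Group 4: quoted field without its quotes.
            out.add(m.group(3) != null ? m.group(3) : m.group(4));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = ",1234,ABC,\"sdfsdf,sdf\",sdfskhkf,";
        System.out.println("original: " + tokens(ORIGINAL, input));
        System.out.println("swapped:  " + tokens(SWAPPED, input));
    }
}
```

On a JRE without the bug the two lines should be identical. On an affected JRE, a difference between them would show the effect of the swap.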