Regex not working in Java 1.5


Question

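import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is arbitrary; the original snippet showed only the members.
public class RegexTest {
    // A field is either unquoted (group 3) or double-quoted (group 4),
    // and must be delimited by commas or by the ends of the string.
    public static final String PATTERN = "(?<=(^|,))(([^\",]+)|\"([^\"]*)\")(?=($|,))";

    public static void main(String[] args) {
        String line = ",1234,ABC";
        Matcher matcher = Pattern.compile(PATTERN).matcher(line);
        while (matcher.find()) {
            // Print whichever of the two alternatives matched.
            if (matcher.group(3) != null) {
                System.out.println(matcher.group(3));
            } else {
                System.out.println(matcher.group(4));
            }
        }
    }
}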

I used the program above to parse the string ",1234,ABC". After parsing, I should get three tokens:


  1. An empty string

  2. 1234

  3. ABC

It seems to work on Java 1.6, but it does not work on Java 1.5.

Regular expressions have been part of Java since Java 1.4, so why am I running into this problem?

Recommended answer

This is a bug in the Java Class Library (Sun's implementation, since taken over by Oracle). It is present at least up to JRE 1.5 Update 18 and was fixed at some point before JRE 1.6 Update 32 (the two versions I tested on).

After some testing, I found bugs in the implementation of positive look-behind (?<=pattern) and also of negative look-behind (?<!pattern) (see appendix notes 1 and 2). They may have something to do with how the engine backtracks when alternatives of different widths (appendix note 3) are separated by alternation | inside the (non-capturing) look-behind group.

Swapping the order of the alternatives inside the look-behind sometimes works (appendix note 4), but appendix note 2 shows that it may not work every time.

For now, extracting the alternation out of the look-behind seems to be a possible solution. For example, a look-behind with alternation (?<=pat1|pat2|pat3) is converted to (?:(?<=pat1)|(?<=pat2)|(?<=pat3)). Repeat until there is no | left inside any look-behind. This seems to produce correct results for the test cases I used below.
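The rewrite can be checked by comparing match offsets for the two forms. The following is only a minimal sketch: the class name LookBehindSplitDemo and the helper matchStarts are made up for illustration, and the pattern and input are borrowed from the second appendix test. It sticks to Java 5 syntax so that it can also be compiled on the older JRE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original answer.
class LookBehindSplitDemo {

    // Collects the start offset of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<Integer>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(Integer.valueOf(m.start()));
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = ",abc,.";
        // Alternation inside a single look-behind (the form affected by the bug).
        String combined = "(?<=,abc|[,.])";
        // One look-behind per alternative, joined by a non-capturing group.
        String split = "(?:(?<=,abc)|(?<=[,.]))";
        System.out.println("combined: " + matchStarts(combined, input));
        System.out.println("split:    " + matchStarts(split, input));
    }
}
```

On an engine without the bug, both lines should list the same offsets. A difference between them on a given JRE would point at the alternation handling described above.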

So for the regex in question, this is the workaround (assuming the original one is correct):

"(?:^|(?<=,))(?:([^\",]+)|\"([^\"]*)\")(?:$|(?=,))"

In case there is also a problem with look-ahead, I replaced it with a non-capturing group as well, since the result stays the same for your use case. (Testing has not revealed a look-ahead bug so far, but just in case.) Although I am not completely sure, I think we can trust the engine to work correctly at least for (?<=,) and (?=,). I also took the liberty of reducing the number of capturing groups, so please recount them.
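To illustrate the recounting: in the workaround pattern the unquoted field becomes group 1 and the quoted field becomes group 2 (they were groups 3 and 4 in the question). A minimal sketch of the adjusted loop follows; the class name CsvFieldDemo, the method fields, and the second sample line are illustrative assumptions, not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing the renumbered groups.
class CsvFieldDemo {

    // Workaround pattern from the answer: no alternation inside look-behind.
    static final String WORKAROUND =
            "(?:^|(?<=,))(?:([^\",]+)|\"([^\"]*)\")(?:$|(?=,))";

    // Returns the fields the pattern finds in one line.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<String>();
        Matcher m = Pattern.compile(WORKAROUND).matcher(line);
        while (m.find()) {
            String unquoted = m.group(1);   // was group 3 in the question
            String quoted = m.group(2);     // was group 4 in the question
            out.add(unquoted != null ? unquoted : quoted);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields(",1234,ABC"));
        System.out.println(fields("a,\"b,c\",d"));
    }
}
```

Note that both alternatives of the pattern consume at least one character, so as far as I can tell the empty field before the first comma in ",1234,ABC" produces no match of its own. If the empty token is needed, the pattern itself would have to be adjusted.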

Appendix


  1. Tested with the input string ",abc,1234" and the regexes "(?<=^|[,.])" and "(?<!^|[,.])". The results differ between JRE 1.5u18 and JRE 1.6u32. For the positive look-behind "(?<=^|[,.])", the match at position 1 is missing from the output of JRE 1.5u18 compared to that of JRE 1.6u32. Conversely, on JRE 1.5u18 position 1 appears in the result for the negative look-behind "(?<!^|[,.])", while the output of JRE 1.6u32 does not contain it.

     This complementary behavior is not much of a surprise, since positive and negative look-behind are exact opposites of each other.

  2. Another test with the input string ",abc,." and the regex "(?<=,abc|[,.])". The match at position 1 does not appear in the list of results for JRE 1.5u18, compared to JRE 1.6u32.

     If we swap the alternation around, "(?<=[,.]|,abc)", the match at position 4 is missing from the result of JRE 1.5u18, compared to JRE 1.6u32.

  3. The bug may not be limited to alternatives of different widths, but that is the case I have tested.

  4. I can make the regex in the question work on the input ",1234,ABC,\"sdfsdf,sdf\",sdfskhkf," by swapping ^ and , in the alternation, i.e. changing (?<=(^|,)) to (?<=(,|^)).
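The position lists mentioned in these notes can be reproduced with a small probe along the following lines. This is a sketch rather than the exact test code used for the answer: the class name LookBehindProbe and its helpers are hypothetical, and the code is kept to Java 5 syntax so the same source can be run on both JRE versions and the output compared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical probe for comparing look-behind results across JRE versions.
class LookBehindProbe {

    // Collects the start offset of every (zero-width) match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<Integer>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(Integer.valueOf(m.start()));
        }
        return starts;
    }

    // Prints one result line so that runs on different JREs can be diffed.
    static void report(String regex, String input) {
        System.out.println(regex + " on \"" + input + "\" -> " + matchStarts(regex, input));
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        // Note 1: positive and negative look-behind with ^ as one alternative.
        report("(?<=^|[,.])", ",abc,1234");
        report("(?<!^|[,.])", ",abc,1234");
        // Note 2: alternatives of different widths, in both orders.
        report("(?<=,abc|[,.])", ",abc,.");
        report("(?<=[,.]|,abc)", ",abc,.");
    }
}
```

On a JRE without the bug, the first pattern should report positions 0, 1 and 5, and the negative look-behind should report exactly the remaining positions of the input. A missing or extra position 1 on an older JRE would match the behavior described in note 1.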
