Why does \R behave differently in regular expressions between Java 8 and Java 9?


Question

The following code compiles in both Java 8 and Java 9, but behaves differently.

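class Simple {
    // Mixes bare LF and CR LF line endings; it never contains two line endings in a row.
    static String sample = "\nEn un lugar\r\nde la Mancha\nde cuyo nombre\r\nno quiero acordarme";

    public static void main(String[] args) {
        // Split wherever two linebreak sequences appear back to back.
        String[] chunks = sample.split("\\R\\R");
        for (String chunk : chunks) {
            System.out.println("Chunk : " + chunk);
        }
    }
}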

When I run it with Java 8 it returns:

Chunk : 
En un lugar
de la Mancha
de cuyo nombre
no quiero acordarme

But when I run it with Java 9 the output is different:

Chunk : 
En un lugar
Chunk : de la Mancha
de cuyo nombre
Chunk : no quiero acordarme

Why?

Recommended answer

The Java documentation is out of conformance with the Unicode Standard. The Javadoc misstates what \R is supposed to match. It reads:


\R Any Unicode linebreak sequence, is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
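To see what that documented equivalence implies, here is a minimal sketch that spells the Javadoc expression out by hand and applies it twice in a row, as the question does with `\R\R`. The class, field, and method names are illustrative assumptions, not part of any library. Because nothing in the expression is atomic, the engine is free to backtrack and match a CR LF pair as a CR followed by an LF.

```java
import java.util.regex.Pattern;

class JavadocLineBreakDemo {
    // The alternation quoted from the Javadoc, wrapped in a non-capturing group.
    // Nothing here is atomic, so the engine may backtrack out of the CR LF branch.
    static final String JAVADOC_LINEBREAK =
        "(?:\\u000D\\u000A|[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029])";

    // Two consecutive linebreaks, the hand-written analogue of "\\R\\R".
    static final Pattern TWO_LINEBREAKS = Pattern.compile(JAVADOC_LINEBREAK + "{2}");

    static String[] chunks(String text) {
        return TWO_LINEBREAKS.split(text);
    }

    public static void main(String[] args) {
        String sample = "\nEn un lugar\r\nde la Mancha\nde cuyo nombre\r\nno quiero acordarme";
        for (String chunk : chunks(sample)) {
            System.out.println("Chunk : " + chunk);
        }
    }
}
```

With this hand-written pattern the sample splits into three chunks on any Java version, which is the same shape as the Java 9 output shown in the question.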

That Java documentation is buggy. In its section on R1.6 Line Breaks, Unicode Technical Standard #18 on Regular Expressions clearly states:


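It is strongly recommended that there be a regular expression meta-character, such as "\R", for matching all line ending characters and sequences listed above (for example, in #1). This would correspond to something equivalent to the following expression. That expression is slightly complicated by the need to avoid backup.

 (?:\u{D A}|(?!\u{D A})[\u{A}-\u{D}\u{85}\u{2028}\u{2029}])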


In other words, it can match only a two-code-point CR+LF (carriage return + linefeed) sequence, or else a single code point from that set, provided that code point is not a carriage return immediately followed by a linefeed. That's because it is not allowed to back up: CRLF must be atomic for \R to function properly.
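As a rough illustration of that atomicity, the sketch below rewrites the UTS #18 expression in Java escape syntax (Java's regex engine does not accept the `\u{D A}` notation, so the code points are written as `\uXXXX` escapes) and applies it twice in a row. The class, field, and method names are assumptions made for this example. The negative lookahead stops the single-code-point branch from matching the CR of a CR LF pair, so a lone CR LF can never count as two linebreaks.

```java
import java.util.regex.Pattern;

class Uts18LineBreakDemo {
    // The UTS #18 expression in Java syntax. The negative lookahead keeps the
    // character-class branch away from a CR that starts a CR LF pair.
    static final String UTS18_LINEBREAK =
        "(?:\\u000D\\u000A|(?!\\u000D\\u000A)[\\u000A-\\u000D\\u0085\\u2028\\u2029])";

    // Two consecutive linebreaks, the hand-written analogue of "\\R\\R".
    static final Pattern TWO_LINEBREAKS = Pattern.compile(UTS18_LINEBREAK + "{2}");

    static String[] chunks(String text) {
        return TWO_LINEBREAKS.split(text);
    }

    public static void main(String[] args) {
        String sample = "\nEn un lugar\r\nde la Mancha\nde cuyo nombre\r\nno quiero acordarme";
        for (String chunk : chunks(sample)) {
            System.out.println("Chunk : " + chunk);
        }
    }
}
```

Since the sample never contains two line endings in a row, this pattern finds nothing to split on and the whole string comes back as a single chunk, as in the Java 8 run from the question. A genuine blank line such as `"a\r\n\r\nb"` is still split into `a` and `b`.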

So Java 9 no longer conforms to what R1.6 strongly recommends. Moreover, it is now doing something that it was supposed to NOT do, and that it did not do in Java 8.

Looks like it's time for me to give Sherman (read: Xueming Shen) a holler again. I've worked with him before on these nitty-gritty matters of formal conformance.
