Why can some ASCII characters not be expressed in the form '\uXXXX' in Java source code?

Question

I stumbled over this (again) today:

class Test {
    char ok = '\n';
    char okAsWell = '\u000B';
    char error = '\u000A';
}

It doesn't compile:

Invalid character constant in line 4.

The compiler seems to insist that I write '\n' instead. I see no reason for this, yet it's very annoying.

Is there a logical explanation why characters that have a special notation (like \t, \n, \r) must be expressed in that form in Java source?

Recommended answer

Unicode escapes are replaced by the characters they denote before the compiler tokenizes the source, so your line effectively becomes:

char error = '
';

which is not a valid Java statement.
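As a rough sketch of the usual workarounds, the line feed character can be written in ways that survive the early escape translation. The class and field names below are made up for illustration; only the literal forms themselves come from the language.

```java
public class LineFeedLiterals {
    // Escape sequence, interpreted when the character literal is parsed.
    static final char ESCAPE = '\n';

    // Octal escape for code point 10 (octal 12).
    static final char OCTAL = '\12';

    // Integer constant cast to char; no escape processing is involved.
    static final char CAST = (char) 0x0A;

    public static void main(String[] args) {
        // All three fields hold the same line feed character.
        System.out.println(ESCAPE == OCTAL && OCTAL == CAST); // prints "true"
    }
}
```

Any of these forms compiles because none of them puts a literal line terminator between the two single quotes.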

This is mandated by the language specification:

A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) of the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters.
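To illustrate how early this translation happens, here is a minimal sketch in which an escape appears inside an identifier rather than inside a literal. The class name and field name are hypothetical; the point is only that the escape becomes the letter A before the compiler splits the source into tokens.

```java
public class EarlyEscapeDemo {
    // The escape in the field name is translated to the letter A before
    // tokenizing, so this declares a field that is simply called nameA.
    static final String name\u0041 = "declared with an escape";

    public static void main(String[] args) {
        // The field is reachable under its translated name.
        System.out.println(nameA);

        // Inside a string literal the same escape also yields the letter A.
        System.out.println("\u0041".equals("A")); // prints "true"
    }
}
```

Because the escape is harmless for ordinary letters, the problem in the question only shows up for characters that change the structure of the source, such as line terminators and quotes.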

This can lead to surprising results. For example, the following is a valid Java program (it contains hidden Unicode characters), courtesy of Peter Lawrey:

public static void main(String[] args) {
    for (char c‮h = 0; c‮h < Character.MAX_VALUE; c‮h++) {
        if (Character.isJavaIdentifierPart(c‮h) && !Character.isJavaIdentifierStart(c‮h)) {
            System.out.printf("%04x <%s>%n", (int) c‮h, "" + c‮h);
        }
    }
}
