Why can some ASCII characters not be expressed in the form '\uXXXX' in Java source code?
Question
I stumbled over this (again) today:
class Test {
    char ok = '\n';
    char okAsWell = '\u000B';
    char error = '\u000A';
}
It does not compile:
Invalid character constant in line 4.
The compiler seems to insist that I write '\n' instead. I see no reason for this, yet it's very annoying.
Is there a logical explanation why characters that have a special notation (like \t, \n, \r) must be expressed in that form in Java source?
Recommended answer
Unicode escapes are replaced by the characters they denote before the source is tokenized, so the compiler effectively sees your line as:
char error = '
';
which is not a valid Java statement.
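If the goal is simply a line feed in a char constant, a few spellings avoid the problem because none of them is touched by the early Unicode-escape pass. The following is a minimal sketch; the class and field names are made up for illustration.

```java
class LineFeedDemo {
    // Three ways to write a line feed (code point 10) without a Unicode escape.
    static final char ESCAPE = '\n';      // escape sequence, resolved when the literal is parsed
    static final char OCTAL = '\12';      // octal escape: 012 (octal) == 10 (decimal)
    static final char CAST = (char) 0x0A; // numeric constant cast to char

    public static void main(String[] args) {
        System.out.println(ESCAPE == OCTAL && OCTAL == CAST); // prints "true"
        System.out.println((int) ESCAPE);                     // prints "10"
    }
}
```

All three survive the Unicode-escape translation unchanged, so the character literal is still intact when the compiler tokenizes it.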
This behavior is prescribed by the Java Language Specification (§3.3, Unicode Escapes):
A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) of the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters.
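Because this translation runs before anything else, a Unicode escape can appear anywhere in the source, including inside identifiers and comments. The sketch below illustrates both effects; the class and method names are hypothetical.

```java
class EarlyTranslationDemo {
    static int count() {
        // The escape inside the identifier below denotes the letter 'a',
        // so the variable is really named total.
        int tot\u0061l = 41;

        // The next comment contains an escaped line feed. After translation the
        // comment ends there, and the increment that follows is live code.
        // \u000A total++;

        return total;
    }

    public static void main(String[] args) {
        System.out.println(count()); // prints "42"
    }
}
```

The same mechanism is why the escape for a line feed cannot sit inside a character or string literal: by the time the literal is tokenized, the escape has already become a real line break.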
This can lead to surprising results. For example, the following is a valid Java program (it contains hidden Unicode characters), courtesy of Peter Lawrey:
public static void main(String[] args) {
for (char ch = 0; ch < Character.MAX_VALUE; ch++) {
if (Character.isJavaIdentifierPart(ch) && !Character.isJavaIdentifierStart(ch)) {
System.out.printf("%04x <%s>%n", (int) ch, "" + ch);
}
}
}
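Hidden characters of this kind can be located programmatically. The sketch below assumes the hidden characters are ones Java treats as ignorable inside identifiers (for example, format characters such as U+200E), and reports them with the standard Character.isIdentifierIgnorable method. The class name, method name, and sample string are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class HiddenCharScanner {
    // Returns the hex code of every char in the text that Java ignores inside identifiers.
    static List<String> findIgnorable(String text) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isIdentifierIgnorable(c)) {
                hits.add(String.format("%04x", (int) c));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // A sample identifier with an invisible LEFT-TO-RIGHT MARK between the letters.
        String sample = "c" + (char) 0x200E + "h";
        System.out.println(findIgnorable(sample)); // prints "[200e]"
    }
}
```

Running such a scan over a suspicious source file is one way to see why two identifiers that look identical on screen may or may not be the same to the compiler.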