Why do some character literals cause syntax errors in Java?


Question

In the latest edition of the JavaSpecialists newsletter, the author mentions a piece of code that does not compile in Java:

public class A1 {
  Character aChar = '\u000d';
}

Try to compile it, and you will get an error such as:

A1.java:2: illegal line end in character literal
              Character aChar = '\u000d';
                                ^



Why does an equivalent piece of C# code not show the same problem?

public class CharacterFixture
{
  char aChar = '\u000d';
}



Am I missing anything?

Edit: The original intent of my question was to ask how the C# compiler gets Unicode source parsing right (if it does), and why Java should still stick with the incorrect parsing (if it is incorrect). I would also like my original question title restored. Why such heavy editing? I strongly suspect it changed what I meant to ask.

Recommended answer

Java's compiler translates \uxxxx escape sequences as one of its very first steps, before the tokenizer even gets a crack at the code. By the time tokenizing actually starts, there are no \uxxxx sequences left; they have already been turned into the characters they represent, so to the compiler your Java example looks exactly as if you had somehow typed a real carriage return between the two quotes. A character literal cannot span a line break, hence the "illegal line end" error.

Java does this to provide a way to use Unicode within the source regardless of the source file's encoding. Even plain ASCII text can represent any Unicode character if necessary (at the cost of readability), and because the translation happens so early, you can use escapes almost anywhere in the code. (You could write \u0063\u006c\u0061\u0073\u0073\u0020\u0053\u0074\u0075\u0066\u0066\u0020\u007b\u007d, and the compiler would read it as class Stuff {}, if you wanted to be annoying or torture yourself.)
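As a rough illustration of that early pass, here is a minimal sketch that mimics it on a string. The class name EscapePrePass and the method translate are made up for this example, and the regular expression is a simplification: it does not implement the real rule about how many backslashes precede the escape. It is not how javac is implemented, only a way to see what the tokenizer ends up looking at.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapePrePass {
    // Simplified pattern: a backslash, one or more 'u', then four hex digits.
    // Assumption: the count of preceding backslashes is ignored here.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    // Replaces every escape in the given source text with the character it names.
    static String translate(String source) {
        Matcher m = ESCAPE.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape as plain text inside this string.
        String line = "Character aChar = '\\u000d';";
        String seen = translate(line);

        // After the pass, a real carriage return sits between the two quotes.
        System.out.println(seen.indexOf('\r') >= 0); // prints "true"
    }
}
```

Once the escape has been replaced, the opening quote is followed by a line terminator rather than a character, which is what the error message in the question complains about.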

C# doesn't do that. There, \uxxxx is translated later, along with the rest of the program, and is only valid in certain kinds of tokens (namely identifiers and string/char literals). This means it can't be used in some places where Java allows it. cl\u0061ss is not a keyword, for example.
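For contrast, here is a small Java sketch of what the early translation allows and how to sidestep the original error. The class and field names are invented for this example. Note that the comments deliberately avoid spelling out an escape, because Java translates escapes inside comments too.

```java
public class UnicodeEscapeDemo {
    // The keyword below is spelled with an escaped letter 'a'.
    // Java translates it first, so the compiler simply sees "static".
    st\u0061tic int answer = 42;

    // To put a carriage return in a char, use the character escape instead.
    // It is handled by the tokenizer, after the Unicode pass.
    static char carriageReturn = '\r';

    // The escaped double quote (code point 0022) closes the string early,
    // so the plus sign and the second literal are real code.
    static String joined = "ab\u0022 + "cd";

    public static void main(String[] args) {
        System.out.println(answer);               // prints 42
        System.out.println((int) carriageReturn); // prints 13
        System.out.println(joined);               // prints abcd
    }
}
```

The last field is the same effect as in the question, seen from another angle: the escape is turned into a quote character before the compiler decides where the string literal ends.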
