Why is the Java compiler stripping all Unicode characters before the actual compilation?

Question

I am very new to Java and I have code like this:

    public class Puzzle {
        public static void main(String... args) {
            System.out.println("Hi Guys!");
  //        Character myChar = new Character('\u000d');
       }
    }

As you can see, the line

Character myChar = new Character('\u000d');

is commented out. But still, I get an error like this when I run javac:

Puzzle.java:9: error: unclosed character literal
//        Character myChar = new Character('\u000d');
                                                  ^
1 error

In this blog post I found the reason for the exception. The blog says:

Java compiler, just before the actual compilation strips out all the unicode characters and converts it to character form. This parsing is done for the complete source code which includes the comments also. After this conversion happens then the Java compilation process continues.

In our code, when the Java compiler encounters \u000d, it considers this as a newline and changes the code as below,

public class Puzzle {
    public static void main(String... args) {
        System.out.println("Hi Guys!");
//      Character myChar = new Character('
        ');
   }
}

I have two questions regarding this:

  1. Why does Java parse the unicode first? Are there any advantages to it?
  2. The line is commented out, yet Java still tries to parse it. Is this the only case where it does that, or does it generally parse commented lines too? I'm confused.

Thanks.

Recommended answer

  1. Why does Java parse the unicode first? Are there any advantages to it?

Yes, Unicode escape sequences are replaced first, before the compiler proceeds to lexical analysis.

Quoting The Java™ Language Specification, §3.3 Unicode Escapes:

A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) for the indicated hexadecimal value, and passing all other characters unchanged.
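
The translation step quoted above can be sketched in a few lines of Java. This is only an illustration and not how javac is implemented: the class name UnicodeEscapeSketch and the methods translate and hexValue are made up for this example. It assumes the eligibility rule from the same section of the specification, namely that a backslash begins an escape only when an even number of backslashes precedes it, and that more than one u may follow it.

```java
class UnicodeEscapeSketch {

    // Value of an ASCII hex digit, or -1 if c is not one.
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Replaces every eligible escape (backslash, one or more 'u', four hex
    // digits) with the UTF-16 code unit it names; all other characters pass
    // through unchanged.
    static String translate(String source) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0; // contiguous backslashes copied just before index i
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            boolean startsEscape = c == '\\' && backslashes % 2 == 0
                    && i + 1 < source.length() && source.charAt(i + 1) == 'u';
            if (!startsEscape) {
                out.append(c);
                backslashes = (c == '\\') ? backslashes + 1 : 0;
                i++;
                continue;
            }
            int j = i + 1;
            while (j < source.length() && source.charAt(j) == 'u') {
                j++; // the specification allows more than one 'u'
            }
            int value = 0;
            for (int k = j; k < j + 4; k++) {
                int digit = k < source.length() ? hexValue(source.charAt(k)) : -1;
                if (digit < 0) {
                    throw new IllegalArgumentException("malformed escape at index " + i);
                }
                value = value * 16 + digit;
            }
            out.append((char) value);
            backslashes = 0;
            i = j + 4;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash keeps this file itself from being translated.
        String line = "// Character myChar = new Character('\\u000d');";
        String translated = translate(line);
        // Make the carriage return visible: the comment now ends inside the literal.
        System.out.println(translated.replace("\r", "<CR>\n"));
    }
}
```

Running main shows the commented line from the question broken in two right after the opening quote, which is the text the rest of the compiler actually sees.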

So, for example, the following source code results in an error:

// String s = "\u000d";

But this is valid:

/*String s = "\u000d";*/

Because when \u000d is replaced with a new line it will look like this:

/*String s="
";*/

Which is totally fine with the multi-line comment /* */.

Also, the following code:

public static void main(String[] args) {
    // Comment.\u000d System.out.println("I will be printed out");
    // Comment.\u000a System.out.println("Me too.");
}

will print out:

I will be printed out
Me too.

Because after the Unicode replacement, both System.out.println() statements end up outside the comments.

To answer your question: the Unicode replacement has to happen at some point. One could argue whether it should happen before or after comments are taken out. The choice was made to do it before taking out the comments.

The reasoning might be that a comment is just another lexical element, and you usually want to replace Unicode escape sequences before identifying and analyzing lexical elements.

See this example:

/\u002f This is a comment line

If placed in a Java source, it causes no compile errors because \u002f will be translated to the character '/' and, along with the preceding '/', will form the start of a line comment //.
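
A small compilable sketch of the same idea, assuming nothing beyond the behaviour described above. The class name EscapeAnywhere and the method answer are invented for illustration. It uses one escape to complete a comment marker and another inside an identifier, to show that escapes are translated everywhere in the source and not only inside literals.

```java
class EscapeAnywhere {

    static int answer() {
        /\u002f After translation this line starts with two slashes, so it is a comment.
        int \u0061nswer = 42; // the escape is the letter a, so this declares "answer"
        return answer;
    }

    public static void main(String[] args) {
        System.out.println(answer());
    }
}
```

This compiles and prints 42, because by the time the compiler looks for comments and identifiers both escapes have already become ordinary characters.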

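  2. The line is commented out, yet Java still tries to parse it. Is this the only case where it does that, or does it generally parse commented lines too? I'm confused.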

The Java compiler does not analyze comments but they still have to be parsed to know where they end.
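
Finding where a line comment ends can be sketched as a scan for the next line terminator. This is a simplified illustration and not javac's scanner: the class LineCommentSketch and the method endOfLineComment are hypothetical names, and the input is assumed to be source text in which the escapes have already been translated.

```java
class LineCommentSketch {

    // Returns the index where a line comment starting at 'start' ends:
    // the position of the first CR or LF, or the end of the input.
    static int endOfLineComment(String translated, int start) {
        int i = start;
        while (i < translated.length()
                && translated.charAt(i) != '\n'
                && translated.charAt(i) != '\r') {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        // The answer's example line after its escape has become a real carriage return.
        String translated = "// Comment.\r System.out.println(\"I will be printed out\");";
        int end = endOfLineComment(translated, 0);
        System.out.println("comment: " + translated.substring(0, end));
        System.out.println("code:    " + translated.substring(end + 1).trim());
    }
}
```

The scan never interprets the comment text. It only looks for the terminator, which is why a carriage return introduced by an escape is enough to end the comment early.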