Why does Java permit escaped Unicode characters in the source code?

Question

I recently learned that Unicode is permitted within Java source code not only as Unicode characters (e.g. double π = Math.PI;) but also as escape sequences (e.g. double \u03C0 = Math.PI;).

The first variant makes sense to me: it allows programmers to name variables and methods in an international language of their choice. However, I don't see any practical application of the second approach.

Here are a few pieces of code to illustrate usage, tested with Java SE 6 and NetBeans 6.9.1:

This code prints 3.141592653589793:

public static void main(String[] args) {
    double π = Math.PI;
    System.out.println(\u03C0);
}

Explanation: π and \u03C0 are the same Unicode character, so both spellings name the same variable.

This code does not print anything:

public static void main(String[] args) {
    double π = Math.PI; /\u002A
    System.out.println(π);

    /* a comment */
}

Explanation: \u002A is the escape for *, so the code above is actually equivalent to:

public static void main(String[] args) {
    double π = Math.PI; /*
    System.out.println(π);

    /* a comment */
}

This comments out the print statement.
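To check what the compiler would see in a suspicious line, the escapes can be translated back by hand or with a small helper. The following is only a simplified sketch of that translation step: the class name EscapeDecoder and the method decode are made up for illustration, and the handling of backslash runs is an approximation of the rules in the language specification, not a full implementation.

```java
public class EscapeDecoder {
    // Replaces each Unicode escape (backslash, one or more 'u', four hex digits)
    // with the character it stands for. Hypothetical helper, simplified on purpose.
    public static String decode(String source) {
        StringBuilder out = new StringBuilder(source.length());
        int backslashRun = 0; // contiguous backslashes seen just before position i
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            // A backslash can start an escape only after an even number of backslashes.
            if (c == '\\' && backslashRun % 2 == 0) {
                int j = i + 1;
                while (j < source.length() && source.charAt(j) == 'u') {
                    j++;
                }
                if (j > i + 1 && j + 4 <= source.length() && isHex(source, j)) {
                    out.append((char) Integer.parseInt(source.substring(j, j + 4), 16));
                    i = j + 4;
                    backslashRun = 0;
                    continue;
                }
            }
            backslashRun = (c == '\\') ? backslashRun + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // True if the four characters starting at 'start' are all hexadecimal digits.
    private static boolean isHex(String s, int start) {
        for (int k = start; k < start + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape as literal text inside this string.
        String suspicious = "double pi = Math.PI; /\\u002A";
        System.out.println(decode(suspicious));
    }
}
```

Running a line through such a helper makes the hidden comment opener visible before the code is reviewed.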

Just from my examples, I notice a number of potential problems with this language feature.

First, a bad programmer could use it to secretly comment out bits of code, or create multiple ways of identifying the same variable. Perhaps there are other horrible things that can be done that I haven't thought of.

Second, there seems to be a lack of support among IDEs. Neither NetBeans nor Eclipse provided the correct code highlighting for the examples. In fact, NetBeans even marked a syntax error (though compilation was not a problem).

Finally, this feature is poorly documented and not commonly accepted. Why would a programmer use something in his code that other programmers will not be able to recognize and understand? In fact, I couldn't even find anything about this on the Hidden Java Features question.

My question is:

Why does Java allow escaped Unicode sequences to be used within syntax? What are some "pros" of this feature that have allowed it to stay a part of Java, despite its many "cons"?

Recommended answer

Unicode escape sequences allow you to store and transmit your source code in pure ASCII and still use the entire range of Unicode characters. This has two advantages:

  • No risk of non-ASCII characters getting broken by tools that can't handle them. This was a real concern back in the early 1990s when Java was designed. Sending an email containing non-ASCII characters and having it arrive unmangled was the exception rather than the norm.

  • No need to tell the compiler and editor/IDE which encoding to use for interpreting the source code. This is still a very valid concern. Of course, a much better solution would have been to have the encoding as metadata in a file header (as in XML), but this hadn't yet emerged as a best practice back then.
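As a rough illustration of the pure-ASCII idea, the sketch below rewrites every non-ASCII character in a piece of source text as a \uXXXX escape. The class name UnicodeEscaper and the method escape are invented for this example, and it deliberately ignores edge cases such as backslashes already present in the input.

```java
public class UnicodeEscaper {
    // Rewrites every char outside 7-bit ASCII as a backslash, the letter u,
    // and four hex digits. Hypothetical helper, not part of any JDK API.
    public static String escape(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                // The doubled backslash yields one literal backslash in the output.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Greek letter is itself written as an escape, so this file is pure ASCII.
        String line = "double \u03c0 = Math.PI;";
        System.out.println(escape(line));
    }
}
```

The output contains only ASCII characters, so it survives tools and transports that cannot handle anything else, and the compiler still reads it as the original identifier.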

"The first variant makes sense to me - it allows programmers to name variables and methods in an international language of their choice. However, I don't see any practical application of the second approach."

Both will result in exactly the same byte code and have the same power as a language feature. The only difference is in the source code.
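One way to convince yourself of this is to declare an identifier with an escape and then ask the compiled class what the field is called. The sketch below does that with reflection; the class name EscapedIdentifierDemo and the method hasFieldNamedPi are illustrative only.

```java
import java.lang.reflect.Field;

public class EscapedIdentifierDemo {
    // Declared with an escape; the compiler sees the single character U+03C0.
    static double \u03c0 = Math.PI;

    // Looks the field up by the real character, built from its code point,
    // to show that the class file stores the character and not the escape text.
    public static boolean hasFieldNamedPi() {
        try {
            String name = String.valueOf((char) 0x3C0);
            Field field = EscapedIdentifierDemo.class.getDeclaredField(name);
            return field.getType() == double.class;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasFieldNamedPi());
    }
}
```

The lookup succeeds because the escape is translated before the compiler ever identifies tokens, so nothing about the escaped spelling reaches the class file.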

"First, a bad programmer could use it to secretly comment out bits of code, or create multiple ways of identifying the same variable."

If you're concerned about a programmer deliberately sabotaging your code's readability, this language feature is the least of your problems.

"Second, there seems to be a lack of support among IDEs."

That's hardly the fault of the feature or its designers. But then, I don't think it was ever intended to be used "manually". Ideally, the IDE would have an option to let you enter the characters normally and have them displayed normally, but automatically save them as Unicode escape sequences. There may even already be plugins or configuration options that make the IDEs behave that way.

But in general, this feature seems to be very rarely used and is probably badly supported for that reason. But how could the people who designed Java around 1993 have known that?
