Why does Java permit escaped Unicode characters in the source code?


Question


I recently learned that Unicode is permitted within Java source code not only as Unicode characters (e.g. double π = Math.PI; ) but also as escaped sequences (e.g. double \u03C0 = Math.PI; ).

The first variant makes sense to me - it allows programmers to name variables and methods in an international language of their choice. However, I don't see any practical application of the second approach.

Here are a few pieces of code to illustrate usage, tested with Java SE 6 and NetBeans 6.9.1:

This code will print out 3.141592653589793

public static void main(String[] args) {
    double π = Math.PI;
    System.out.println(\u03C0);
}

Explanation: π and \u03C0 are the same Unicode character

This code will not print out anything

public static void main(String[] args) {
    double π = Math.PI; /\u002A
    System.out.println(π);

    /* a comment */
}

Explanation: The code above actually encodes:

public static void main(String[] args) {
    double π = Math.PI; /*
    System.out.println(π);

    /* a comment */
}

This comments out the print statement.

Just from my examples, I notice a number of potential problems with this language feature.

First, a bad programmer could use it to secretly comment out bits of code, or create multiple ways of identifying the same variable. Perhaps there are other horrible things that can be done that I haven't thought of.

Second, there seems to be a lack of support among IDEs. Neither NetBeans nor Eclipse provided the correct code highlighting for the examples. In fact, NetBeans even marked a syntax error (though compilation was not a problem).

Finally, this feature is poorly documented and not commonly accepted. Why would a programmer use something in his code that other programmers will not be able to recognize and understand? In fact, I couldn't even find anything about this on the Hidden Java Features question.

My question is this:

Why does Java allow escaped Unicode sequences to be used within syntax? What are some "pros" of this feature that have allowed it to stay a part of Java, despite its many "cons"?

Answer

Unicode escape sequences allow you to store and transmit your source code in pure ASCII and still use the entire range of Unicode characters. This has two advantages:

  • No risk of non-ASCII characters getting broken by tools that can't handle them. This was a real concern back in the early 1990s when Java was designed. Sending an email containing non-ASCII characters and having it arrive unmangled was the exception rather than the norm.

  • No need to tell the compiler and editor/IDE which encoding to use for interpreting the source code. This is still a very valid concern. Of course, a much better solution would have been to have the encoding as metadata in a file header (as in XML), but this hadn't yet emerged as a best practice back then.
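To make the "pure ASCII" idea concrete, here is a minimal sketch that rewrites every non-ASCII character in a piece of source text as a \uXXXX escape, similar in spirit to the native2ascii tool that shipped with older JDKs. The class and method names (AsciiEscaper, toAscii) are invented for this illustration; this is not how any particular tool is implemented.

```java
public class AsciiEscaper {
    // Hypothetical helper: keeps ASCII as-is and writes every other
    // UTF-16 code unit as a backslash, 'u' and four hex digits.
    static String toAscii(String source) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The escape in this literal is translated by the compiler,
        // so the string really contains the Greek letter pi.
        String line = "double \u03C0 = Math.PI;";
        // Prints the line with pi written as a six-character ASCII escape.
        System.out.println(toAscii(line));
    }
}
```

The output is plain ASCII, so it survives any tool or transport that only handles 7-bit text, and the compiler reads it back as the original character.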

"The first variant makes sense to me - it allows programmers to name variables and methods in an international language of their choice. However, I don't see any practical application of the second approach."

Both will result in exactly the same byte code and have the same power as a language feature. The only difference is in the source code.
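A small sketch of that equivalence, kept in pure ASCII so that it compiles regardless of the file encoding: the compiler translates \uXXXX escapes before it splits the source into tokens, so a name or string written with an escape is indistinguishable from one written with the literal character. The names used here (EscapeEquivalence, abc) are illustrative only.

```java
public class EscapeEquivalence {
    // Declared with an escape: U+0061 is the letter 'a',
    // so this field is simply named abc.
    static int \u0061bc = 42;

    public static void main(String[] args) {
        // Referenced with the literal spelling of the same identifier.
        System.out.println(abc);                   // 42
        // The same holds inside string literals.
        System.out.println("\u0041".equals("A"));  // true
        // An escape stands for one char, not six.
        System.out.println("\u03C0".length());     // 1
    }
}
```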

"First, a bad programmer could use it to secretly comment out bits of code, or create multiple ways of identifying the same variable."

If you're concerned about a programmer deliberately sabotaging your code's readability, this language feature is the least of your problems.

"Second, there seems to be a lack of support among IDEs."

That's hardly the fault of the feature or its designers. But then, I don't think it was ever intended to be used "manually". Ideally, the IDE would have an option to have you enter the characters normally and have them displayed normally, but automatically save them as Unicode escape sequences. There may even already be plugins or configuration options that make the IDEs behave that way.

But in general, this feature seems to be very rarely used and probably therefore badly supported. But how could the people who designed Java around 1993 have known that?
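For the "displayed normally" half of the editor behaviour imagined above, a rough sketch of turning escapes in a saved file back into characters could look like the following. EscapeDecoder and decode are hypothetical names, and the code is a simplification rather than a faithful reimplementation of the compiler: it follows the rules that a doubled backslash does not start an escape and that more than one 'u' is allowed, and it rejects malformed escapes with an exception.

```java
public class EscapeDecoder {
    // Hypothetical helper: replaces each backslash-u escape with the char it names.
    static String decode(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            boolean hasNext = i + 1 < text.length();
            if (c == '\\' && hasNext && text.charAt(i + 1) == '\\') {
                // A doubled backslash: copy both, neither starts an escape.
                out.append("\\\\");
                i += 2;
            } else if (c == '\\' && hasNext && text.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < text.length() && text.charAt(j) == 'u') {
                    j++; // skip one or more 'u' characters
                }
                if (j + 4 > text.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    int digit = Character.digit(text.charAt(k), 16);
                    if (digit < 0) {
                        throw new IllegalArgumentException("bad hex digit at index " + k);
                    }
                    value = value * 16 + digit;
                }
                out.append((char) value);
                i = j + 4;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The text as it would sit on disk, in pure ASCII.
        String saved = "double \\u03C0 = Math.PI;";
        // Compare with the same line containing the real character.
        System.out.println(decode(saved).equals("double \u03C0 = Math.PI;")); // true
    }
}
```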
