Java compiler platform file encoding problem


Question

This is my first post to Stack Overflow. I've been doing Java since 1998, so I'm no beginner. Recently I encountered a file character encoding issue that I cannot remember ever having faced. It's quite common to have to be aware of the character encoding of text files and to write code that handles encoding correctly when run on different platforms. But the problem I found was caused by compiling on a platform different from the execution platform. That was entirely unexpected, because in my experience, when javac creates a class file, the important parameters are the java source and target params and the version of the JDK doing the compile. In my case, classes compiled with JDK 1.6.0_22 on Mac OS X behaved differently than classes compiled with 1.6.0_23-b05 on Linux, when run on Mac OS X. The specified source and target were 1.4.

A String that was encoded as ISO-8859-1 in memory was written to disk using a PrintStream println method. Depending on which platform the Java code was COMPILED on, the string was written differently. This led to a bug. The fix for the bug was to specify the file encoding explicitly when writing and reading the file.
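
A minimal sketch of that fix, assuming (as in the question) that the file format is ISO-8859-1; substitute whatever encoding your format actually uses. The class and method names are made up for illustration, and the sketch uses Java 7+ APIs, so it would not compile with the 1.4 source/target mentioned above. The non-ASCII character is written as a \u00e9 escape so the example itself compiles identically under any source encoding.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingRoundTrip {
    // Writes each line using an explicitly named charset instead of the platform default.
    static void writeLines(File file, String... lines) throws IOException {
        // PrintStream(OutputStream, boolean autoFlush, String encoding)
        try (PrintStream out = new PrintStream(new FileOutputStream(file), false, "ISO-8859-1")) {
            for (String line : lines) {
                out.println(line);
            }
        }
    }

    // Reads the first line back, decoding with the same charset that was used for writing.
    static String readFirstLine(File file) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.ISO_8859_1))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("encoding-demo", ".txt");
        f.deleteOnExit();
        writeLines(f, "caf\u00e9");
        // The result no longer depends on the platform's default charset.
        System.out.println(readFirstLine(f).equals("caf\u00e9")); // true
    }
}
```

Because both sides name the charset, the bytes on disk are the same whichever machine compiled or runs the class (here one byte per character plus the platform line separator).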

What surprised me was that the behavior differed depending on where the classes were compiled, not on which platform the class was run. I'm quite familiar with Java code behaving differently when run on different platforms. But it is a bit scary when the same code, compiled on different platforms, runs differently on the same platform.

Has anyone encountered this specific problem? It would seem to bode ill for any Java code that reads and writes strings to file without explicitly specifying the character encoding. And how often is that done?

Thanks,

Richard Brewster
http://rabbitsoftware.com

Answer

There is no such thing as a String that was encoded as ISO-8859-1 in memory. Java Strings in memory are always Unicode strings (encoded in UTF-16, but you don't really need to know this).
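
To make the UTF-16 point concrete, here is a small sketch (class and method names are illustrative) that counts the chars, which are UTF-16 code units, and the Unicode code points in two strings. Neither number depends on any file or platform encoding.

```java
public class Utf16Demo {
    // Reports how many UTF-16 code units (chars) and Unicode code points a String holds.
    static String describe(String s) {
        return "units=" + s.length() + " codePoints=" + s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) fits in a single UTF-16 code unit.
        System.out.println(describe("\u00e9"));       // units=1 codePoints=1
        // U+1F600 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
        System.out.println(describe("\uD83D\uDE00")); // units=2 codePoints=1
    }
}
```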

The encoding only comes into play when you input or output the string; then, given no explicit encoding, the system default is used (which on some systems depends on user settings).
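
A sketch of where the encoding enters: the same four-character String becomes a different number of bytes depending on the charset used to encode it, and a call without an explicit charset (such as getBytes() with no argument) silently uses Charset.defaultCharset(). The class name is made up; the byte counts in the comments follow from the encodings themselves, while the last line prints whatever the current JVM's default happens to be (JDK 18 and later default to UTF-8).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeAtOutput {
    // Number of bytes the given String turns into when encoded with the given charset.
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // four chars in memory, regardless of platform
        System.out.println(byteCount(s, StandardCharsets.ISO_8859_1)); // 4
        System.out.println(byteCount(s, StandardCharsets.UTF_8));      // 5
        System.out.println(byteCount(s, StandardCharsets.UTF_16BE));   // 8
        // This is the charset used when no encoding is given; it varies between machines.
        System.out.println(Charset.defaultCharset());
    }
}
```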

As McDowell said, the actual encoding of your source file must match the encoding your compiler assumes for it; otherwise you get problems like the one you observed. You can achieve this by several means:

  • Use the -encoding option of the compiler, giving the encoding of your source file. (With Ant, you set the encoding= parameter.)
  • Use your editor or any other tool (like recode) to change the encoding of your file to the compiler default.
  • Use native2ascii (with the right -encoding option) to translate your source file to ASCII with \uXXXX-escapes.
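
Why the compile platform matters can be sketched without running javac at all: the compiler first decodes the source file's bytes using the encoding it assumes (the -encoding value, or else the platform default), and the resulting characters are what end up in the class file's string constants. The sketch below (illustrative names) mimics only that decoding step, for the literal café saved in a UTF-8 file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SourceDecodingMismatch {
    // Mimics the compiler turning source-file bytes into characters under an assumed charset.
    static String decodeAs(byte[] sourceBytes, Charset assumed) {
        return new String(sourceBytes, assumed);
    }

    public static void main(String[] args) {
        // The bytes an editor would write for the literal text in a UTF-8 source file.
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Assumed encoding matches the real one: the intended four-char string.
        System.out.println(decodeAs(utf8Bytes, StandardCharsets.UTF_8).equals("caf\u00e9")); // true

        // Assumed encoding is wrong: the two UTF-8 bytes become two separate chars.
        String misread = decodeAs(utf8Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length());                          // 5
        System.out.println(misread.equals("caf\u00c3\u00a9"));         // true
    }
}
```

The comparisons use escapes rather than printing the strings, because printing would itself go through the console's encoding.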

In the last case, you can later compile this file anywhere, with any default encoding, so this may be the way to go if you give the source code to encoding-unaware people to compile elsewhere.
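
The escaping that native2ascii performs can be sketched in a few lines of Java; this illustrates the idea and is not the tool's actual implementation, and AsciiEscaper and escapeNonAscii are made-up names. (Recent JDKs no longer ship native2ascii; it was removed in JDK 9.) Each char outside ASCII becomes a \uXXXX escape, so a character outside the Basic Multilingual Plane comes out as two escapes, one per UTF-16 surrogate, which javac accepts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class AsciiEscaper {
    // Replaces every non-ASCII char (UTF-16 code unit) with a Java unicode escape.
    static String escapeNonAscii(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c < 0x80) {
                out.append(c); // ASCII passes through unchanged
            } else {
                out.append(String.format("\\u%04x", (int) c)); // backslash, 'u', four hex digits
            }
        }
        return out.toString();
    }

    // Hypothetical usage: java AsciiEscaper Foo.java UTF-8 > FooAscii.java
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java AsciiEscaper <source-file> <source-encoding>");
            return;
        }
        // Decode with the encoding the file was really saved in, like native2ascii -encoding.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), args[1]);
        System.out.print(escapeNonAscii(text)); // output is pure ASCII
    }
}
```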

If you have a bigger project consisting of more than one file, they should all have the same encoding, since the compiler has only one such switch, not several.

In all my projects over the last few years, I have always encoded all my files in UTF-8 and set the encoding="utf-8" parameter of the javac task in my Ant buildfile. (My editor is smart enough to recognize the encoding automatically, but I set its default to UTF-8.)
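
If you adopt the same all-UTF-8 convention, a quick check that every file really is valid UTF-8 can catch a stray Latin-1 file early. This is a hypothetical helper, not part of Ant or the JDK; it relies on a CharsetDecoder configured to report malformed input instead of substituting U+FFFD. Passing the check shows only that the bytes are valid UTF-8; pure ASCII files pass too, which is harmless.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8Check {
    // Returns true if the bytes decode as UTF-8 without any malformed sequences.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of replacing
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Hypothetical usage: java Utf8Check src/Foo.java src/Bar.java
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Paths.get(arg);
            if (!isValidUtf8(Files.readAllBytes(p))) {
                System.out.println("not valid UTF-8: " + p);
            }
        }
    }
}
```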

The encoding also matters to other source-code handling tools, like javadoc. (There you should additionally set the -charset and -docencoding options for the output; they should match each other, but can differ from the source -encoding.)