Java compiler platform file encoding problem

Problem description

Recently I encountered a file character encoding issue that I cannot remember ever having faced. It's quite common to have to be aware of the character encoding of text files and to write code that handles encoding correctly when run on different platforms. But the problem I found was caused by compiling on a different platform from the execution platform. That was entirely unexpected, because in my experience, when javac creates a class file, the important parameters are the java source and target params and the version of the JDK doing the compile. In my case, classes compiled with JDK 1.6.0_22 on Mac OS X behaved differently from classes compiled with 1.6.0_23-b05 on Linux, when run on Mac OS X. The specified source and target were 1.4.

A String that was encoded as ISO-8859-1 in memory was written to disk using PrintStream's println method. Depending on which platform the Java code was COMPILED on, the string was written differently. This led to a bug. The fix for the bug was to specify the file encoding explicitly when writing and reading the file.
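
The fix described above, naming the encoding explicitly on both the write side and the read side, might look roughly like the following minimal sketch. It writes to an in-memory buffer instead of a real file so it is self-contained; the class and method names are hypothetical, ISO-8859-1 is simply the charset mentioned in the question, and the code assumes Java 8 or later rather than the 1.4 target used in the question.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingDemo {

    // Write one line with an explicit charset instead of relying on the platform default.
    static byte[] writeLine(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            // PrintStream(OutputStream, boolean, String) takes the encoding by name
            PrintStream out = new PrintStream(bytes, true, charset.name());
            out.println(text);
            out.close();
        } catch (UnsupportedEncodingException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read the first line back, decoding with the same explicit charset.
    static String readLine(byte[] data, Charset charset) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The non-ASCII char is written as an escape, so the source file encoding cannot alter it
        String original = "caf\u00e9";
        byte[] data = writeLine(original, StandardCharsets.ISO_8859_1);
        System.out.println(readLine(data, StandardCharsets.ISO_8859_1).equals(original)); // true
    }
}
```

Because the same charset is named on both sides, the round trip no longer depends on the default encoding of the machine that writes or reads the file.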

What surprised me was that the behavior differed depending on where the classes were compiled, not on which platform the class was run. I'm quite familiar with Java code behaving differently when run on different platforms. But it is a bit scary when the same code, compiled on different platforms, runs differently on the same platform.

Has anyone encountered this specific problem? It would seem to bode ill for any Java code that reads and writes strings to file without explicitly specifying the character encoding. And how often is that done?

Recommended answer

There is no such thing as a String that was encoded as ISO-8859-1 in memory. Java Strings in memory are always Unicode strings. (They were encoded in UTF-16 as of 2011; I think this changed in later Java versions, but you don't really need to know this.)

The encoding only comes into play when you input or output the string; then, if no explicit encoding is given, Java uses the system default (which on some systems depends on user settings).
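
A small sketch of this point: one in-memory String turns into different byte sequences depending only on the charset chosen at output time. The class and method names are hypothetical; the calls are the standard `String.getBytes(Charset)` and `Charset.defaultCharset()`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringBytesDemo {

    // Encode the same in-memory String with an explicitly chosen charset.
    static byte[] encode(String s, Charset charset) {
        return s.getBytes(charset);
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // a single char, U+00E9 (e with acute accent)
        System.out.println(s.length());                                    // 1
        System.out.println(encode(s, StandardCharsets.ISO_8859_1).length); // 1 (0xE9)
        System.out.println(encode(s, StandardCharsets.UTF_8).length);      // 2 (0xC3 0xA9)
        System.out.println(encode(s, StandardCharsets.UTF_16BE).length);   // 2 (0x00 0xE9)
        // The no-argument getBytes() and default-charset writers use this value instead:
        System.out.println(Charset.defaultCharset());
    }
}
```

The last line prints whatever the running JVM considers the default, which is exactly the value that varies between machines and settings.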

As McDowell said, the actual encoding of your source file must match the encoding your compiler assumes for it; otherwise you get problems like the one you observed. You can achieve this in several ways:
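
To see why the compile platform can change the program, the sketch below imitates what a compiler does with a source file: it decodes the raw bytes using an assumed charset. This is a simulation, not javac itself; the class and method names are hypothetical, and it assumes the file was saved as UTF-8 while the compiler assumed ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SourceDecodingDemo {

    // Decode raw source-file bytes the way a compiler would under a given -encoding.
    static String decodeSource(byte[] sourceBytes, Charset assumed) {
        return new String(sourceBytes, assumed);
    }

    public static void main(String[] args) {
        // Bytes of a source line containing the literal U+00E9, as saved by an editor using UTF-8.
        byte[] utf8Source = "String s = \"\u00e9\";".getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decodeSource(utf8Source, StandardCharsets.UTF_8);
        String asLatin1 = decodeSource(utf8Source, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.equals("String s = \"\u00e9\";"));         // true
        // The two UTF-8 bytes C3 A9 become two separate chars, U+00C3 and U+00A9.
        System.out.println(asLatin1.equals("String s = \"\u00c3\u00a9\";")); // true
    }
}
```

Once a literal is decoded wrongly, the class file itself contains a different string, so the program misbehaves no matter where it runs, which matches the symptom described in the question.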

  • Use the -encoding option of the compiler, giving the encoding of your source file. (With ant, you set the encoding= parameter.)
  • Use your editor or any other tool (like recode) to change the encoding of your file to the compiler default.
  • Use native2ascii (with the right -encoding option) to translate your source file to ASCII with \uXXXX escapes.
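
The idea behind the last option can be sketched in a few lines: every non-ASCII char is replaced by a backslash-u escape, so the result decodes identically under any ASCII-compatible default encoding. This is a rough hypothetical re-implementation for illustration, not the real tool (native2ascii itself was removed from the JDK in Java 9).

```java
public class AsciiEscaper {

    // Replace every non-ASCII char with a backslash-u escape of four hex digits.
    static String toAsciiEscapes(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                // Works per UTF-16 char, so a supplementary character becomes two escapes (a surrogate pair)
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "String s = \"caf\u00e9\";";
        // Prints the line with its last letter replaced by an escape for code point 00e9
        System.out.println(toAsciiEscapes(source));
    }
}
```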

In the last case, you can later compile the file anywhere, with any default encoding, so this may be the way to go if you give the source code to encoding-unaware people to compile somewhere else.

If you have a bigger project consisting of more than one file, they should all have the same encoding, since the compiler has only one such switch, not several.

In all my projects of the last few years, I have encoded all my files in UTF-8 and set the encoding="utf-8" parameter on the javac task in my ant build file. (My editor is smart enough to recognize the encoding automatically, but I set its default to UTF-8.)

The encoding also matters to other source-handling tools, like javadoc. (There you should additionally set the -charset and -docencoding options for the output; they should match each other, but can differ from the source -encoding.)