Does specifying the encoding in javac yield the same results as changing the active code page in Windows CMD and then compiling directly?
Question
I am trying to compile a piece of Java code in Windows CMD using Windows-1250 encoding, and I can't seem to get the -encoding option to work right.
The compiler just doesn't seem to use the specified encoding unless there are illegal characters, in which case it just displays the error message. Otherwise it uses the active code page anyway.
In particular, I am trying to display a string containing Albanian characters, specifically 'ë'.
The string I need to display is as follows:
Hëllë Wërld
Here are the commands I am using and the output they produce:
chcp
Output: Active code page: 437
javac -encoding Windows-1250 AlbanianHello.java
java AlbanianHello
Output: Hδllδ Wδrld
As you can see, it still uses the default encoding, which is Cp437, even though I specified the encoding I wish to use.
Now this is what happens when I change the code page to 1250 and then compile without specifying the encoding:
chcp 1250
Output: Active code page: 1250
javac AlbanianHello.java
java AlbanianHello
Output: Hëllë Wërld
It seems to work fine.
Specifying the encoding in this case yields the same results:
chcp 1250
Output: Active code page: 1250
javac -encoding Windows-1250 AlbanianHello.java
java AlbanianHello
Output: Hëllë Wërld
So does it just completely ignore my specified encoding? Not quite. When I use an encoding that is not supposed to work with my string, it displays a bunch of error messages:
javac -encoding UTF8 AlbanianHello.java
Output: AlbanianHello.java:5: error: unmappable character for encoding UTF8
System.out.println("H?ll? W?rld");
^
...
3 errors
My question is: Why does it ignore the encoding when it should theoretically work, and doesn't ignore it when it shouldn't work?
I would also like to know if there is any difference in the result between these commands:
chcp 1250
javac AlbanianHello.java
And these:
chcp 1250
javac -encoding Windows-1250 AlbanianHello.java
Recommended answer
Welcome to the site! The javac encoding option sets how javac will map the bytes in your source file to Unicode characters, since Java uses Unicode internally. The chcp command sets how the Windows console will map bytes of output to glyphs in a font. Java doesn't know or care about chcp, and vice versa. If both match, all is well. If not...
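To make the javac half of that concrete, here is a minimal sketch of what "mapping the bytes in your source file to Unicode characters" means for the string in the question. It is not javac's actual implementation; the class name SourceDecoding and the helper decodeStrict are made up for illustration. It strictly decodes the same raw bytes under three charsets: Windows-1250 succeeds, code page 437 (Java name IBM437) succeeds but yields a different character, and UTF-8 rejects the bytes outright, which is consistent with the "unmappable character" errors shown in the question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class SourceDecoding {
    // "Hëllë Wërld" as a Windows-1250 editor would save it: each ë is the single byte 0xEB.
    static final byte[] SOURCE_BYTES = {
        'H', (byte) 0xEB, 'l', 'l', (byte) 0xEB, ' ',
        'W', (byte) 0xEB, 'r', 'l', 'd'
    };

    // Strictly decode the bytes; return null if they are not valid in the given charset.
    static String decodeStrict(byte[] bytes, String charsetName) {
        try {
            return Charset.forName(charsetName).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"windows-1250", "IBM437", "UTF-8"}) {
            String text = decodeStrict(SOURCE_BYTES, name);
            if (text == null) {
                System.out.println(name + ": bytes are not valid in this encoding");
            } else {
                // Print the code point of the second character rather than the glyph,
                // so the result does not depend on the console's code page.
                System.out.printf("%s: second character is U+%04X%n", name, (int) text.charAt(1));
            }
        }
    }
}
```

The code points are printed in hex on purpose: printing the glyphs themselves would run into exactly the console mapping problem being discussed.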
In your first example, Java correctly interprets your Windows-1250 source. Character ë is U+00EB. When that character is written out as the byte 0xEB to a code-page 437 terminal, the displayed result is what byte 0xEB means in cp437, regardless of what you meant to display. Per the CP437 character table, that is lowercase delta, δ. (Just to highlight the difference, δ is U+03B4 in Unicode.)
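The round trip described above can be sketched in a few lines. This is an illustration under assumptions, not a reproduction of what your JVM does: it assumes the JVM encodes output as windows-1252 (the default reported further down for one machine), and the class and method names (ConsoleBytes, encodeToByte, glyphFor) are hypothetical.

```java
import java.nio.charset.Charset;

public class ConsoleBytes {
    // Encode one character and return its first (here: only) byte as an unsigned value.
    static int encodeToByte(char c, String charsetName) {
        byte[] bytes = String.valueOf(c).getBytes(Charset.forName(charsetName));
        return bytes[0] & 0xFF;
    }

    // The character a console using the given code page would display for that byte.
    static char glyphFor(int b, String codePage) {
        return new String(new byte[] {(byte) b}, Charset.forName(codePage)).charAt(0);
    }

    public static void main(String[] args) {
        char eDiaeresis = '\u00EB'; // ë

        // Assumption: the JVM writes output using windows-1252.
        int sent = encodeToByte(eDiaeresis, "windows-1252");
        // A code page 437 console interprets the same byte differently.
        char shown = glyphFor(sent, "IBM437");
        // The byte the console would have needed in order to show ë.
        int needed = encodeToByte(eDiaeresis, "IBM437");

        System.out.printf("byte sent: 0x%02X%n", sent);
        System.out.printf("cp437 shows: U+%04X%n", (int) shown);
        System.out.printf("cp437 byte for U+00EB: 0x%02X%n", needed);
    }
}
```

Under those assumptions the byte sent is 0xEB, which cp437 renders as U+03B4 (δ), while the console would have needed 0x89 to show ë. One way to make the two sides agree without touching chcp might be to wrap the output stream with an explicit encoding, for example new PrintStream(System.out, true, "IBM437"), though whether that suits a given setup depends on the console actually in use.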
For completeness, it turns out to be less than easy to find out what the default encoding for javac is. The docs for Charset say that:
The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.
Based on the behaviour you saw, I am guessing javac on your system is reading the code page from the console and using that as the default. Either that, or the default is a code page in which ë = 0xEB (e.g., CP1252 or ISO 8859-1; as far as I know, either might be the default depending on your configuration).
Edit: On my machine, the default is CP1252 (Java charset name windows-1252). I have put the code I used on GitHub.
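For anyone who wants to check their own machine, a minimal sketch along these lines should report the default. This is not the code referred to above; the class name ShowDefaults and the method describe are hypothetical, and it only uses Charset.defaultCharset() and the file.encoding system property.

```java
import java.nio.charset.Charset;

public class ShowDefaults {
    // Summarise the JVM's idea of the default encoding.
    static String describe() {
        return "default charset: " + Charset.defaultCharset().name()
                + ", file.encoding: " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Note that this reports the default of the JVM running the program, which is probably, but not necessarily, what javac falls back to when no -encoding option is given. From JDK 18 onward the default charset is UTF-8 regardless of locale, so the result also depends on the Java version.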