Does specifying the encoding in javac yield the same results as changing the active code page in Windows CMD and then compiling directly?

Question

I am trying to compile a piece of Java code in Windows CMD using Windows-1250 encoding, and I can't seem to get the -encoding option to work right.

The compiler doesn't seem to use the specified encoding unless there are illegal characters, in which case it only displays an error message. Otherwise it uses the active code page anyway.

In particular, I am trying to display a string containing Albanian characters, specifically 'ë'.

The string I need to display is as follows:

Hëllë Wërld

Here are the commands I am using and the output they produce:

chcp
Output: Active code page: 437

javac -encoding Windows-1250 AlbanianHello.java

java AlbanianHello
Output: Hδllδ Wδrld

As you can see, it still uses the default encoding, which is Cp437, even though I specified the encoding I wish to use.

Now this is what happens when I change the code page to 1250 and then compile without specifying the encoding:

chcp 1250
Output: Active code page: 1250

javac AlbanianHello.java
java AlbanianHello
Output: Hëllë Wërld

It seems to work fine.

Specifying the encoding in this case yields the same results:

chcp 1250
Output: Active code page: 1250

javac -encoding Windows-1250 AlbanianHello.java
java AlbanianHello
Output: Hëllë Wërld

So does it just completely ignore my specified encoding? Not quite. When I try an encoding that is not supposed to work with my string, it displays a bunch of error messages:

javac -encoding UTF8 AlbanianHello.java
Output: AlbanianHello.java:5: error: unmappable character for encoding UTF8
    System.out.println("H?ll? W?rld");
                         ^
...
3 errors

My question is: Why does it ignore the encoding when it should theoretically work, and doesn't ignore it when it shouldn't work?

I would also like to know if there is any difference in the result between these commands:

chcp 1250
javac AlbanianHello.java

And these:

chcp 1250
javac -encoding Windows-1250 AlbanianHello.java

Answer

Welcome to the site! The javac -encoding option sets how javac maps the bytes in your source file to Unicode characters, since Java uses Unicode internally. The chcp command sets how the Windows console maps output bytes to glyphs in a font. Java doesn't know or care about chcp, and vice versa. If the two match, all is well. If not...
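
To make the javac side of this concrete, here is a minimal sketch (not javac's actual code) that decodes the same source bytes strictly, the way -encoding asks javac to: bytes that are invalid in the chosen charset cause a failure instead of being silently replaced. The class and method names (SourceDecodeDemo, tryDecode) are hypothetical; only the java.nio.charset calls are real API. The byte values assume the file was saved as Windows-1250, where ë is the single byte 0xEB.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class SourceDecodeDemo {

    // Decode strictly: malformed or unmappable bytes make decoding fail instead of
    // being silently replaced with U+FFFD (loosely what javac does with -encoding).
    static Optional<String> tryDecode(byte[] bytes, Charset cs) {
        try {
            String text = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
            return Optional.of(text);
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Bytes for "H", e-diaeresis, "ll", e-diaeresis as stored in a Windows-1250
        // file, where each e-diaeresis is the single byte 0xEB.
        byte[] sourceBytes = {0x48, (byte) 0xEB, 0x6C, 0x6C, (byte) 0xEB};

        // Windows-1250 maps 0xEB to U+00EB, so strict decoding succeeds.
        tryDecode(sourceBytes, Charset.forName("windows-1250"))
                .ifPresent(s -> System.out.printf("windows-1250: second char is U+%04X%n", (int) s.charAt(1)));

        // In UTF-8, 0xEB starts a 3-byte sequence, but 0x6C ('l') is not a
        // continuation byte, so strict decoding fails.
        boolean ok = tryDecode(sourceBytes, StandardCharsets.UTF_8).isPresent();
        System.out.println("UTF-8 decodes: " + ok);
    }
}
```

Running it prints "windows-1250: second char is U+00EB" and "UTF-8 decodes: false". In other words, the encoding is honored in both cases; it just only becomes visible when it fails.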

In your first example, Java correctly interprets your Windows-1250 source: the character ë becomes U+00EB. When the program prints it, Java encodes it back to bytes for output, on your system evidently with a charset such as windows-1252 in which ë is the single byte 0xEB. When that byte is sent to a code-page 437 console, the console displays whatever byte 0xEB means in CP437, regardless of what you intended to display. Per the CP437 character table, that is lowercase delta, δ. (Just to highlight the difference, δ is U+03B4 in Unicode.)
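
The display step can be simulated the same way. This sketch assumes the program writes its output as windows-1252 (the default the answerer reports for their machine in the Edit at the end), and models the console with a hypothetical helper, showOnConsole, that decodes the written bytes using the console's code page. IBM437 is Java's name for code page 437.

```java
import java.nio.charset.Charset;

public class ConsoleMismatchDemo {

    // Hypothetical helper: a program writes text as bytes in outputCharset, and a
    // console set to consoleCodePage (what chcp controls) interprets those bytes.
    static String showOnConsole(String text, String outputCharset, String consoleCodePage) {
        byte[] written = text.getBytes(Charset.forName(outputCharset));
        return new String(written, Charset.forName(consoleCodePage));
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file ASCII-only; U+00EB is e-diaeresis.
        String greeting = "H\u00EBll\u00EB W\u00EBrld";

        // Assumption: the program's output charset is windows-1252, where U+00EB is 0xEB.
        String onCp437 = showOnConsole(greeting, "windows-1252", "IBM437");
        String onCp1250 = showOnConsole(greeting, "windows-1252", "windows-1250");

        // Print code points rather than glyphs, so the result does not depend on
        // the console this demo itself runs in.
        System.out.printf("chcp 437:  U+%04X%n", (int) onCp437.charAt(1));  // U+03B4 (delta)
        System.out.printf("chcp 1250: U+%04X%n", (int) onCp1250.charAt(1)); // U+00EB (e-diaeresis)
    }
}
```

The same bytes produce "Hδllδ Wδrld" under code page 437 and "Hëllë Wërld" under code page 1250; nothing about the compiled class changes between the two.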

For completeness, it turns out to be less than easy to find out what the default encoding for javac is. The docs for Charset say that:

The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.
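
To see which default your own JVM picks, a small sketch like the following prints it. The javac documentation says that when -encoding is omitted the platform default is used, and javac itself runs on a JVM, so this is a reasonable (not guaranteed) proxy for javac's default in the same environment. Newer JDKs behave differently: as far as I know, JDK 18 (JEP 400) made UTF-8 the default charset, and the stdout.encoding property only exists from JDK 19 on, so it prints null on older versions.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {

    public static void main(String[] args) {
        // The JVM's default charset, determined once at startup.
        System.out.println("Charset.defaultCharset(): " + Charset.defaultCharset());
        // The system property the default has traditionally been derived from.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        // Charset used by System.out; assumed to be defined only on JDK 19+, null before.
        System.out.println("stdout.encoding: " + System.getProperty("stdout.encoding"));
    }
}
```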

Based on the behaviour you saw, I am guessing javac on your system is reading the code page from the console and using that as the default. Either that, or the default is a code page in which ë = 0xEB, such as CP1252 or ISO 8859-1; as far as I know, either might be the default depending on your configuration.
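
The "ë = 0xEB" point can be checked directly. This sketch uses a hypothetical helper, hexBytes, to print the bytes each charset uses for ë. Windows-1250, windows-1252 and ISO-8859-1 all agree on 0xEB, which would explain why, for this particular string, compiling without -encoding under a windows-1252 default and compiling with -encoding Windows-1250 give the same result, while UTF-8 expects a different byte sequence.

```java
import java.nio.charset.Charset;

public class EByteDemo {

    // Hypothetical helper: the bytes a charset uses to encode the text, as uppercase hex.
    static String hexBytes(String text, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(Charset.forName(charsetName))) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eDiaeresis = "\u00EB"; // LATIN SMALL LETTER E WITH DIAERESIS
        String[] charsets = {"windows-1250", "windows-1252", "ISO-8859-1", "IBM437", "UTF-8"};
        for (String cs : charsets) {
            System.out.println(cs + " -> " + hexBytes(eDiaeresis, cs));
        }
    }
}
```

This prints EB for windows-1250, windows-1252 and ISO-8859-1, 89 for IBM437 (code page 437 stores ë elsewhere) and C3AB for UTF-8.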

Edit: On my machine, the default is CP1252 (Java charset name windows-1252). I have put the code I used on GitHub.
