Does specifying the encoding in javac yield the same results as changing the active code page in Windows CMD and then compiling directly?

Question

I am trying to compile a piece of Java code in Windows CMD using Windows-1250 encoding, and I can't seem to get the -encoding option to work right.

The compiler just doesn't seem to use the specified encoding unless there are illegal characters, in which case it just displays the error message. Otherwise it uses the active code page anyway.

In particular, I am trying to display a string containing Albanian characters, specifically 'ë'.

The string I need to display is as follows:

Hëllë Wërld

Here are the commands I am using and the output they produce:

chcp
Output: Active code page: 437

javac -encoding Windows-1250 AlbanianHello.java

java AlbanianHello
Output: Hδllδ Wδrld

As you can see, it still uses the default encoding, which is Cp437, even though I specified the encoding I wish to use.

Now this is what happens when I change the code page to 1250 and then compile without specifying the encoding:

chcp 1250
Output: Active code page: 1250

javac AlbanianHello.java
java AlbanianHello
Output: Hëllë Wërld

That seems to work correctly.

Specifying the encoding in this case yields the same results:

chcp 1250
Output: Active code page: 1250

javac -encoding Windows-1250 AlbanianHello.java
java AlbanianHello
Output: Hëllë Wërld



So does it just completely ignore my specified encoding? Not quite. When I try an encoding that is not supposed to work with my string, it displays a bunch of error messages:

javac -encoding UTF8 AlbanianHello.java
Output: AlbanianHello.java:5: error: unmappable character for encoding UTF8
    System.out.println("H?ll? W?rld");
                         ^
...
3 errors

My question is: Why does it ignore the encoding when it should theoretically work, and doesn't ignore it when it shouldn't work?

I would also like to know if there is any difference in the result between these commands:

chcp 1250
javac AlbanianHello.java

and these:

chcp 1250
javac -encoding Windows-1250 AlbanianHello.java


Recommended answer

Welcome to the site! The javac encoding option sets how javac will map the bytes in your source file to Unicode characters, since Java uses Unicode internally. The chcp command sets how the Windows console will map bytes of output to glyphs in a font. Java doesn't know or care about chcp, and vice versa. If both match, all is well. If not...
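As a rough illustration of the byte-to-Unicode mapping described above, the sketch below decodes the single byte 0xEB the way a compiler would have to decode source bytes. The class and method names are made up for this example, and it assumes the windows-1250 charset is available in your JRE. Decoding the same byte as UTF-8 does not yield a valid character, which is consistent with the "unmappable character" errors in the question (javac reports an error, whereas `new String` silently substitutes U+FFFD).

```java
import java.nio.charset.Charset;

public class SourceDecodingDemo {
    // Hypothetical helper: decodes raw bytes with the named charset,
    // which is conceptually what -encoding tells javac to do with source bytes.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xEB };

        // In Windows-1250 the byte 0xEB is the letter e with diaeresis (U+00EB).
        String asCp1250 = decode(raw, "windows-1250");
        System.out.printf("windows-1250: U+%04X%n", (int) asCp1250.charAt(0));

        // A lone 0xEB is an incomplete UTF-8 sequence, so the String
        // constructor substitutes the replacement character U+FFFD.
        String asUtf8 = decode(raw, "UTF-8");
        System.out.printf("UTF-8:        U+%04X%n", (int) asUtf8.charAt(0));
    }
}
```

The code points are printed as hexadecimal numbers rather than as characters so that the output does not itself depend on the console's code page.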

In your first example, Java correctly interprets your Windows-1250 source. Character ë is U+00EB. When that byte (0xEB) is output to a code-page 437 terminal, the displayed result is what byte 0xEB means in cp437, regardless of what you thought you wanted to display. Per the CP437 character table, that is lowercase delta, δ. (Just to highlight the difference, δ is U+03B4 in Unicode.)
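The mismatch can be reproduced without a console at all. The sketch below encodes the string with one charset and reads the resulting bytes back with another, imitating a JVM that writes bytes which a code-page 437 terminal then interprets. The names are hypothetical, and the choice of windows-1252 as the writing charset is an assumption (windows-1250 would produce the same byte for ë). The string is written with Unicode escapes so the example does not depend on the encoding of its own source file.

```java
import java.nio.charset.Charset;

public class ConsoleMismatchDemo {
    // Hypothetical helper: encodes text with one charset, then decodes the
    // same bytes with another, as a mismatched console would.
    static String reinterpret(String text, String writeCharset, String readCharset) {
        byte[] bytes = text.getBytes(Charset.forName(writeCharset));
        return new String(bytes, Charset.forName(readCharset));
    }

    public static void main(String[] args) {
        // "Hello World" with U+00EB in place of each e and o, as in the question.
        String hello = "H\u00EBll\u00EB W\u00EBrld";

        // Assumption: the JVM writes windows-1252 bytes; the console reads CP437.
        String shown = reinterpret(hello, "windows-1252", "IBM437");

        // Print code points instead of glyphs; each U+00EB comes back as U+03B4.
        for (char c : shown.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```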

For completeness, it turns out to be less than easy to find out what the default encoding for javac is. The docs for Charset say that:


The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.

Based on the behaviour you saw, I am guessing javac on your system is reading the code page from the console and using that as the default. Either that, or the default is a code page in which ë = 0xEB (e.g., CP1252 or ISO 8859-1; as far as I know, either might be the default depending on your configuration).

Edit: On my machine, the default is CP1252 (Java charset name windows-1252). I have put the code I used on GitHub.
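The GitHub code is not reproduced here. As a minimal sketch of how such a check might look (not necessarily what that code does), the following prints the JVM's default charset using `Charset.defaultCharset()` and the `file.encoding` system property. The class and method names are invented for this example, and the values printed depend on your JDK version and system configuration.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    // Hypothetical helper: returns the canonical name of the JVM default charset.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The charset the JVM picked at startup.
        System.out.println("Default charset: " + defaultCharsetName());

        // The system property that commonly reflects the same setting.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

Running it in the same CMD window before and after `chcp 1250` would show whether the console code page influences the default on a given machine.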
