How does the JVM determine the (default?) character encoding for argv on Linux


Question

Java has a default character encoding, which it uses in contexts where a character encoding is not explicitly supplied. The documentation for how it chooses that encoding is vague:


The default charset is determined during virtual-machine startup and typically depends upon the locale and charset of the underlying operating system.

That documentation has to be vague because the method the JVM uses is system-specific.

Using the default character encoding is often a bad idea; it is better to use an explicitly indicated encoding, or to always use the same encoding for a given kind of I/O. But one unavoidable use of the default character encoding would seem to be for command-line arguments. On a POSIX system such as Linux, the native (C/C++) code of the JVM receives the command-line arguments as a null-terminated array of C/C++ char pointers. These ought to be thought of as byte pointers, because the bytes encode code points in some unspecified manner. The JVM has to interpret those sequences of C/C++ chars (bytes) and convert them into sequences of Java chars to pass to the main() of the Java program. I assume the JVM uses the default character encoding for this.
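To make the ambiguity concrete, here is a minimal sketch of why the choice of encoding matters when argv bytes are turned into Java chars. It does not show what the JVM launcher actually does; the class name ArgvDecodingSketch and the helper decode are made up for illustration, and the byte array merely stands in for one raw argv entry.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: not the JVM's launcher code.
public class ArgvDecodingSketch {

    // Decode a raw byte sequence (standing in for one argv entry) with a given charset.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        // The two bytes a UTF-8 terminal sends for U+00E9 (e with acute accent).
        byte[] raw = {(byte) 0xC3, (byte) 0xA9};

        String asUtf8 = decode(raw, StandardCharsets.UTF_8);
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);

        // Same bytes, different Java strings: one char versus two chars.
        System.out.println("UTF-8      length = " + asUtf8.length());
        System.out.println("ISO-8859-1 length = " + asLatin1.length());
    }
}
```

The same two bytes become a single char (U+00E9) under UTF-8 and two chars (U+00C3, U+00A9) under ISO-8859-1, which is why a program's documentation needs to say which encoding is applied to its arguments.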

So I need to know precisely how the JVM determines the default encoding for a particular system (a modern GNU/Linux operating system), so I can provide user documentation about how my program behaves, and so users of my program can predict how it will behave.

I guess the JVM examines some environment variables, but which ones?

Recommended answer

You can of course look at the source code of java.nio.charset.Charset.defaultCharset(). When I do that on my system (64-bit Windows 7, with Oracle JDK 8 update 25) I see this:

public static Charset defaultCharset() {
    if (defaultCharset == null) {
        synchronized (Charset.class) {
            String csn = AccessController.doPrivileged(
                new GetPropertyAction("file.encoding"));
            Charset cs = lookup(csn);
            if (cs != null)
                defaultCharset = cs;
            else
                defaultCharset = forName("UTF-8");
        }
    }
    return defaultCharset;
}

In other words, it looks at the system property file.encoding and, if it cannot find a matching Charset instance, falls back to UTF-8.
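A small probe along these lines can show what a particular JVM actually picked up on a given machine. This is a sketch under assumptions, not a statement of how the JVM decides: the class name ShowDefaultEncoding and the method snapshot are hypothetical, sun.jnu.encoding is an implementation-specific property of Oracle/OpenJDK builds that may be absent elsewhere, and LC_ALL, LC_CTYPE and LANG are listed only because they are the standard POSIX locale variables that plausibly feed the default on Linux.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical probe: reports what this JVM sees, not how it decided.
public class ShowDefaultEncoding {

    // Collect the default charset, related system properties, and locale variables.
    static Map<String, String> snapshot() {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("defaultCharset", Charset.defaultCharset().name());
        info.put("file.encoding", System.getProperty("file.encoding"));
        // Assumption: implementation-specific property; null if the JVM does not define it.
        info.put("sun.jnu.encoding", System.getProperty("sun.jnu.encoding"));
        // Standard POSIX locale variables; null when unset in the environment.
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            info.put(name, System.getenv(name));
        }
        return info;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : snapshot().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Running it under different locale settings (for example once with LC_ALL=C and once with a UTF-8 locale) and comparing the output is one way to check empirically which variables influence the result on a specific JVM version.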
