file.encoding has no effect, LC_ALL environment variable does it


Question

The following Java program, run on Linux with OpenJDK 1.6.0_22, simply lists the contents of the directory passed as a command-line argument. The directory contains files whose names are encoded in UTF-8 (e.g. Hindi, Mandarin, German).

import java.io.File;

class ListDir {

    public static void main(String[] args) throws Exception {
        //System.setProperty("file.encoding", "en_US.UTF-8");
        System.out.println(System.getProperty("file.encoding"));
        File dir = new File(args[0]);
        String[] names = dir.list();   // null if args[0] is not a readable directory
        if (names == null) {
            System.err.println("Cannot list " + dir);
            return;
        }
        for (String name : names) {
            // Rebuild each entry's path from the listed name and test whether it resolves.
            File cf = new File(dir, name);
            System.out.println(cf.getAbsolutePath() + " --> " + cf.exists());
        }
    }
}

If I set the LC_ALL variable to en_US.UTF-8, the results are printed fine. But if I set LC_ALL to POSIX and supply the file.encoding and sun.jnu.encoding properties as UTF-8 on the command line, I get garbage output and cf.exists() returns false.
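To compare the two setups, it can help to print what the JVM actually ended up with. The following is a minimal sketch, not part of the original program: the class name EncodingReport is hypothetical, and sun.jnu.encoding is an implementation-specific property that may be absent on JVMs other than OpenJDK or Oracle builds.

```java
import java.nio.charset.Charset;

class EncodingReport {

    // Collects the encoding-related settings the running JVM ended up with.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("LC_ALL           = ").append(System.getenv("LC_ALL")).append('\n');
        sb.append("file.encoding    = ").append(System.getProperty("file.encoding")).append('\n');
        // Implementation-specific property; may be null on other JVMs.
        sb.append("sun.jnu.encoding = ").append(System.getProperty("sun.jnu.encoding")).append('\n');
        sb.append("default charset  = ").append(Charset.defaultCharset().name()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running it once under LC_ALL=en_US.UTF-8 and once under LC_ALL=POSIX with the -D options shows which of the values the command line really changed.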

Can you please explain this behavior? Many websites say that file.encoding is sufficient to read file names and use them for operations. Here it looks like that property has no effect at all.

Update 1: If I set file.encoding to something like GBK (Chinese) while LC_ALL is en_US.UTF-8, then cf.exists() returns true, but only '?' characters appear instead of the file name. Surprise o_O.
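One plausible reading of Update 1 is that file.encoding only affects how characters are encoded on output, so characters the charset cannot represent are replaced by '?' while the path used for the lookup is unaffected. The sketch below imitates that with an encode/decode round trip. The class and method names are hypothetical, and it assumes the runtime ships the GBK charset.

```java
import java.nio.charset.Charset;

class OutputEncodingDemo {

    // Encodes the text in the given charset and decodes it again, which
    // shows what survives when an output stream writes in that charset.
    static String asWritten(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // "Hindi" written in Devanagari, given as escapes to avoid source-encoding issues.
        String hindi = "\u0939\u093f\u0928\u094d\u0926\u0940";
        System.out.println("UTF-8 keeps the name: " + asWritten(hindi, "UTF-8").equals(hindi));
        // GBK has no Devanagari, so each character is replaced on encoding.
        System.out.println("GBK writes: " + asWritten(hindi, "GBK"));
    }
}
```

String.getBytes() substitutes the charset's replacement byte for unmappable characters rather than throwing, which is consistent with question marks showing up in place of the Hindi name.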

Update 2: After more investigation, it looks like this is not a Java issue. It looks like libc on Linux uses the locale settings to translate file name encodings, and those settings cause the file-not-found error/exception. "file.encoding" only controls how Java interprets file names.

Update 3: Now it looks like the problem is how Java interprets file names. The following simple C code works on Linux regardless of the file name encoding and the value of the LC_ALL environment variable (I am happy that this confirms the answer given here: https://unix.stackexchange.com/questions/39175/understanding-unix-file-name-encoding). But I am still not clear on how Java interprets the LC_ALL variable. I am now looking into the OpenJDK code for that.

Sample C code:

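#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <directory>\n", argv[0]);
        return 1;
    }
    char *argdir = argv[1];
    DIR *dp = opendir(argdir);
    if (dp == NULL) {
        perror(argdir);
        return 1;
    }
    struct dirent *de;
    while ((de = readdir(dp)) != NULL) {
        /* Build "<argdir>/<name>" byte for byte; no encoding conversion takes place. */
        size_t len = strlen(argdir) + 1 + strlen(de->d_name) + 1;
        char *abspath = malloc(len);
        if (abspath == NULL) {
            closedir(dp);
            return 1;
        }
        snprintf(abspath, len, "%s/%s", argdir, de->d_name);
        printf("%d %s ", de->d_type, abspath);
        FILE *fp = fopen(abspath, "r");
        if (fp) {
            printf("Success");
            fclose(fp);   /* only close a stream that was actually opened */
        }
        putchar('\n');
        free(abspath);
    }
    closedir(dp);
    return 0;
}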


Recommended answer

Note: I think I have finally nailed it down, though I have not confirmed that it is right. This is what I found through some code reading and testing, and I don't have additional time to look into it. If anyone is interested, they can check it and tell me whether this answer is right or wrong; I would be glad :)

The reference I used is this tarball, available from OpenJDK's site: openjdk-6-src-b25-01_may_2012.tar.gz


  1. Java natively translates every string to the platform's local encoding in this method: jdk/src/share/native/common/jni_util.c - JNU_GetStringPlatformChars(). The system property sun.jnu.encoding is used to determine the platform's encoding.

  2. The value of sun.jnu.encoding is set in jdk/src/solaris/native/java/lang/java_props_md.c - GetJavaProperties() using libc's setlocale() function. The environment variable LC_ALL is used to set the value of sun.jnu.encoding. A value given on the command line with the -Dsun.jnu.encoding option is ignored.

  3. The call to File.exists() is implemented in jdk/src/share/classes/java/io/File.java and returns:

     return ((fs.getBooleanAttributes(this) & FileSystem.BA_EXISTS) != 0);

  4. getBooleanAttributes() is natively coded (I am skipping the steps of browsing through many files) in jdk/src/share/native/java/io/UnixFileSystem_md.c, in the function Java_java_io_UnixFileSystem_getBooleanAttributes0(). There, the macro WITH_FIELD_PLATFORM_STRING(env, file, ids.path, path) converts the path string to the platform's encoding.

  5. So conversion to the wrong encoding sends a wrong C string (char array) to the subsequent stat() call, which then reports that the file cannot be found.
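The effect of the steps above can be imitated in pure Java. This is only an analogy for the native conversion, not the JDK's own code: the class and method names are hypothetical, and it assumes the POSIX locale maps to an ASCII-like platform charset. The name bytes are decoded with the platform charset (as when listing a directory) and encoded back (as when building the C string for stat()).

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class PathRoundTrip {

    // Decodes the raw name bytes with the platform charset and encodes the
    // resulting String back, as happens between listing a name and stat().
    static byte[] roundTrip(byte[] onDisk, String platformCharset) {
        Charset cs = Charset.forName(platformCharset);
        return new String(onDisk, cs).getBytes(cs);
    }

    // True if the bytes handed to the file system match the bytes on disk.
    static boolean survives(byte[] onDisk, String platformCharset) {
        return Arrays.equals(onDisk, roundTrip(onDisk, platformCharset));
    }

    public static void main(String[] args) {
        // A German file name stored on disk as UTF-8 bytes.
        byte[] onDisk = "gr\u00fc\u00dfe.txt".getBytes(Charset.forName("UTF-8"));
        System.out.println("UTF-8 platform charset:    " + survives(onDisk, "UTF-8"));
        System.out.println("US-ASCII platform charset: " + survives(onDisk, "US-ASCII"));
    }
}
```

With an ASCII platform charset the non-ASCII bytes are replaced during decoding and cannot be recovered, so the path passed on no longer names the file on disk. That matches cf.exists() returning false under LC_ALL=POSIX.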

Lesson: LC_ALL is very important
