Java: reading and writing Unicode/UTF-8 filenames (not contents)

Question

I have a few directories and files whose names contain Japanese characters. If I try to read a filename (not the file's contents) that contains, for example, a ク, I receive a String containing a � instead. If I try to create a file or directory whose name contains a ク, the file or directory that appears has a ? in its name.

For example, I list the files:

File file = new File(".");  
String[] filesAndDirs = file.list();

The filesAndDirs array now contains the directories with the special characters, but each String contains only ����. It seems there is nothing to decode, because getBytes() returns only "-17 -65 -67" for every char in the filename, even when the original characters differ.

I use Mac OS X 10.8.2, Java 7_10, and NetBeans.

Any ideas?

Thanks in advance :)

Recommended answer

Those bytes are 0xEF 0xBF 0xBD, the UTF-8 encoding of U+FFFD (the Unicode replacement character), which is what you are seeing in place of the Japanese characters. It appears that whatever OS function Java uses to list the files is itself returning those incorrect characters.
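The byte values can be checked directly. The following is a minimal sketch, not part of the original answer; the class name ReplacementCharDemo and the helper utf8Bytes are made up for illustration. It shows that the replacement character encodes to the three signed bytes from the question, while ク (U+30AF) encodes to something different, so the original character was already lost before getBytes() was called.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementCharDemo {
    // Hypothetical helper: encodes the text as UTF-8 and formats the
    // signed byte values the same way getBytes() reports them.
    static String utf8Bytes(String text) {
        return Arrays.toString(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // U+FFFD REPLACEMENT CHARACTER: 0xEF 0xBF 0xBD
        System.out.println(utf8Bytes("\uFFFD")); // [-17, -65, -67]

        // U+30AF KATAKANA LETTER KU: 0xE3 0x82 0xAF
        System.out.println(utf8Bytes("\u30AF")); // [-29, -126, -81]
    }
}
```

Because every damaged character has become the same U+FFFD, no re-decoding of the String can recover the original name.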

Perhaps Files.newDirectoryStream will be more reliable. Try this instead:

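// Requires: import java.nio.file.DirectoryStream, Files, Path, Paths;
// the enclosing method must handle or declare java.io.IOException.
try (DirectoryStream<Path> dir = Files.newDirectoryStream(Paths.get("."))) {
    for (Path child : dir) {
        String filename = child.getFileName().toString();

        System.out.println("name=" + filename);
        // Print each UTF-16 code unit in hex; fffd means the character was lost.
        for (char c : filename.toCharArray()) {
            System.out.printf("%04x ", (int) c);
        }
        System.out.println();
    }
}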
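To see whether the two listing APIs behave differently on a given machine, both can be run side by side. This is only a diagnostic sketch under stated assumptions: the class FilenameCheck and the helper hasReplacementChar are hypothetical names, and sun.jnu.encoding is an implementation-specific property that may be absent on some JVMs. It does not by itself fix the problem.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FilenameCheck {
    // Hypothetical helper: true if the name contains U+FFFD, meaning at
    // least one character was lost when the name was decoded.
    static boolean hasReplacementChar(String name) {
        return name.indexOf('\uFFFD') >= 0;
    }

    // Formats one line of the report, flagging damaged names.
    static String report(String api, String name) {
        return api + ": " + name + (hasReplacementChar(name) ? "  <-- contains U+FFFD" : "");
    }

    public static void main(String[] args) throws IOException {
        // Encoding settings the JVM picked up at startup (may print null).
        System.out.println("file.encoding=" + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding"));

        // Legacy API used in the question; list() returns null on I/O error.
        String[] names = new File(".").list();
        if (names != null) {
            for (String name : names) {
                System.out.println(report("File.list", name));
            }
        }

        // NIO API suggested in the answer.
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(Paths.get("."))) {
            for (Path child : dir) {
                System.out.println(report("newDirectoryStream", child.getFileName().toString()));
            }
        }
    }
}
```

If both APIs flag the same names, the characters are being lost below the Java API level, which would be consistent with the answer's guess that the underlying OS call returns them.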
