Java Can't Open a File with Surrogate Unicode Values in the Filename?

Views: 168 Posted: 2018/11/29 19:25:05 Tags: java file unicode filenames surrogate-pairs

Question

I'm dealing with code that does various IO operations with files, and I want to make it able to deal with international filenames. I'm working on a Mac with Java 1.5, and if a filename contains Unicode characters that require surrogates, the JVM can't seem to locate the file. For example, my test file is:

"草𦿶鷗外.gif", which gets broken into the Java characters \u8349\uD85B\uDFF6\u9DD7\u5916.gif

If I create a File from this filename, I can't open it: I get a FileNotFoundException. Even running this on the folder containing the file fails:

File[] files = folder.listFiles(); 
for (File file : files) {
    if (!file.exists()) {
        System.out.println("Failed to find File"); //Fails on the surrogate filename
    }
}

Most of the code I am actually dealing with is of the form:

FileInputStream instream = new FileInputStream(new File("草𦿶鷗外.gif"));
// operations follow

Is there some way I can address this problem, either escaping the filenames or opening files differently?

Recommended answer

I suspect one of Java or Mac is using CESU-8 instead of proper UTF-8. Java uses "modified UTF-8" (which is a slight variation of CESU-8) for a variety of internal purposes, but I wasn't aware it could use it as a filesystem/defaultCharset. Unfortunately I have neither Mac nor Java here to test with.

"Modified" is a modified way of saying "badly bugged". Instead of outputting a four-byte UTF-8 sequence for supplementary (non-BMP) characters like 𦿶:

\xF0\xA6\xBF\xB6

it outputs a UTF-8-encoded sequence for each of the surrogates:

\xED\xA1\x9B\xED\xBF\xB6

This isn't a valid UTF-8 sequence, but a lot of decoders will allow it anyway. Problem is if you round-trip that through a real UTF-8 encoder you've got a different string, the four-byte one above. Try to access the file with that name and boom! fail.
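
The two byte sequences can be reproduced from Java itself. This is a minimal sketch, assuming a JDK new enough to have `java.nio.charset.StandardCharsets` (Java 7 or later, so not the Java 1.5 from the question). The class and method names are made up for illustration. It uses `DataOutputStream.writeUTF` only as a convenient way to get Java's modified UTF-8, and drops the two-byte length prefix that method writes first. It says nothing about which encoding the JVM actually applies to filenames.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SurrogateEncodingDemo {
    /** Formats bytes as \xNN escapes, matching the notation used in the answer. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Standard UTF-8: a supplementary character becomes one four-byte sequence. */
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Java's modified UTF-8: each surrogate is encoded separately as three bytes. */
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(s);
        out.flush();
        byte[] all = buf.toByteArray();
        // writeUTF starts with a two-byte length; skip it to keep only the payload.
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) throws IOException {
        String ch = "\uD85B\uDFF6"; // U+26FF6 written as its surrogate pair
        System.out.println(hex(utf8(ch)));         // \xF0\xA6\xBF\xB6
        System.out.println(hex(modifiedUtf8(ch))); // \xED\xA1\x9B\xED\xBF\xB6
    }
}
```

The two results differ in length as well as content, which is why a name stored in one form will not match a lookup made with the other.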

So first let's just check how filenames are actually stored under your current filesystem, using a platform that uses bytes for filenames such as Python 2.x:

$ python
Python 2.x.something (blah blah)
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.listdir('.')

On my filesystem (Linux, ext4, UTF-8), the filename "草𦿶鷗外.gif" comes out as:

['\xe8\x8d\x89\xf0\xa6\xbf\xb6\xe9\xb7\x97\xe5\xa4\x96.gif']

which is what you want. If that's what you get, it's probably Java doing it wrong. If you get the longer six-byte-character version:

['\xe8\x8d\x89\xed\xa1\x9b\xed\xbf\xb6\xe9\xb7\x97\xe5\xa4\x96.gif']

it's probably OS X doing it wrong... does it always store filenames like this? (Or did the files come from somewhere else originally?) What if you rename the file to the ‘proper’ version?:

os.rename('\xe8\x8d\x89\xed\xa1\x9b\xed\xbf\xb6\xe9\xb7\x97\xe5\xa4\x96.gif', '\xe8\x8d\x89\xf0\xa6\xbf\xb6\xe9\xb7\x97\xe5\xa4\x96.gif')
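
It may also help to look at the same directory from the Java side. This is a rough sketch, not a fix: it prints each name returned by `File.listFiles()` as UTF-16 code units next to the result of `exists()`. The class name `ListNamesDemo` and the helper `codeUnits` are hypothetical, and the folder is assumed to be passed as the first argument (defaulting to the current directory).

```java
import java.io.File;

public class ListNamesDemo {
    /** Renders each UTF-16 code unit of the name as four hex digits, space-separated. */
    static String codeUnits(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) name.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        File folder = new File(args.length > 0 ? args[0] : ".");
        File[] files = folder.listFiles();
        if (files == null) {
            // listFiles returns null if the path is not a directory or cannot be read.
            System.err.println("Cannot list " + folder);
            return;
        }
        for (File file : files) {
            System.out.println(codeUnits(file.getName()) + "  exists=" + file.exists());
        }
    }
}
```

If the test file shows up with D85B DFF6 intact in its name but `exists=false`, that would be consistent with the name being re-encoded differently on the way back to the filesystem. The Python byte listing above is still needed to see which form is actually stored on disk.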
