Add non-ASCII file names to zip in Java
Question
What is the best way to add non-ASCII file names to a zip file using Java, in such a way that the files can be properly read in both Windows and Linux?
Here is one attempt, adapted from https://truezip.dev.java.net/tutorial-6.html#Example, which works in Windows Vista but fails in Ubuntu Hardy. In Hardy the file name is shown as abc-ЖДФ.txt in file-roller.
import java.io.IOException;
import java.io.PrintStream;

import de.schlichtherle.io.File;
import de.schlichtherle.io.FileOutputStream;

public class Main {
    public static void main(final String[] args) throws IOException {
        try {
            // TrueZIP treats "outer.zip" as a virtual directory and creates the entry inside it.
            PrintStream ps = new PrintStream(new FileOutputStream(
                    "outer.zip/abc-åäö.txt"));
            try {
                ps.println("The characters åäö works here though.");
            } finally {
                ps.close();
            }
        } finally {
            // Commit pending changes to the archive file.
            File.umount();
        }
    }
}
Unlike java.util.zip, truezip allows specifying the zip file encoding. Here's another sample, this time explicitly specifying the encoding. None of IBM437, UTF-8, or ISO-8859-1 works in Linux. IBM437 works in Windows.
import java.io.IOException;

import de.schlichtherle.io.FileOutputStream;
import de.schlichtherle.util.zip.ZipEntry;
import de.schlichtherle.util.zip.ZipOutputStream;

public class Main {
    public static void main(final String[] args) throws IOException {
        // Write one archive per candidate encoding, each holding a single empty entry.
        for (String encoding : new String[] { "IBM437", "UTF-8", "ISO-8859-1" }) {
            ZipOutputStream zipOutput = new ZipOutputStream(
                    new FileOutputStream(encoding + "-example.zip"), encoding);
            try {
                ZipEntry entry = new ZipEntry("abc-åäö.txt");
                zipOutput.putNextEntry(entry);
                zipOutput.closeEntry();
            } finally {
                // Close the archive even if writing the entry fails.
                zipOutput.close();
            }
        }
    }
}
Recommended answer
The encoding for file entries in ZIP was originally specified as IBM Code Page 437. Many characters used in other languages cannot be represented that way.
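To make the Code Page 437 limitation concrete, here is a small sketch. It assumes the JDK in use ships the extended charsets IBM437 and IBM866 (standard JDK distributions normally do, but minimal runtimes may not), and the class and method names are made up for illustration. It shows that åäö can be encoded in Code Page 437 while Cyrillic letters cannot, and that the same bytes read back as Code Page 866 give the garbled name from the question. That match is only consistent with the symptom; it does not prove which code page file-roller actually used.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of any library.
class Cp437Demo {
    // Encode text with one charset, then decode the same bytes with another.
    static String reinterpret(String text, String writtenAs, String readAs) {
        byte[] raw = text.getBytes(Charset.forName(writtenAs));
        return new String(raw, Charset.forName(readAs));
    }

    // True if every character of text has a representation in the charset.
    static boolean representable(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the compiler's encoding.
        String name = "abc-\u00e5\u00e4\u00f6.txt"; // abc-åäö.txt

        // Bytes written as Code Page 437, shown by a reader that assumes Code Page 866.
        System.out.println(reinterpret(name, "IBM437", "IBM866"));

        // Latin letters with diacritics fit into Code Page 437, Cyrillic letters do not.
        System.out.println(representable(name, "IBM437"));
        System.out.println(representable("\u0416\u0414\u0424", "IBM437"));
    }
}
```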
The PKWARE specification acknowledges the problem and adds a flag bit to address it. That bit is a later addition, though (from 2007; thanks to Cheeso for clearing that up, see the comments). If the bit is set, the file name entry has to be encoded in UTF-8. The extension is described in 'APPENDIX D - Language Encoding (EFS)', at the end of the specification.
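Since Java 7, which postdates the question, java.util.zip.ZipOutputStream and ZipInputStream also accept a Charset, and the JDK sets the UTF-8 flag when that charset is UTF-8. The following is a minimal sketch under that assumption (Java 7 or later); the class and method names are made up for illustration. It writes an archive to memory, reads the entry name back, and checks bit 11 of the general purpose flag in the first local file header. Whether a particular Windows or Linux archive tool honors the flag is not something this sketch can confirm.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of any library.
class Utf8ZipNames {
    // Build an in-memory zip with one entry whose name is stored as UTF-8.
    static byte[] writeZip(String entryName, String content) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer, StandardCharsets.UTF_8)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }

    // Read the name of the first entry, decoding it as UTF-8.
    static String firstEntryName(byte[] zipBytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(
                new ByteArrayInputStream(zipBytes), StandardCharsets.UTF_8)) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    // The general purpose flag is a little-endian 16-bit field at offset 6 of the
    // local file header; bit 11 (0x0800) is therefore bit 3 of the byte at offset 7.
    static boolean utf8FlagSet(byte[] zipBytes) {
        return (zipBytes[7] & 0x08) != 0;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = writeZip("abc-\u00e5\u00e4\u00f6.txt", "hello"); // abc-åäö.txt
        System.out.println(firstEntryName(zip));
        System.out.println(utf8FlagSet(zip));
    }
}
```

Writing to a FileOutputStream instead of the in-memory buffer produces a file that can then be tried in the archive tools of interest.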
For Java, trouble with non-ASCII characters in ZIP entry names is a known bug. See bug #4244499 and the large number of related bugs.
As a workaround, my colleague URL-encoded the file names before storing them in the ZIP and decoded them after reading. If you control both the storing and the reading, that may work for you too.
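The URL-encoding workaround might look roughly like the sketch below. It is not the colleague's actual code; the class and method names are made up for illustration, and UTF-8 is an assumed choice for the percent-encoding. The encoded name contains only ASCII characters, so it survives whatever encoding the ZIP writer and reader use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class, not part of any library.
class EncodedEntryNames {
    // Turn a file name into an ASCII-only entry name before storing it in the ZIP.
    static String toEntryName(String fileName) throws UnsupportedEncodingException {
        return URLEncoder.encode(fileName, "UTF-8");
    }

    // Recover the original file name after reading the entry from the ZIP.
    static String fromEntryName(String entryName) throws UnsupportedEncodingException {
        return URLDecoder.decode(entryName, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String stored = toEntryName("abc-\u00e5\u00e4\u00f6.txt"); // abc-åäö.txt
        System.out.println(stored);
        System.out.println(fromEntryName(stored));
    }
}
```

Note that URLEncoder also escapes "/" and turns spaces into "+", so entry paths with directories would need to be encoded one segment at a time, and anyone opening the archive with an ordinary tool sees the encoded names.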
In the bug report, someone suggests using the ZipOutputStream from Apache Ant as a workaround. That implementation allows an encoding to be specified.