Add non-ASCII file names to zip in Java

Question

What is the best way to add non-ASCII file names to a zip file using Java, in such a way that the files can be properly read in both Windows and Linux?

Here is one attempt, adapted from https://truezip.dev.java.net/tutorial-6.html#Example, which works in Windows Vista but fails in Ubuntu Hardy. In Hardy the file name is shown as abc-ЖДФ.txt in file-roller.

import java.io.IOException;
import java.io.PrintStream;

import de.schlichtherle.io.File;
import de.schlichtherle.io.FileOutputStream;

public class Main {

    public static void main(final String[] args) throws IOException {

        try {
            PrintStream ps = new PrintStream(new FileOutputStream(
                    "outer.zip/abc-åäö.txt"));
            try {
                ps.println("The characters åäö works here though.");
            } finally {
                ps.close();
            }
        } finally {
            File.umount();
        }
    }
}

Unlike java.util.zip, truezip allows specifying the zip file encoding. Here's another sample, this time explicitly specifying the encoding. Neither IBM437, UTF-8 nor ISO-8859-1 works in Linux. IBM437 works in Windows.

import java.io.IOException;

import de.schlichtherle.io.FileOutputStream;
import de.schlichtherle.util.zip.ZipEntry;
import de.schlichtherle.util.zip.ZipOutputStream;

public class Main {

    public static void main(final String[] args) throws IOException {

        for (String encoding : new String[] { "IBM437", "UTF-8", "ISO-8859-1" }) {
            ZipOutputStream zipOutput = new ZipOutputStream(
                    new FileOutputStream(encoding + "-example.zip"), encoding);
            ZipEntry entry = new ZipEntry("abc-åäö.txt");
            zipOutput.putNextEntry(entry);
            zipOutput.closeEntry();
            zipOutput.close();
        }
    }
}


Recommended answer

The encoding for file entries in ZIP was originally specified as IBM Code Page 437. Many characters used in other languages cannot be represented that way.
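To make the limitation concrete, here is a minimal sketch using only the JDK's charset support. It checks which names CP437 can represent, and shows what happens when CP437-encoded bytes are decoded with a different code page. The class and method names are hypothetical, and IBM866 as the reader's code page is an assumption made for illustration; the question does not say which encoding file-roller actually used.

```java
import java.nio.charset.Charset;

class Cp437NameCheck {

    static final Charset CP437 = Charset.forName("IBM437");

    // True if every character of the name has a CP437 byte representation.
    static boolean fitsCp437(String name) {
        // A fresh encoder per call, because CharsetEncoder is not thread-safe.
        return CP437.newEncoder().canEncode(name);
    }

    // Encodes the name with CP437, then decodes the bytes with another charset,
    // simulating a reader that assumes the wrong code page (an assumption here).
    static String readWithWrongCharset(String name, String readerCharset) {
        byte[] stored = name.getBytes(CP437);
        return new String(stored, Charset.forName(readerCharset));
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of the file encoding.
        String swedish = "abc-\u00e5\u00e4\u00f6.txt";   // abc-åäö.txt
        String cyrillic = "abc-\u0416\u0414\u0424.txt";  // abc-ЖДФ.txt

        System.out.println(fitsCp437(swedish));
        System.out.println(fitsCp437(cyrillic));
        System.out.println(readWithWrongCharset(swedish, "IBM866"));
    }
}
```

With these inputs the Swedish name fits in CP437 and the Cyrillic one does not. Decoding the CP437 bytes of abc-åäö.txt as IBM866 gives abc-ЖДФ.txt, which matches the garbled name reported in the question, so a code page mismatch on the reading side is at least a plausible explanation.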

The PKWARE specification acknowledges the problem and adds a flag bit for it (bit 11 of the general purpose bit flag). That is a later addition, though (from 2007; thanks to Cheeso for clearing that up, see comments). If that bit is set, the file name entry must be encoded in UTF-8. This extension is described in 'APPENDIX D - Language Encoding (EFS)', at the end of the linked document.
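As a rough illustration of that flag, the sketch below writes a one-entry archive in memory and inspects the general purpose bit flag of the first local file header. It assumes Java 7 or later, where java.util.zip.ZipOutputStream accepts a Charset; that constructor is newer than this post. The class and method names are hypothetical, and the check only looks at the first entry of an archive that starts with a local file header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class EfsFlagDemo {

    // Bit 11 of the general purpose bit flag: language encoding flag (EFS).
    static final int EFS_BIT = 1 << 11;

    // Builds a one-entry archive in memory with entry names encoded as UTF-8.
    static byte[] zipWithUtf8Name(String entryName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer, StandardCharsets.UTF_8)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // The local file header is: 4-byte signature, 2-byte version needed,
    // then the 2-byte little-endian general purpose bit flag at offset 6.
    static int generalPurposeFlag(byte[] zipBytes) {
        return (zipBytes[6] & 0xff) | ((zipBytes[7] & 0xff) << 8);
    }

    static boolean hasEfsFlag(byte[] zipBytes) {
        return (generalPurposeFlag(zipBytes) & EFS_BIT) != 0;
    }

    public static void main(String[] args) {
        byte[] zip = zipWithUtf8Name("abc-\u00e5\u00e4\u00f6.txt"); // abc-åäö.txt
        System.out.println("EFS flag set: " + hasEfsFlag(zip));
    }
}
```

Whether a given unzip tool honors the flag is a separate question; an archive written this way can still display incorrectly in readers that ignore bit 11.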

In Java, trouble with non-ASCII characters in ZIP entry names is a known bug. See bug #4244499 and the many related bugs.

My colleague worked around this by URL-encoding the file names before storing them in the ZIP and decoding them after reading. If you control both the storing and the reading, that may be a workable approach.
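A minimal sketch of that workaround, using only java.net and java.util.zip, might look like the following. It is not the colleague's actual code; the class and method names are hypothetical. The idea is that the stored entry name is pure ASCII, so the archive's name encoding no longer matters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class UrlEncodedZipNames {

    // Converts a file name into an ASCII-only form for use as a ZIP entry name.
    static String encodeName(String fileName) {
        try {
            return URLEncoder.encode(fileName, "UTF-8");
        } catch (IOException e) { // UnsupportedEncodingException is an IOException
            throw new UncheckedIOException(e);
        }
    }

    // Restores the original file name from a stored entry name.
    static String decodeName(String entryName) {
        try {
            return URLDecoder.decode(entryName, "UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes one empty entry under the encoded name, reads it back, and decodes it.
    static String roundTrip(String fileName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ZipOutputStream out = new ZipOutputStream(buffer)) {
                out.putNextEntry(new ZipEntry(encodeName(fileName)));
                out.closeEntry();
            }
            try (ZipInputStream in = new ZipInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                ZipEntry entry = in.getNextEntry();
                return decodeName(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "abc-\u00e5\u00e4\u00f6.txt"; // abc-åäö.txt
        System.out.println(encodeName(name));
        System.out.println(roundTrip(name).equals(name));
    }
}
```

The trade-off is that anyone opening the archive with an ordinary zip tool sees the encoded names. URLEncoder also encodes "/" as %2F, so names that contain directories would need to be encoded one path segment at a time.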

Edit: In the bug report, someone suggests using the ZipOutputStream from Apache Ant as a workaround. That implementation allows the encoding to be specified.
