How to read data from nested zip files in Java without using temporary files?

Problem description

I am trying to extract files out of a nested zip archive and process them in memory.

What this question is NOT about:

  1. How to read a zip file in Java: NO, the question is how to read a zip file within a zip file within a zip and so on and so forth (as in nested zip files).

  2. Write temporary results on disk: NO, I'm asking about doing it all in memory. I found many answers that use the not-so-efficient technique of writing results temporarily to disk, but that's not what I want to do.

Example:

Zipfile -> Zipfile1 -> Zipfile2 -> Zipfile3

Goal: extract the data found in each of the nested zip files, all in memory and using Java.

ZipFile is the answer, you say? NO, it is not. It works for the first iteration, that is, for:

Zipfile -> Zipfile1

But once you get to Zipfile2, and perform a:

ZipInputStream z = new ZipInputStream(zipFile.getInputStream( zipEntry) ) ;

you will get a NullPointerException.

My code:

import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

public class ZipHandler {

    String findings = new String();
    ZipFile zipFile = null;

    public void init(String fileName) throws AppException {

        try {
            //read file into stream
            zipFile = new ZipFile(fileName);
            Enumeration<?> enu = zipFile.entries();
            exctractInfoFromZip(enu);

            zipFile.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    //The idea was to recursively extract entries using ZipFile
    public void exctractInfoFromZip(Enumeration<?> enu) throws IOException, AppException {

        try {
            while (enu.hasMoreElements()) {
                ZipEntry zipEntry = (ZipEntry) enu.nextElement();

                String name = zipEntry.getName();
                long size = zipEntry.getSize();
                long compressedSize = zipEntry.getCompressedSize();

                System.out.printf("name: %-20s | size: %6d | compressed size: %6d\n",
                        name, size, compressedSize);

                // directory ?
                if (zipEntry.isDirectory()) {
                    System.out.println("dir found:" + name);
                    findings += ", " + name;
                    continue;
                }

                if (name.toUpperCase().endsWith(".ZIP") || name.toUpperCase().endsWith(".GZ")) {
                    String fileType = name.substring(
                            name.lastIndexOf(".") + 1, name.length());

                    System.out.println("File type:" + fileType);
                    System.out.println("zipEntry: " + zipEntry);

                    if (fileType.equalsIgnoreCase("ZIP")) {
                        //ZipFile here returns a NULL pointer when you try to get the first nested zip
                        ZipInputStream z = new ZipInputStream(zipFile.getInputStream(zipEntry));
                        System.out.println("Opening ZIP as stream: " + name);

                        findings += ", " + name;

                        exctractInfoFromZip(zipInputStreamToEnum(z));
                    } else if (fileType.equalsIgnoreCase("GZ")) {
                        //ZipFile here returns a NULL pointer when you try to get the first nested zip
                        GZIPInputStream z = new GZIPInputStream(zipFile.getInputStream(zipEntry));
                        System.out.println("Opening ZIP as stream: " + name);

                        findings += ", " + name;

                        exctractInfoFromZip(gZipInputStreamToEnum(z));
                    } else
                        throw new AppException("extension not recognized!");
                } else {
                    System.out.println(name);
                    findings += ", " + name;
                }
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        System.out.println("Findings " + findings);
    }

    public Enumeration<?> zipInputStreamToEnum(ZipInputStream zStream) throws IOException {

        List<ZipEntry> list = new ArrayList<ZipEntry>();

        while (zStream.available() != 0) {
            list.add(zStream.getNextEntry());
        }

        return Collections.enumeration(list);
    }
}
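A side note on `zipInputStreamToEnum` above: the `ZipInputStream.available()` Javadoc says it returns 0 once the data of the *current entry* is exhausted and 1 otherwise, so it is not a reliable end-of-archive test. The conventional loop calls `getNextEntry()` until it returns `null`. Collecting `ZipEntry` objects for later use also loses the data, because the stream has already moved past them. The following is a minimal sketch, not the poster's code: `EntryLister` and `readEntries` are hypothetical names, `readAllBytes()` assumes Java 9+, and each entry's bytes are read while the stream is still positioned on that entry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class EntryLister {

    // Hypothetical helper: reads every non-directory entry into memory, keyed by name.
    // The bytes must be read while the stream is positioned on the entry; a ZipEntry
    // kept for later cannot be used to fetch its data from a ZipInputStream.
    // Closing the ZipInputStream also closes `in`.
    public static Map<String, byte[]> readEntries(InputStream in) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() returns null once there are no more entries
            while ((entry = zis.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // readAllBytes() stops at the end of the current entry's data
                    result.put(entry.getName(), zis.readAllBytes());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Build a small archive in memory so the demo needs no files on disk
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("b.txt"));
            zos.write("world".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        Map<String, byte[]> entries = readEntries(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(entries.keySet()); // prints [a.txt, b.txt]
    }
}
```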

Recommended answer

I have not tried it, but with ZipInputStream you can read any InputStream that contains a ZIP file as data. Iterate through the entries, and when you find the correct entry, use that ZipInputStream (now positioned at the entry) as the input of another, nested ZipInputStream.

The following code demonstrates this. Imagine we have a readme.txt inside 0.zip, which is itself zipped in 1.zip, which in turn is zipped in 2.zip. Now we read some text from readme.txt:

try (FileInputStream fin = new FileInputStream("D:/2.zip")) {
    ZipInputStream firstZip = new ZipInputStream(fin);
    // findEntry leaves each stream positioned at the named entry, so that stream can be
    // passed directly to the next ZipInputStream as its underlying input
    ZipInputStream zippedZip = new ZipInputStream(findEntry(firstZip, "1.zip"));
    ZipInputStream zippedZippedZip = new ZipInputStream(findEntry(zippedZip, "0.zip"));

    ZipInputStream zippedZippedZippedReadme = findEntry(zippedZippedZip, "readme.txt");
    InputStreamReader reader = new InputStreamReader(zippedZippedZippedReadme);
    char[] cbuf = new char[1024];
    int read = reader.read(cbuf);
    System.out.println(new String(cbuf, 0, read));
    // .....

// Advances `in` to the entry called `name` and returns the same stream, now positioned
// at that entry's data; returns null if no entry with that name exists
public static ZipInputStream findEntry(ZipInputStream in, String name) throws IOException {
    ZipEntry entry = null;
    while ((entry = in.getNextEntry()) != null) {
        if (entry.getName().equals(name)) {
            return in;
        }
    }
    return null;
}

Note that the code is really ugly: it does not close anything, nor does it check for errors. It is just a minimal version that demonstrates how it works.
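A slightly more careful variant of the same idea might look like the sketch below. It is an illustration under stated assumptions, not the answerer's code: `NestedZipReader`, `nonClosing`, `readNested` and `readText` are hypothetical names, the innermost file is assumed to be UTF-8 text, and `readAllBytes()` needs Java 9+. Here `findEntry` throws instead of returning `null`, and each nested `ZipInputStream` is closed (releasing its `Inflater`) through a wrapper that keeps the close from reaching the parent stream.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class NestedZipReader {

    // Like findEntry above, but throws instead of returning null when the entry is
    // missing, so a wrong name cannot surface later as a NullPointerException.
    static ZipInputStream findEntry(ZipInputStream in, String name) throws IOException {
        ZipEntry entry;
        while ((entry = in.getNextEntry()) != null) {
            if (entry.getName().equals(name)) {
                return in; // `in` is now positioned at the start of this entry's data
            }
        }
        throw new IOException("entry not found: " + name);
    }

    // Hides close() so that closing a nested stream leaves its parent open.
    static InputStream nonClosing(InputStream in) {
        return new FilterInputStream(in) {
            @Override
            public void close() {
                // no-op: the parent stream is closed by whoever opened it
            }
        };
    }

    // Follows path[depth..] through nested archives and returns the last entry's bytes.
    private static byte[] readNested(ZipInputStream zis, String[] path, int depth) throws IOException {
        ZipInputStream positioned = findEntry(zis, path[depth]);
        if (depth == path.length - 1) {
            return positioned.readAllBytes(); // stops at the end of the current entry
        }
        // each nesting level gets its own ZipInputStream, closed here to free its Inflater
        try (ZipInputStream nested = new ZipInputStream(nonClosing(positioned))) {
            return readNested(nested, path, depth + 1);
        }
    }

    // Reads `path` (outermost entry first) from `archive` and decodes it as UTF-8
    // (an assumption about the file's encoding). Closes `archive` when done.
    public static String readText(InputStream archive, String... path) throws IOException {
        if (path.length == 0) {
            throw new IllegalArgumentException("path must name at least one entry");
        }
        try (ZipInputStream outer = new ZipInputStream(archive)) {
            return new String(readNested(outer, path, 0), StandardCharsets.UTF_8);
        }
    }
}
```

Mirroring the example above, `NestedZipReader.readText(new FileInputStream("D:/2.zip"), "1.zip", "0.zip", "readme.txt")` would return the whole text of readme.txt; `readText` closes the stream it is given.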

Theoretically, there is no limit to how many ZipInputStreams you can cascade into one another. The data is never written to a temporary file. Decompression is performed only on demand, as you read from each InputStream.
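To illustrate cascading to arbitrary depth, here is a hypothetical recursive walker (`NestedZipWalker`, `walk`, `nonClosing` and `zip` are illustrative names, not part of any library) that visits every file at any nesting level without touching the disk. Like the question's code, it also handles `.gz` entries, on the assumption that each one is a single gzip-compressed file: `GZIPInputStream` yields its decompressed bytes, and the contents of `.tar.gz` files are not unpacked. It assumes Java 9+ for `readAllBytes()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class NestedZipWalker {

    // Wraps a stream so that close() does not reach the underlying (parent) stream.
    static InputStream nonClosing(InputStream in) {
        return new FilterInputStream(in) {
            @Override
            public void close() {
                // no-op: whoever opened the parent stream is responsible for closing it
            }
        };
    }

    // Recursively collects every file in the archive read from `in`, at any depth.
    // A nested .zip entry is read through a new ZipInputStream layered on the parent's
    // current entry; a .gz entry is decompressed with GZIPInputStream. Keys look like
    // "1.zip/0.zip/readme.txt". Nothing is written to disk, and `in` is not closed.
    public static void walk(InputStream in, String prefix, Map<String, byte[]> out) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(nonClosing(in))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                String path = prefix + entry.getName();
                String upper = path.toUpperCase(Locale.ROOT);
                if (upper.endsWith(".ZIP")) {
                    walk(zis, path + "/", out); // recurse into the nested archive
                } else if (upper.endsWith(".GZ")) {
                    // a .gz file is one compressed stream, not an archive of entries
                    try (GZIPInputStream gz = new GZIPInputStream(nonClosing(zis))) {
                        out.put(path.substring(0, path.length() - 3), gz.readAllBytes());
                    }
                } else {
                    out.put(path, zis.readAllBytes());
                }
            }
        }
    }

    // Demo helper: builds a one-entry zip archive in memory.
    static byte[] zip(String entryName, byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(data);
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // 2.zip -> 1.zip -> 0.zip -> readme.txt, mirroring the answer's example
        byte[] readme = "hello from the innermost archive".getBytes(StandardCharsets.UTF_8);
        byte[] twoZip = zip("1.zip", zip("0.zip", zip("readme.txt", readme)));

        Map<String, byte[]> files = new LinkedHashMap<>();
        walk(new ByteArrayInputStream(twoZip), "", files);
        files.forEach((path, data) ->
                System.out.println(path + ": " + new String(data, StandardCharsets.UTF_8)));
    }
}
```

Recursion keeps one `ZipInputStream` open per nesting level. Beyond that, memory use is whatever the collected file contents occupy; for very large entries, consuming `zis` directly instead of calling `readAllBytes()` would avoid holding them in memory.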
