Java ZipInputStream skipping unused ZipEntry content, rather than draining it

Question

I'm trying to read the content of a ZipEntry from a zip as efficiently as possible. To do that, I need the standard ZipInputStream to use InputStream.skip for entry content I don't need, rather than draining it.

As far as I understand from the ZIP (file format) wiki:

Because the files in a ZIP archive are compressed individually it is possible to extract them, or add new ones, without applying compression or decompression to the entire archive. This contrasts with the format of compressed tar files, for which such random-access processing is not easily possible.

From this I assume that an unneeded entry can be skipped deterministically, without decompressing its content.

However, I see that both ZipInputStream (Java standard library) and ZipArchiveInputStream (Apache) drain the stream up to the next entry rather than skipping it, which makes my use of them very inefficient.

I'm not fully familiar with the ZIP specification, and seeing this behavior in two widely used ZIP APIs makes me think that skipping might be impossible.

Is my understanding incorrect and this optimal behavior impossible? If not, which Java API do you suggest for reading ZIP entries efficiently?

Recommended answer

The problem here is that ZipInputStream is a stream. You start by reading the LOC (local file header) of the first entry, then read the entry (decompress, checksum, etc.), and repeat until there are no more entries (or rather, no more LOCs).
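The sequential walk described above can be sketched as follows. This is only an illustration, not the asker's code: the class name StreamingZipWalk and the helpers listEntryNames and buildSampleZip are hypothetical, and the archive is built in memory just to keep the example self-contained. Note that the loop never reads entry data itself, yet the stream still has to get past each entry's data before it can find the next LOC.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class StreamingZipWalk {

    // Hypothetical helper: walks the archive front to back, one LOC at a time.
    static List<String> listEntryNames(InputStream raw) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(raw)) {
            ZipEntry entry;
            // getNextEntry() closes the current entry first, which means the
            // stream consumes whatever entry data was left unread.
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Hypothetical helper: builds a small in-memory archive for the demo.
    static byte[] buildSampleZip(String... names) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                out.write(("content of " + name).getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zipBytes = buildSampleZip("a.txt", "b.txt", "c.txt");
        System.out.println(listEntryNames(new ByteArrayInputStream(zipBytes)));
    }
}
```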

The end of the file/stream contains the directory for the whole zip contents (the central directory), which is what enables random access (or displaying the zip file structure). When streaming data, you can't access the end of the stream. So even if you could seek, you wouldn't know where to seek to. You have to decompress to know where the data for the entry ends; only then do you reach the LOC for the next entry, and so on.
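When the archive is an actual file rather than a one-way stream, the directory at the end is reachable, and java.util.zip.ZipFile can use it to open a single entry directly. The sketch below assumes that situation; the class name RandomAccessZipRead and the helper readEntry are hypothetical, and the temporary archive exists only to make the example runnable. It does not help if the data is only available as a non-seekable stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class RandomAccessZipRead {

    // Hypothetical helper: looks up one entry by name and reads only that entry.
    static byte[] readEntry(Path zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No such entry: " + entryName);
            }
            // Only the requested entry is decompressed here.
            try (InputStream in = zip.getInputStream(entry)) {
                return in.readAllBytes(); // requires Java 9 or later
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small archive on disk so the example is self-contained.
        Path zipPath = Files.createTempFile("demo", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zipPath))) {
            for (String name : new String[] {"a.txt", "b.txt", "c.txt"}) {
                out.putNextEntry(new ZipEntry(name));
                out.write(("content of " + name).getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        byte[] data = readEntry(zipPath, "c.txt");
        System.out.println(new String(data, StandardCharsets.UTF_8));
        Files.delete(zipPath);
    }
}
```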

A duplicate of this question states that the only source of truth is the central directory, so we can't rely on an entry's compressed size for skipping anyway.
