How to access a zipEntry from a streamed zip file in memory

Question

I'm currently implementing an Ereader library (skyepub) that requires that I implement a method that checks if a zipEntry exists or not. In their demo version, the solution is simple:

public boolean isExists(String baseDirectory, String contentPath) {
    setupZipFile(baseDirectory, contentPath);
    // Custom fonts are checked on the file system rather than in the archive.
    if (this.isCustomFont(contentPath)) {
        String path = baseDirectory + "/" + contentPath;
        File file = new File(path);
        return file.exists();
    }

    ZipEntry entry = this.getZipEntry(contentPath);
    return entry != null;
}

// Entry name should start without / like META-INF/container.xml
private ZipEntry getZipEntry(String contentPath) {
    if (zipFile == null) return null;

    // Strip the second path component (subDirs[1]) and the leftover "//".
    String[] subDirs = contentPath.split(Pattern.quote(File.separator));
    String corePath = contentPath.replace(subDirs[1], "");
    corePath = corePath.replace("//", "");

    // Zip entry names always use '/' as the separator.
    ZipEntry entry = zipFile.getEntry(corePath.replace(File.separatorChar, '/'));
    return entry;
}

As you can see, the ZipEntry in question can be accessed in O(1) time using getZipEntry(contentPath).

However, in my case I cannot read the zip file straight from the file system (for security reasons it must be read from memory). So my ifExists implementation actually goes through the zip file one entry at a time until it finds the zipEntry in question. Here is the relevant part:

try {
    final InputStream stream = dbUtil.getBookStream(bookEditionID);
    if (stream == null) return null;

    final ZipInputStream zip = new ZipInputStream(stream);

    // Walk the stream one entry at a time until the requested name is found.
    ZipEntry entry;
    do {
        entry = zip.getNextEntry();
        if (entry == null) {
            zip.close();
            return null;
        }
    } while (!entry.getName().equals(zipEntryName));

} catch (IOException e) {
    Log.e("demo", "Can't get content data for " + contentPath);
    return null;
}

return data;

So if data is non-null, ifExists returns true; if it is null, it returns false.

Is there a way I can find the zip entry in question from the entire ZipInputStream in O(1) time rather than O(n) time?

Recommended answer

An entry in a zip archive cannot really be loaded in O(1) time. If we look at the structure of a zip archive, it looks like this:

  [local file header 1]
  [encryption header 1]
  [file data 1]
  [data descriptor 1]
  ... 
  [local file header n]
  [encryption header n]
  [file data n]
  [data descriptor n]
  [archive decryption header] 
  [archive extra data record] 
  [central directory header 1]
  ...
  [central directory header n]
  [zip64 end of central directory record]
  [zip64 end of central directory locator] 
  [end of central directory record]

Basically, a zip archive consists of the compressed files, each with some headers, plus a "central directory" that contains all the metadata about the files (the central directory headers). The only valid way to locate an entry is to scan the central directory (more info):

...must not scan for entries from the top of the ZIP file, because only the central directory specifies where a file chunk starts

Because there is no index over central directory headers, you can only get an entry in O(n) where n is the number of files in the archive.
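To make the central-directory scan concrete, here is a minimal sketch in plain Java that lists entry names from a zip archive held in a byte array without touching any file data. It is an illustration only, not code from zip4j or from the answer: the class name CentralDirectoryScan and the method entryNames are hypothetical, and it assumes a single-disk archive without zip64 records, with entry names decoded as UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: reads only the central directory of an in-memory zip.
// Assumptions: no zip64, single-disk archive, UTF-8 entry names.
final class CentralDirectoryScan {
    private static final int EOCD_SIG = 0x06054b50;  // end of central directory record
    private static final int CEN_SIG = 0x02014b50;   // central directory header
    private static final int EOCD_MIN_SIZE = 22;     // EOCD size without the comment
    private static final int CEN_FIXED_SIZE = 46;    // fixed part of a central header

    static Set<String> entryNames(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);

        // The EOCD record sits at the end, followed only by an optional comment
        // of at most 65535 bytes, so search backwards from the end.
        int eocd = -1;
        int lowest = Math.max(0, zip.length - EOCD_MIN_SIZE - 0xFFFF);
        for (int i = zip.length - EOCD_MIN_SIZE; i >= lowest; i--) {
            if (buf.getInt(i) == EOCD_SIG
                    && i + EOCD_MIN_SIZE + (buf.getShort(i + 20) & 0xFFFF) == zip.length) {
                eocd = i;
                break;
            }
        }
        if (eocd < 0) {
            throw new IllegalArgumentException("No end of central directory record found");
        }

        int count = buf.getShort(eocd + 10) & 0xFFFF; // total number of entries
        int pos = buf.getInt(eocd + 16);              // offset of the first central header

        // One step per entry: O(n) in the number of entries, independent of data size.
        Set<String> names = new HashSet<>();
        for (int i = 0; i < count; i++) {
            if (buf.getInt(pos) != CEN_SIG) {
                throw new IllegalArgumentException("Corrupt central directory at offset " + pos);
            }
            int nameLen = buf.getShort(pos + 28) & 0xFFFF;
            int extraLen = buf.getShort(pos + 30) & 0xFFFF;
            int commentLen = buf.getShort(pos + 32) & 0xFFFF;
            names.add(new String(zip, pos + CEN_FIXED_SIZE, nameLen, StandardCharsets.UTF_8));
            pos += CEN_FIXED_SIZE + nameLen + extraLen + commentLen;
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny archive in memory, then look up entries by name.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(new ZipEntry("META-INF/container.xml"));
            zos.write("<container/>".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        Set<String> names = entryNames(out.toByteArray());
        System.out.println(names.contains("META-INF/container.xml"));
        System.out.println(names.contains("missing.txt"));
    }
}
```

The scan itself is still O(n) in the number of entries, as stated above, but it never reads or decompresses the file data, and the resulting set can be kept around so that later lookups do not need another scan.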

Update: Unfortunately, all the zip libraries I know of that work with streams rather than files use the local file headers and scan the entire stream, including the contents. They cannot easily be bent to do otherwise, either. The only way I have found to avoid scanning the entire archive is to adapt a library yourself.

Update 2: I have taken the liberty of modifying the zip4j library for your purposes. Assuming you have read your zip file into a byte array and have added a dependency on zip4j version 1.3.2, you can use MemoryHeaderReader and RandomByteStream like this:

String myZipFile = "...";    // name of the entry to look for
byte[] bytes = readFile();   // the whole zip archive, already in memory
MemoryHeaderReader headerReader = new MemoryHeaderReader(RandomAccessStream.fromBytes(bytes));
ZipModel zipModel = headerReader.readAllHeaders();
FileHeader myFile = Zip4jUtil.getFileHeader(zipModel, myZipFile);
boolean fileIsPresent = myFile != null;

It works in O(entryCount) without reading the entire archive, which should be reasonably fast. I haven't tested it thoroughly, but it should give you an idea of how you can adapt zip4j for your purposes.
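If patching zip4j is not an option, a simpler alternative with the stock java.util.zip classes is to walk the stream once and remember the entry names. This is only a sketch and not part of the modified library: the class name ZipEntryIndex is hypothetical, and building the index still reads through the whole stream, contents included. The benefit is that the O(n) cost is paid once per book, assuming the existence check is called repeatedly for the same archive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: one pass over the stream, then constant-time lookups.
final class ZipEntryIndex {
    private final Set<String> names = new HashSet<>();

    // Reads the stream to its end exactly once and closes it afterwards.
    ZipEntryIndex(InputStream in) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
    }

    // Hash lookup; no further I/O is needed.
    boolean contains(String entryName) {
        return names.contains(entryName);
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny archive in memory to stand in for the book stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(new ZipEntry("META-INF/container.xml"));
            zos.write("<container/>".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        ZipEntryIndex index = new ZipEntryIndex(new ByteArrayInputStream(out.toByteArray()));
        System.out.println(index.contains("META-INF/container.xml"));
        System.out.println(index.contains("missing.txt"));
    }
}
```

In the question's setting, the stream passed to the constructor would be the one returned by dbUtil.getBookStream(bookEditionID), and one index would be kept per open book.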
