Why do certain zip files have unknown file content sizes?


Problem description

background

I stumbled across this problem here

analysis

According to the Java docs for ZipEntry, getSize() sometimes simply returns -1 for a zip file entry, meaning the size is not known.

However, running the command

$ unzip -l b17c024e-89f1-42f7-a546-91d46610cedb.epub 
Archive:  b17c024e-89f1-42f7-a546-91d46610cedb.epub
  Length     Date   Time    Name
 --------    ----   ----    ----
       20  01-27-12 11:17   mimetype
     2378  04-20-12 10:12   OEBPS/hayat-ghayr.html
     6436  02-06-12 11:06   OEBPS/content.opf
   112579  01-27-12 11:25   OEBPS/images/978-614-425-313-7-hayat-ghayr-cover.png
   182575  01-27-12 11:25   OEBPS/images/978-614-425-313-7-hayat_fmt.png
     7757  01-27-12 11:21   OEBPS/template.css
     5643  01-27-12 11:18   OEBPS/hayat-ghayr-2.html
    20144  01-27-12 11:17   OEBPS/hayat-ghayr-1.html
    65543  01-27-12 11:17   OEBPS/hayat-ghayr-3.html
    59434  01-27-12 11:17   OEBPS/hayat-ghayr-4.html
    66768  01-27-12 11:17   OEBPS/hayat-ghayr-5.html
    49117  01-27-12 11:17   OEBPS/hayat-ghayr-6.html
    65346  01-27-12 11:17   OEBPS/hayat-ghayr-7.html
    74196  01-27-12 11:17   OEBPS/hayat-ghayr-8.html
    73998  01-27-12 11:17   OEBPS/hayat-ghayr-9.html
    61031  01-27-12 11:17   OEBPS/hayat-ghayr-10.html
    68297  01-27-12 11:17   OEBPS/hayat-ghayr-11.html
    72084  01-27-12 11:17   OEBPS/hayat-ghayr-12.html
     2386  01-27-12 11:17   OEBPS/hayat-ghayr-13.html
    61132  01-27-12 11:17   OEBPS/hayat-ghayr-14.html
    46320  01-27-12 11:17   OEBPS/hayat-ghayr-15.html
    32673  01-27-12 11:17   OEBPS/hayat-ghayr-16.html
    88584  01-27-12 11:17   OEBPS/hayat-ghayr-17.html
    56474  01-27-12 11:17   OEBPS/hayat-ghayr-18.html
    52840  01-27-12 11:17   OEBPS/hayat-ghayr-19.html
    80022  01-27-12 11:17   OEBPS/hayat-ghayr-20.html
    50781  01-27-12 11:17   OEBPS/hayat-ghayr-21.html
     2765  01-27-12 11:17   OEBPS/hayat-ghayr-22.html
      265  01-27-12 11:17   META-INF/container.xml
    54942  01-27-12 11:17   OEBPS/images/277.png
     5549  01-27-12 11:17   OEBPS/toc.ncx
     1072  03-23-12 13:28   iTunesMetadata.plist
 --------                   -------
  1529151                   32 files

This shows that there is a content length for every chapter. Also, if we unzip the same file and re-zip it with stronger compression, the Java zipFile call returns the proper content size.

question

Is this the zip library's fault or a fault in the original compression? How can we tell?

follow-up question

See: how to access a ZipEntry from a streamed zip file

Solution

ZIP stores metadata inside the archive in a few different places: the "local file header", the "central directory" and sometimes a "data descriptor". Only the "local file header" is in front of the file's content; the "central directory" is at the very end of the archive. Only the "central directory" holds the full truth, and it is perfectly valid not to specify any size in the "local file header".

See sections 4.4.8 and 4.4.9 of https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT, which describe the size fields:

If bit 3 of the general purpose bit flag is set, these fields are set to zero in the local header and the correct values are put in the data descriptor and in the central directory.
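To see this on an actual archive, here is a minimal Java sketch that reads the size fields of the first entry twice: once from its local file header at offset 0 and once from the central directory, which it locates through the end of central directory record. It assumes a single-disk, non-ZIP64 archive with the header offsets given in APPNOTE.TXT, and the names `HeaderSizes`, `Sizes`, `firstLocalHeader`, `firstCentralHeader` and `deflatedZip` are hypothetical. The test archive is built with `java.util.zip.ZipOutputStream`, which, as far as I know, sets bit 3 for DEFLATED entries whose sizes were not set in advance.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: reads the size fields of the first entry from the local
// file header and from the central directory. Assumes single-disk, non-ZIP64.
final class HeaderSizes {
    static final int LOC_SIG = 0x04034b50;  // local file header signature
    static final int CEN_SIG = 0x02014b50;  // central directory file header signature
    static final int EOCD_SIG = 0x06054b50; // end of central directory record signature

    // Flags plus compressed/uncompressed sizes as stored in one header.
    static final class Sizes {
        final int flags;
        final long compressed;
        final long uncompressed;

        Sizes(int flags, long compressed, long uncompressed) {
            this.flags = flags;
            this.compressed = compressed;
            this.uncompressed = uncompressed;
        }

        // Bit 3 of the general purpose flag: real sizes follow in a data descriptor.
        boolean hasDataDescriptor() {
            return (flags & 0x08) != 0;
        }
    }

    // Local file header at offset 0: flags @6, compressed size @18, uncompressed size @22.
    static Sizes firstLocalHeader(byte[] zip) {
        ByteBuffer b = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        if (b.getInt(0) != LOC_SIG) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
        return new Sizes(Short.toUnsignedInt(b.getShort(6)),
                Integer.toUnsignedLong(b.getInt(18)),
                Integer.toUnsignedLong(b.getInt(22)));
    }

    // First central directory header: flags @8, compressed size @20, uncompressed size @24.
    static Sizes firstCentralHeader(byte[] zip) {
        ByteBuffer b = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        // The EOCD record is 22 bytes plus an optional comment, so scan backwards for it.
        int eocd = -1;
        for (int i = zip.length - 22; i >= 0; i--) {
            if (b.getInt(i) == EOCD_SIG) {
                eocd = i;
                break;
            }
        }
        if (eocd < 0) {
            throw new IllegalArgumentException("no end of central directory record");
        }
        int cen = b.getInt(eocd + 16); // offset of the central directory from the archive start
        if (b.getInt(cen) != CEN_SIG) {
            throw new IllegalArgumentException("no central directory header at " + cen);
        }
        return new Sizes(Short.toUnsignedInt(b.getShort(cen + 8)),
                Integer.toUnsignedLong(b.getInt(cen + 20)),
                Integer.toUnsignedLong(b.getInt(cen + 24)));
    }

    // Builds a one-entry archive without setting sizes up front, so the DEFLATED
    // entry gets zero sizes in its local header and a trailing data descriptor.
    static byte[] deflatedZip(String name, byte[] content) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(content);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zip = deflatedZip("chapter.html", new byte[1000]);
        Sizes loc = firstLocalHeader(zip);
        Sizes cen = firstCentralHeader(zip);
        System.out.println("local header:      bit3=" + loc.hasDataDescriptor() + " size=" + loc.uncompressed);
        System.out.println("central directory: bit3=" + cen.hasDataDescriptor() + " size=" + cen.uncompressed);
    }
}
```

For an entry written this way, the local header should show bit 3 set with both size fields zero, while the central directory carries the real values, which is exactly the situation the quoted spec text describes. A production reader would also need to handle ZIP64 records and non-UTF-8 names, which this sketch ignores.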

The "data descriptor" immediately follows the compressed content of the entry, so when reading from a non-seekable stream it is not available until the actual content of the entry has been read.

When using ZipArchiveInputStream you obtain the ZipEntry as soon as the "local file header" has been read (because the underlying stream may not be seekable), so the size information may be missing. ZipFile uses RandomAccessFile under the covers and can read the "central directory", as can unzip and friends, so they know more than ZipArchiveInputStream does.
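The streaming vs. random-access difference can be reproduced with the standard library alone. The answer talks about Commons Compress's `ZipArchiveInputStream`; this sketch swaps in `java.util.zip.ZipInputStream` and `java.util.zip.ZipFile`, which, as far as I know, behave the same way for entries with bit 3 set. The class and method names are hypothetical. One detail is OpenJDK implementation behavior rather than something the Javadoc promises: once an entry has been read to its end, `ZipInputStream` fills in the sizes from the data descriptor on the same `ZipEntry` object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo: the same DEFLATED entry seen through a stream and through ZipFile.
final class StreamVsZipFile {
    // Sizes are not set up front, so the entry gets bit 3 and a data descriptor.
    static byte[] deflatedZip(String name, byte[] content) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(content);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Returns {size right after getNextEntry(), size after the entry has been drained}.
    static long[] streamedSize(byte[] zip) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry e = zis.getNextEntry(); // only the local file header has been parsed
            long before = e.getSize();       // -1 ("not known") when bit 3 is set
            byte[] buf = new byte[8192];
            while (zis.read(buf) != -1) {
                // Drain the entry; at its end the stream reads the data descriptor.
            }
            return new long[] {before, e.getSize()};
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // ZipFile has random access and reads the central directory at the end of the file.
    static long zipFileSize(byte[] zip, String name) {
        try {
            Path tmp = Files.createTempFile("demo", ".zip");
            try {
                Files.write(tmp, zip);
                try (ZipFile zf = new ZipFile(tmp.toFile())) {
                    return zf.getEntry(name).getSize();
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] zip = deflatedZip("chapter.html", new byte[1000]);
        long[] streamed = streamedSize(zip);
        System.out.println("ZipInputStream before reading: " + streamed[0]);
        System.out.println("ZipInputStream after reading:  " + streamed[1]);
        System.out.println("ZipFile (central directory):   " + zipFileSize(zip, "chapter.html"));
    }
}
```

In practice, code that streams an archive should treat `getSize() == -1` as "not known yet" and either count the bytes while reading the entry or, when the archive is a file on disk, open it with `ZipFile` so the central directory is available from the start.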
