Why do certain zip files have unknown file content?
Background
I stumbled across this problem here
Analysis
According to the Java docs for ZipEntry, requesting the size of a zip file entry sometimes simply returns -1, which means the size is not known.
However, running the command
$ unzip -l b17c024e-89f1-42f7-a546-91d46610cedb.epub
Archive: b17c024e-89f1-42f7-a546-91d46610cedb.epub
Length Date Time Name
-------- ---- ---- ----
20 01-27-12 11:17 mimetype
2378 04-20-12 10:12 OEBPS/hayat-ghayr.html
6436 02-06-12 11:06 OEBPS/content.opf
112579 01-27-12 11:25 OEBPS/images/978-614-425-313-7-hayat-ghayr-cover.png
182575 01-27-12 11:25 OEBPS/images/978-614-425-313-7-hayat_fmt.png
7757 01-27-12 11:21 OEBPS/template.css
5643 01-27-12 11:18 OEBPS/hayat-ghayr-2.html
20144 01-27-12 11:17 OEBPS/hayat-ghayr-1.html
65543 01-27-12 11:17 OEBPS/hayat-ghayr-3.html
59434 01-27-12 11:17 OEBPS/hayat-ghayr-4.html
66768 01-27-12 11:17 OEBPS/hayat-ghayr-5.html
49117 01-27-12 11:17 OEBPS/hayat-ghayr-6.html
65346 01-27-12 11:17 OEBPS/hayat-ghayr-7.html
74196 01-27-12 11:17 OEBPS/hayat-ghayr-8.html
73998 01-27-12 11:17 OEBPS/hayat-ghayr-9.html
61031 01-27-12 11:17 OEBPS/hayat-ghayr-10.html
68297 01-27-12 11:17 OEBPS/hayat-ghayr-11.html
72084 01-27-12 11:17 OEBPS/hayat-ghayr-12.html
2386 01-27-12 11:17 OEBPS/hayat-ghayr-13.html
61132 01-27-12 11:17 OEBPS/hayat-ghayr-14.html
46320 01-27-12 11:17 OEBPS/hayat-ghayr-15.html
32673 01-27-12 11:17 OEBPS/hayat-ghayr-16.html
88584 01-27-12 11:17 OEBPS/hayat-ghayr-17.html
56474 01-27-12 11:17 OEBPS/hayat-ghayr-18.html
52840 01-27-12 11:17 OEBPS/hayat-ghayr-19.html
80022 01-27-12 11:17 OEBPS/hayat-ghayr-20.html
50781 01-27-12 11:17 OEBPS/hayat-ghayr-21.html
2765 01-27-12 11:17 OEBPS/hayat-ghayr-22.html
265 01-27-12 11:17 META-INF/container.xml
54942 01-27-12 11:17 OEBPS/images/277.png
5549 01-27-12 11:17 OEBPS/toc.ncx
1072 03-23-12 13:28 iTunesMetadata.plist
-------- -------
1529151 32 files
shows that there is a content length for every chapter. Moreover, if we unzip the same file and zip it again with stronger compression, the Java ZipFile call returns the proper content size.
Question
Is this the zip library's fault or the fault of the original compression? How can we know?
Follow-up question
See: how to access a zipEntry from a streamed zipfile
ZIP stores metadata inside the archive in a few different places (the "local file header", the "central directory" and sometimes a "data descriptor"). Only the "local file header" sits in front of the file's content; the "central directory" is at the very end of the archive. Only the "central directory" holds the full truth, and it is perfectly valid not to specify any size in the "local file header".
See sections 4.4.8 and 4.4.9 of https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT, which describe the size fields:
"If bit 3 of the general purpose bit flag is set, these fields are set to zero in the local header and the correct values are put in the data descriptor and in the central directory."
The "data descriptor" immediately follows the compressed content of the entry - and thus is not available before reading the actual content of the entry when reading from a non-seekable stream.
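To check whether an archive was written this way, you can look at the first "local file header" directly. The following is only a minimal sketch: the class name LocalHeaderPeek and its helper methods are made up for illustration, it inspects only the first entry, and it assumes the archive starts with a local file header. The sample archive is produced with the JDK's ZipOutputStream, which is assumed here to set bit 3 for deflated entries whose size was not given in advance.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class LocalHeaderPeek {
    // Signature of a local file header ("PK\3\4"), read as a little-endian int.
    static final int LOCAL_HEADER_SIGNATURE = 0x04034b50;
    // Fixed part of the local file header, before the file name and extra field.
    static final int LOCAL_HEADER_LENGTH = 30;

    // Hypothetical helper: writes a one-entry archive without telling the
    // entry its size, so the writer has to stream the data.
    static byte[] buildArchive(String name, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content);
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    private static ByteBuffer firstHeader(byte[] archive) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        if (archive.length < LOCAL_HEADER_LENGTH || buf.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("archive does not start with a local file header");
        }
        return buf;
    }

    // Bit 3 of the general purpose bit flag (offset 6) signals a data descriptor.
    static boolean usesDataDescriptor(byte[] archive) {
        int flags = firstHeader(archive).getShort(6) & 0xFFFF;
        return (flags & 0x0008) != 0;
    }

    // Uncompressed size as stored in the local file header (offset 22).
    static long localUncompressedSize(byte[] archive) {
        return firstHeader(archive).getInt(22) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = buildArchive("a.txt", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("bit 3 set: " + usesDataDescriptor(archive));
        System.out.println("size in local header: " + localUncompressedSize(archive));
    }
}
```

If bit 3 is set and the size in the local header is zero, the missing size is a property of how the archive was written, not a defect in the reading library.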
When using ZipArchiveInputStream you obtain the ZipEntry as soon as the "local file header" has been read (because the underlying stream may not be seekable), so the size information may be missing. ZipFile uses RandomAccessFile under the covers and can read the "central directory" - as do unzip and friends - so they know more than ZipArchiveInputStream.
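The difference between streaming and random access can be reproduced with a small experiment. This sketch uses the JDK's own java.util.zip.ZipInputStream and java.util.zip.ZipFile rather than the Commons Compress classes named above, on the assumption that they behave analogously for this case. The class ZipSizeDemo and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipSizeDemo {

    // Writes a one-entry archive. The entry's size is not set up front,
    // so the sizes end up in a data descriptor and the central directory.
    static byte[] buildArchive(String name, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content);
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Size reported right after the local file header has been read.
    static long sizeFromStream(byte[] archive) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry = in.getNextEntry();
            return entry.getSize();
        }
    }

    // Size reported once the entry's content, and with it the data
    // descriptor, has been consumed from the stream.
    static long sizeFromStreamAfterReading(byte[] archive) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry = in.getNextEntry();
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // discard the content; only the side effect on the entry matters
            }
            return entry.getSize();
        }
    }

    // Size reported by ZipFile, which reads the central directory.
    static long sizeFromZipFile(byte[] archive, String name) throws IOException {
        Path tmp = Files.createTempFile("zip-size-demo", ".zip");
        try {
            Files.write(tmp, archive);
            try (ZipFile zip = new ZipFile(tmp.toFile())) {
                return zip.getEntry(name).getSize();
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "hello, zip".getBytes(StandardCharsets.UTF_8);
        byte[] archive = buildArchive("a.txt", content);
        System.out.println("stream, before reading: " + sizeFromStream(archive));
        System.out.println("stream, after reading:  " + sizeFromStreamAfterReading(archive));
        System.out.println("ZipFile:                " + sizeFromZipFile(archive, "a.txt"));
    }
}
```

So if the size is needed while streaming, one option is to read the entry to its end first; if the archive is available as a file, opening it with ZipFile avoids the problem altogether.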