How are zlib, gzip and zip related? What do they have in common and how are they different?


Question

The compression algorithm used in zlib is essentially the same as that in gzip and zip. What are gzip and zip? How are they different and how are they the same?

Answer

Short form:

.zip is an archive format using, usually, the Deflate compression method. The .gz gzip format is for single files, also using the Deflate compression method. Often gzip is used in combination with tar to make a compressed archive format, .tar.gz. The zlib library provides Deflate compression and decompression code for use by zip, gzip, png (which uses the zlib wrapper on deflate data), and many other applications.

Long form:

The ZIP format was developed by Phil Katz as an open format with an open specification, where his implementation, PKZIP, was shareware. It is an archive format that stores files and their directory structure, where each file is individually compressed. The file type is .zip. The files, as well as the directory structure, can optionally be encrypted.

The ZIP format supports several compression methods:

0 - The file is stored (no compression)
1 - The file is Shrunk
2 - The file is Reduced with compression factor 1
3 - The file is Reduced with compression factor 2
4 - The file is Reduced with compression factor 3
5 - The file is Reduced with compression factor 4
6 - The file is Imploded
7 - Reserved for Tokenizing compression algorithm
8 - The file is Deflated
9 - Enhanced Deflating using Deflate64(tm)
10 - PKWARE Data Compression Library Imploding (old IBM TERSE)
11 - Reserved by PKWARE
12 - File is compressed using BZIP2 algorithm
13 - Reserved by PKWARE
14 - LZMA (EFS)
15 - Reserved by PKWARE
16 - Reserved by PKWARE
17 - Reserved by PKWARE
18 - File is compressed using IBM TERSE (new)
19 - IBM LZ77 z Architecture (PFS)
97 - WavPack compressed data
98 - PPMd version I, Rev 1

Methods 1 to 7 are historical and are not in use. Methods 9 through 98 are relatively recent additions, and are in varying, small amounts of use. The only method in truly widespread use in the ZIP format is method 8, Deflate, and to some smaller extent method 0, which is no compression at all. Virtually every .zip file that you will come across in the wild will use exclusively methods 8 and 0, likely just method 8. (Method 8 also has a means to effectively store the data with no compression and relatively little expansion, and Method 0 cannot be streamed whereas Method 8 can be.)
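As a rough illustration of methods 0 and 8, the sketch below uses Python's standard `zipfile` module to write one stored and one deflated entry into an in-memory archive, then reads back the method number recorded for each. The entry names and payload are made up for the example.

```python
import io
import zipfile

payload = b"hello, deflate! " * 64  # illustrative, highly repetitive data

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Method 0: the bytes are stored as-is.
    zf.writestr("stored.txt", payload, compress_type=zipfile.ZIP_STORED)
    # Method 8: the bytes are compressed with Deflate.
    zf.writestr("deflated.txt", payload, compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    methods = {info.filename: info.compress_type for info in zf.infolist()}
    sizes = {info.filename: info.compress_size for info in zf.infolist()}

for name in methods:
    print(name, "method", methods[name], "compressed size", sizes[name])
```

The `compress_type` values reported by `zipfile` are the same method numbers as in the list above.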

The ISO/IEC 21320-1:2015 standard for file containers is a restricted zip format, such as used in Java archive files (.jar), Office Open XML files (Microsoft Office .docx, .xlsx, .pptx), OpenDocument Format files (.odt, .ods, .odp), and EPUB files (.epub). That standard limits the compression methods to 0 and 8, and imposes other constraints such as no encryption or signatures.

Around 1990, the Info-ZIP group wrote portable, free, open source implementations of zip and unzip utilities, supporting compression with the Deflate format, and decompression of that and the earlier formats. This greatly expanded the use of the .zip format.

In the early 90's, the gzip format was developed as a replacement for the Unix compress utility, derived from the Deflate code in the Info-ZIP utilities. Unix compress was designed to compress a single file or stream, appending a .Z to the file name. compress uses the LZW compression algorithm, which at the time was under patent and its free use was in dispute by the patent holders. Though some specific implementations of Deflate were patented by Phil Katz, the format was not, and so it was possible to write a Deflate implementation that did not infringe on any patents. That implementation has not been so challenged in the last 20+ years. The Unix gzip utility was intended as a drop-in replacement for compress, and in fact is able to decompress compress-compressed data (assuming that you were able to parse that sentence). gzip appends a .gz to the file name. gzip uses the Deflate compressed data format, which compresses quite a bit better than Unix compress, has very fast decompression, and adds a CRC-32 as an integrity check for the data. The header format also permits the storage of more information than the compress format allowed, such as the original file name and the file modification time.
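To make the header and integrity check concrete, here is a minimal sketch that writes a gzip stream with Python's standard `gzip` module and then picks the stored file name, modification time, and CRC-32 back out of the raw bytes. The file name `notes.txt` and the timestamp are arbitrary values chosen for the example, and the parsing assumes no optional extra field precedes the name, which holds for streams written this way.

```python
import gzip
import io
import struct
import zlib

payload = b"gzip keeps a little metadata next to the data\n"
mtime = 1_700_000_000  # arbitrary fixed timestamp for the example

buf = io.BytesIO()
with gzip.GzipFile(filename="notes.txt", mode="wb", fileobj=buf, mtime=mtime) as gz:
    gz.write(payload)
blob = buf.getvalue()

# Fixed 10-byte header: magic (2), method (1), flags (1), mtime (4), xfl (1), os (1).
magic, method, flags, stored_mtime = struct.unpack("<HBBI", blob[:8])
has_name = bool(flags & 0x08)  # FNAME flag: a zero-terminated name follows the header
stored_name = blob[10:blob.index(b"\x00", 10)] if has_name else b""

# 8-byte trailer: CRC-32 and length (mod 2**32) of the uncompressed data.
crc, isize = struct.unpack("<II", blob[-8:])

print(method, stored_name, stored_mtime, hex(crc), isize)
```

The method byte is 8, the same Deflate method number that the ZIP format uses.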

Though compress only compresses a single file, it was common to use the tar utility to create an archive of files, their attributes, and their directory structure into a single .tar file, and to then compress it with compress to make a .tar.Z file. In fact the tar utility had and still has an option to do the compression at the same time, instead of having to pipe the output of tar to compress. This all carried forward to the gzip format, and tar has an option to compress directly to the .tar.gz format. The .tar.gz format compresses better than the .zip approach, since the compression of a .tar can take advantage of redundancy across files, especially many small files. .tar.gz is the most common archive format in use on Unix due to its very high portability, but there are more effective compression methods in use as well, so you will often see .tar.bz2 and .tar.xz archives.
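The effect of cross-file redundancy can be seen with a small experiment. The sketch below packs the same set of tiny, similar files into an in-memory .zip and an in-memory .tar.gz using Python's standard library and compares the sizes. The file names and contents are hypothetical, and the exact numbers will vary with the data and the zlib version.

```python
import io
import tarfile
import zipfile

# 200 small, similar "files" (hypothetical names and contents).
files = {
    f"conf/item{i:03d}.ini": f"[item]\nid={i}\nenabled=true\n".encode()
    for i in range(200)
}

zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)  # each entry is compressed on its own

tgz_buf = io.BytesIO()
with tarfile.open(fileobj=tgz_buf, mode="w:gz") as tf:
    for name, data in files.items():
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))  # one Deflate stream covers every entry

zip_size = len(zip_buf.getvalue())
tgz_size = len(tgz_buf.getvalue())
print(f".zip: {zip_size} bytes, .tar.gz: {tgz_size} bytes")
```

With many small files, the per-entry headers and independent compression in the .zip dominate, while the single gzip stream over the .tar can reuse matches from earlier files.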

Unlike .tar, .zip has a central directory at the end, which provides a list of the contents. That and the separate compression provide random access to the individual entries in a .zip file. A .tar file would have to be decompressed and scanned from start to end in order to build a directory, which is how a .tar file is listed.
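A short sketch of what that random access looks like in practice, using Python's standard `zipfile` module on an in-memory archive. The entry names are made up. The check on the end-of-central-directory record assumes an archive with no comment and no Zip64 extensions, which is what `zipfile` writes for a small archive like this one.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for i in range(5):
        zf.writestr(f"part{i}.txt", f"contents of part {i}\n" * 100)
archive = buf.getvalue()

# The end-of-central-directory record (signature "PK\x05\x06") sits at the
# tail of the file; with no archive comment it is exactly 22 bytes long.
eocd_offset = archive.rfind(b"PK\x05\x06")

with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    listing = zf.namelist()            # taken from the central directory
    info = zf.getinfo("part3.txt")     # includes the offset of the entry's local header
    one_entry = zf.read("part3.txt")   # decompresses only this entry

print(listing, info.header_offset, len(one_entry))
```

Listing the archive and extracting `part3.txt` never touches the compressed data of the other entries.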

Shortly after the introduction of gzip, around the mid-1990s, the same patent dispute called into question the free use of the .gif image format, very widely used on bulletin boards and the World Wide Web (a new thing at the time). So a small group created the PNG losslessly compressed image format, with file type .png, to replace .gif. That format also uses the Deflate format for compression, which is applied after filters on the image data expose more of the redundancy. In order to promote widespread usage of the PNG format, two free code libraries were created: libpng and zlib. libpng handled all of the features of the PNG format, and zlib provided the compression and decompression code for use by libpng, as well as for other applications. zlib was adapted from the gzip code.
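To show where the zlib-wrapped Deflate data lives inside a PNG, here is a minimal sketch that builds a tiny grayscale PNG by hand and then reads the image data back out of its IDAT chunk. It is not a substitute for libpng: the 2x2 image, the `chunk` helper, and the choice of filter type 0 on every scanline are all assumptions made for the example.

```python
import struct
import zlib


def chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + data)
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


# A 2x2 8-bit grayscale image; each scanline starts with filter type 0 (None).
scanlines = b"\x00\x00\xff" + b"\x00\xff\x00"
ihdr = struct.pack(">IIBBBBB", 2, 2, 8, 0, 0, 0, 0)  # width, height, depth, color type, ...
png = (
    b"\x89PNG\r\n\x1a\n"
    + chunk(b"IHDR", ihdr)
    + chunk(b"IDAT", zlib.compress(scanlines))  # zlib-wrapped Deflate data
    + chunk(b"IEND", b"")
)

# Walk the chunks and collect the IDAT payload.
pos, idat = 8, b""
while pos < len(png):
    length, kind = struct.unpack(">I4s", png[pos:pos + 8])
    if kind == b"IDAT":
        idat += png[pos + 8:pos + 8 + length]
    pos += 12 + length  # 4 length + 4 type + data + 4 CRC

recovered = zlib.decompress(idat)
print(recovered == scanlines)
```

A real encoder would pick a filter per scanline before compressing; this sketch skips that step and only shows the container and the zlib stream.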

All of the mentioned patents have since expired.

The zlib library supports Deflate compression and decompression, and three kinds of wrapping around the deflate streams. Those are: no wrapping at all ("raw" deflate), zlib wrapping, which is used in the PNG format data blocks, and gzip wrapping, to provide gzip routines for the programmer. The main difference between zlib and gzip wrapping is that the zlib wrapping is more compact, six bytes vs. a minimum of 18 bytes for gzip, and the integrity check, Adler-32, runs faster than the CRC-32 that gzip uses. Raw deflate is used by programs that read and write the .zip format, which is another format that wraps around deflate compressed data.
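The three wrappings can be compared directly through Python's `zlib` module, which exposes zlib's `windowBits` parameter as `wbits`: a negative value selects raw deflate, 15 selects the zlib wrapper, and 16 + 15 selects the gzip wrapper. The sketch below compresses the same input three ways. The `deflate` helper and the sample data are just for illustration.

```python
import zlib

data = b"the same input, three different wrappers\n" * 50


def deflate(payload: bytes, wbits: int) -> bytes:
    """Compress payload at level 6 with the wrapper selected by wbits."""
    co = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return co.compress(payload) + co.flush()


raw = deflate(data, -15)  # raw deflate, no header or trailer
zl = deflate(data, 15)    # zlib wrapper: 2-byte header + Adler-32
gz = deflate(data, 31)    # gzip wrapper: 10-byte header + CRC-32 + length

print("raw:", len(raw))
print("zlib overhead:", len(zl) - len(raw))
print("gzip overhead:", len(gz) - len(raw))

# Each stream decompresses with the matching wbits value.
roundtrip = [
    zlib.decompress(raw, -15),
    zlib.decompress(zl, 15),
    zlib.decompress(gz, 31),
]
```

The size differences match the six bytes and the 18-byte minimum mentioned above, since the deflate data in the middle is the same in all three cases.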

zlib is now in wide use for data transmission and storage. For example, most HTTP transactions by servers and browsers compress and decompress the data using zlib.

Different implementations of deflate can result in different compressed output for the same input data, as evidenced by the existence of selectable compression levels that allow trading off compression effectiveness for CPU time. zlib and PKZIP are not the only implementations of deflate compression and decompression. Both the 7-Zip archiving utility and Google's zopfli library have the ability to use much more CPU time than zlib in order to squeeze out the last few bits possible when using the deflate format, reducing compressed sizes by a few percent as compared to zlib's highest compression level. The pigz utility, a parallel implementation of gzip, includes the option to use zlib (compression levels 1-9) or zopfli (compression level 11), and somewhat mitigates the time impact of using zopfli by splitting the compression of large files over multiple processors and cores.
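The level trade-off is easy to observe with zlib itself. This sketch compresses the same pseudo-random word salad at levels 1, 6, and 9 using Python's `zlib` module. The word list and seed are arbitrary, and the exact sizes depend on the data and on the zlib build in use.

```python
import random
import zlib

rng = random.Random(42)  # fixed seed so the input is reproducible
words = [b"zlib", b"gzip", b"zip", b"deflate", b"inflate", b"stream", b"archive", b"header"]
data = b" ".join(rng.choice(words) for _ in range(20000))

# Same input, same format, different amounts of effort spent searching for matches.
outputs = {level: zlib.compress(data, level) for level in (1, 6, 9)}

for level, out in outputs.items():
    print(f"level {level}: {len(out)} bytes")
```

Every one of these outputs is a valid stream that any conforming decompressor turns back into the same input; only the compressor's choices differ.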
