Compressed output differs from Go to Ruby implementation
I'm implementing a program that deflates a file into a git blob and stores it appropriately.
I have a Ruby reference implementation that's based on an article from the Git book.
I'm attempting to implement this in Go here.
However, I'm running into an issue where the stored compressed data differs slightly with each implementation.
vbindiff shows that the first 2 bytes are identical (as run from this test script), if I'm reading this right. These bytes store the compression method and info (CMF) and the flags (FLG), respectively (as per https://tools.ietf.org/html/rfc1950). The third byte is where the difference begins; this is either the dictionary ID or the start of the compressed data. The data remains similar until near the end of the file. I'm assuming this is probably the difference in the ADLER32 checksum.
It seems that neither the Go nor the Ruby implementation passes a dictionary to zlib by default (as per the Go zlib source and the Ruby zlib source).
The data appears identical.
I'm not sure if there's an implementation error in the libraries or if I'm just missing something.
Why are these outputs different?
The deflate algorithm as defined in RFC 1951 (which is used in the zlib format defined by RFC 1950 and also in the gzip format defined by RFC 1952) leaves room for variation between implementations, so two compressors can produce different output for the same input. Both outputs will still decompress to the same value. This freedom allows a tradeoff between compression time and compression ratio, and it is also what makes programs like zopfli possible, which achieve better compression than the original zlib library at the cost of significantly longer compression time.
Go uses its own implementation of the deflate algorithm, written in Go, while Ruby uses the zlib library. This is the reason your examples create different compressed output for the same input. But if you take the output from either the Go or the Ruby program and decompress it (no matter whether with Ruby, Go, or any other standard-conforming implementation), the result will be exactly the same value.