Why does zipping the same content twice give two files with different SHA1?


Question

I have run into a strange problem with git and zip files. My build script takes a set of documentation HTML pages and zips them into docs.zip, which I then check into git.

The problem is that every time I re-run the build script, the new zip file has a different SHA1 than the one from the previous run. My build script calls the Ant zip task. However, manually running the zip command from the Mac OS X shell also gives me a different SHA1 if I zip the same directory twice.

Run 1:

zip foo.zip *
openssl sha1 foo.zip 
rm foo.zip 

Run 2:

zip foo.zip *
openssl sha1 foo.zip

Run 1 and Run 2 give different SHA1 values even though the content did not change between runs. In both cases zip prints exactly the same list of files being zipped, and nothing indicates that OS-specific files such as .DS_Store are being included in the zip file.

Is the zip algorithm deterministic? If run on the same content, will it produce exactly the same bits? If not, why not?

What are my options for zipping the files in a deterministic way? The zip file contains thousands of files, and I don't expect them to change much. I know that git compresses any file you check in; my motivation for zipping them is just to keep the mass of files out of the way.

Recommended answer

According to Wikipedia (http://en.wikipedia.org/wiki/Zip_(file_format)), zip file headers include "File last modification time" and "File last modification date" fields. So a zip rebuilt from the same content will differ from the previous one whenever those stored timestamps differ, and any zip file checked into git will then appear to git to have changed. It also seems that there is no flag to tell zip not to set those headers.
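The effect of those header fields can be reproduced in a few lines. The following is a minimal sketch, not the asker's build script: it uses Python's standard zipfile module instead of the Ant task or the command-line zip, and the file names, contents, and the helper names build_zip and sha1 are made up for illustration. It builds the same content three times in memory and changes only the timestamp stamped on each entry.

```python
import hashlib
import io
import zipfile


def build_zip(files, date_time):
    """Build a zip in memory, stamping every entry with the given date_time."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Sort the names so entry order is fixed and only timestamps vary.
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, files[name])
    return buf.getvalue()


def sha1(data):
    """Return the hex SHA1 digest of a bytes object."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical documentation pages standing in for the real ones.
files = {"index.html": b"<html>docs</html>", "api.html": b"<html>api</html>"}

first = sha1(build_zip(files, (2020, 1, 1, 0, 0, 0)))
second = sha1(build_zip(files, (2020, 1, 1, 0, 0, 2)))  # two seconds later
pinned = sha1(build_zip(files, (2020, 1, 1, 0, 0, 0)))  # same stamp as first

print("different timestamps, same hash?", first == second)
print("identical timestamps, same hash?", first == pinned)
```

Pinning the timestamp (and the entry order) in this way is one possible route to a reproducible zip; whether the Ant zip task or the macOS zip command can be configured to do the same is not something this sketch establishes.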

I am resorting to just using tar, which seems to produce the same bytes for the same input when run multiple times.
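Note that a tar entry also records a modification time, owner, and mode, so byte-identical output depends on that metadata staying the same between runs. As a hedged sketch, assuming Python's standard tarfile module is an acceptable stand-in for the tar command (the answer itself only used the command-line tool), the metadata can be pinned explicitly. The helper name build_tar and the sample files are hypothetical.

```python
import hashlib
import io
import tarfile


def build_tar(files, mtime=0):
    """Build an uncompressed tar in memory with pinned per-entry metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        # Sort the names so entry order does not depend on dict or disk order.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime          # fixed modification time
            info.mode = 0o644           # fixed permissions
            info.uid = info.gid = 0     # fixed numeric owner and group
            info.uname = info.gname = ""
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Hypothetical documentation pages standing in for the real ones.
files = {"index.html": b"<html>docs</html>", "api.html": b"<html>api</html>"}

run1 = hashlib.sha1(build_tar(files)).hexdigest()
run2 = hashlib.sha1(build_tar(files)).hexdigest()
touched = hashlib.sha1(build_tar(files, mtime=1)).hexdigest()

print("same metadata, same hash?", run1 == run2)
print("changed mtime, same hash?", run1 == touched)
```

The archive is left uncompressed on purpose: git compresses stored objects itself, and skipping a compression layer removes one more place where extra metadata could be introduced.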
