Repeatably gzip files in Python

Question

I'm writing a Python script that deploys static sites to AWS (S3, CloudFront, Route 53). Because I don't want to upload every file on every deploy, I check which files were modified by comparing each file's MD5 hash with its ETag (which S3 sets to the object's MD5 hash). This works well for all files except those that my build script gzips before uploading. Looking inside the files, it seems gzip isn't really a pure function: the output file differs very slightly every time gzip is run, even if the source file hasn't changed.

My question is this: is there any way to get gzip to reliably and repeatably output the exact same file given the exact same input? Or am I better off just checking whether the file is gzipped, then decompressing it and computing the MD5 hash of the contents, or manually setting the ETag value for it instead?

Recommended answer

The compressed data is the same each time. The only thing that differs is most likely the modification time stored in the gzip header. The fifth argument of GzipFile, mtime (if GzipFile is what you're using), lets you specify the modification time written to the header. The first argument, filename, also goes in the header, so keep it the same between runs. If you provide the fourth argument, fileobj, then the first argument is used only to populate the file name portion of the header.
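
As a rough illustration of the advice above, here is a minimal sketch that pins both header fields so repeated runs give byte-identical output. The helper names gzip_bytes_repeatably and md5_hex, the default mtime of 0, and the sample file name are assumptions made for this example and are not from the original answer. It also assumes the same Python and zlib versions and the same compression level on every run, since a different compressor build could still produce different bytes.

```python
import gzip
import hashlib
import io


def gzip_bytes_repeatably(data: bytes, name: str = "", mtime: int = 0) -> bytes:
    """Compress data so that identical input yields identical output.

    Hypothetical helper: `name` and `mtime` both end up in the gzip
    header, so they are fixed here instead of being taken from the
    file system or the clock.
    """
    buf = io.BytesIO()
    # Positional order in GzipFile is filename, mode, compresslevel,
    # fileobj, mtime; keywords are used here for readability.
    with gzip.GzipFile(filename=name, mode="wb", compresslevel=9,
                       fileobj=buf, mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest, the value to compare against the stored ETag."""
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    payload = b"<html><body>hello</body></html>" * 50
    first = gzip_bytes_repeatably(payload, name="index.html")
    second = gzip_bytes_repeatably(payload, name="index.html")
    print(md5_hex(first) == md5_hex(second))
```

With mtime and the file name held constant, the MD5 of the compressed bytes can be compared across deploys the same way as for uncompressed files. Leaving mtime unset makes GzipFile fall back to the current time, which is the likely source of the small differences described in the question.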
