Downloading a *.gz file with Python requests corrupts it


Question

I use this code (only part of it is shown) to download a *.gz archive.

import requests

with requests.session() as s:
    s.post(login_to_site_URL, payload)
    load = s.get(scene, stream=True)

    with open(path_to_file, "wb") as save_command:
        for chunk in load.iter_content(chunk_size=1024, decode_unicode=False):
            if chunk:
                save_command.write(chunk)
                save_command.flush()

After the download, the file is twice as large as when I download it by clicking "Save as", and it is corrupted. The link to the file is: http://www.zsrcpod.aviales.ru/modistlm/archive/tlm/geo/00000/28325/terra_77835_20140806_060059.geo.hdf.gz

The file requires a login and password, so I have added a screenshot of what I see when I follow the link: http://i.stack.imgur.com/DGqtS.jpg

It looks like some option is set that defines this archive as text.

The response headers are:

{'content-length': '58277138',
'content-encoding': 'gzip',
'set-cookie': 'cidaviales=53616c7465645f5fc8f0abdb26f7b0536784ae4e8b302410a288f1f67ccc0afd13ce067d97ba237dc27749d9957f30457f1a1d9763b03637; path=/, avialestime=1407386483; path=/; expires=Wed, 05-Nov-2014 04:41:23 GMT, ciddaviales=53616c7465645f5fc8f0abdb26f7b0536784ae4e8b302410a288f1f67ccc0afd13ce067d97ba237dc27749d9957f30457f1a1d9763b03637; domain=aviales.ru; path=/',
'accept-ranges': 'bytes',
'server': 'Apache/1.3.37 (Unix) mod_perl/1.30',
'last-modified': 'Wed, 06 Aug 2014 06:17:14 GMT',
'etag': '"21d4e63-3793d12-53e1c86a"',
'date': 'Thu, 07 Aug 2014 04:41:23 GMT',
'content-type': 'text/plain; charset=windows-1251'}

How do I download this file correctly using the Python requests library?

Recommended answer

It looks like requests automatically decompresses the content for you. As the requests documentation puts it:

"Requests automatically decompresses gzip-encoded responses, and does its best to decode response content to unicode when possible. You can get direct access to the raw response (and even the socket), if needed as well."
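The raw-response access mentioned in the quote can be sketched as follows. This is a minimal sketch, not the asker's code: save_raw_body is a hypothetical helper name, and it assumes the response was requested with stream=True so that response.raw has not yet been consumed. With requests, reading from response.raw returns the bytes as they arrived on the wire, so the saved file stays gzip-compressed.

```python
import shutil


def save_raw_body(response, path):
    """Write the response body to path without undoing Content-Encoding.

    `response` is assumed to be a requests.Response obtained with
    stream=True; only its file-like `raw` attribute is used here.
    """
    with open(path, "wb") as out:
        # Copy the undecoded stream to disk in chunks.
        shutil.copyfileobj(response.raw, out)
```

With the session from the question, this would be called as save_raw_body(s.get(scene, stream=True), path_to_file).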

Requests sends Accept-Encoding: gzip by default, and it decodes any response whose Content-Encoding header is gzip, as this one is. You can check the request side by printing s.headers (or load.request.headers for the headers that were actually sent). To get the raw data you can remove gzip from those headers, although that only helps if the server then stops sending Content-Encoding: gzip; alternatively, read the undecoded bytes from load.raw. In your case, however, the decompressed data looks like a valid HDF file, so just save it with the .hdf extension and use it!
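If the file has already been saved under its .gz name, one way to follow this advice is to check whether the content is still gzip and, if not, drop the misleading suffix. A minimal sketch, assuming local files only; is_gzip and drop_gz_suffix are hypothetical helper names, and the check relies only on the two-byte gzip magic number.

```python
import os

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file at path starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def drop_gz_suffix(path):
    """Rename path without its .gz suffix when the content is not gzip.

    Returns the path the file ends up under.
    """
    if path.endswith(".gz") and not is_gzip(path):
        new_path = path[:-len(".gz")]
        os.replace(path, new_path)
        return new_path
    return path
```

A file that requests has already decompressed would be renamed from name.hdf.gz to name.hdf, while a file that is still compressed is left alone.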
