Downloading a *.gz file with python requests corrupts it

Problem description

I use this code (it is only a part of the script) to download a *.gz archive.

import requests

with requests.session() as s:
    # Log in first; the session keeps the cookies for the next request.
    s.post(login_to_site_URL, payload)
    load = s.get(scene, stream=True)

    with open(path_to_file, "wb") as save_command:
        for chunk in load.iter_content(chunk_size=1024, decode_unicode=False):
            if chunk:
                save_command.write(chunk)
                save_command.flush()

After the download, the file is twice as large as when I download it by clicking "save as" on the link, and it is corrupted. The link to the file is: http://www.zsrcpod.aviales.ru/modistlm/archive/tlm/geo/00000/28325/terra_77835_20140806_060059.geo.hdf.gz

The file requires a login and password, so I have added a screenshot of what I see when I follow the link: http://i.stack.imgur.com/DGqtS.jpg

It looks like some option is set that marks this archive as text.

The response headers (file.header) are:

{'content-length': '58277138',
'content-encoding': 'gzip',
'set-cookie': 'cidaviales=53616c7465645f5fc8f0abdb26f7b0536784ae4e8b302410a288f1f67ccc0afd13ce067d97ba237dc27749d9957f30457f1a1d9763b03637; path=/,
 avialestime=1407386483; path=/; expires=Wed,
 05-Nov-2014 04:41:23 GMT,
ciddaviales=53616c7465645f5fc8f0abdb26f7b0536784ae4e8b302410a288f1f67ccc0afd13ce067d97ba237dc27749d9957f30457f1a1d9763b03637; domain=aviales.ru; path=/',
'accept-ranges': 'bytes',
'server': 'Apache/1.3.37 (Unix) mod_perl/1.30',
'last-modified': 'Wed, 06 Aug 2014 06:17:14 GMT',
'etag': '"21d4e63-3793d12-53e1c86a"',
'date': 'Thu, 07 Aug 2014 04:41:23 GMT',
'content-type': 'text/plain; charset=windows-1251'}

How do I properly download this file using the python requests library?

Recommended answer

It looks like requests automatically decompresses the content for you. From the requests documentation:

Requests automatically decompresses gzip-encoded responses, and does its best to decode response content to unicode when possible. You can get direct access to the raw response (and even the socket), if needed as well.
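If the compressed bytes are what you want on disk, one option is to read from the raw response that the quote mentions. The following is only a minimal sketch: `download_raw` and its parameters are hypothetical names, and it assumes `session` is a `requests.Session` that has already logged in. It relies on `Response.raw`, which requests documents as the untransformed byte stream when the request is made with `stream=True`.

```python
import shutil


def download_raw(session, url, dest_path):
    """Save the response body as it arrived on the wire, without gzip decoding.

    Hypothetical helper: `session` is assumed to be a requests.Session.
    """
    resp = session.get(url, stream=True)
    try:
        resp.raise_for_status()
        # resp.raw is the underlying urllib3 stream; copying from it skips
        # the Content-Encoding decoding that iter_content() performs.
        with open(dest_path, "wb") as out:
            shutil.copyfileobj(resp.raw, out)
    finally:
        resp.close()
    return dest_path
```

With the variables from the question, this would be called as `download_raw(s, scene, path_to_file)` after the `s.post(...)` login. The saved file should then match the `content-length` reported by the server.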

This is the default behaviour: requests sends an Accept-Encoding request header that includes gzip, and it decodes any response marked Content-Encoding: gzip, as this one is. You can check what was sent by printing load.request.headers (s.headers shows the session defaults; s.request is a method and has no headers attribute). To get the raw data you would modify that headers dict to exclude gzip; however, in your case the decompressed data looks like a valid hdf file, so just save it with that extension and use it!
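To confirm which of the two you ended up with, you can look at the first bytes of the saved file. The sketch below is an illustration, not part of requests: `sniff_format` is a hypothetical helper, and it assumes the payload is either HDF4 or HDF5 (the question does not say which). It compares the file header against the standard gzip, HDF4 and HDF5 signatures.

```python
GZIP_MAGIC = b"\x1f\x8b"            # start of every gzip stream
HDF4_MAGIC = b"\x0e\x03\x13\x01"    # HDF4 file signature
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"   # HDF5 file signature


def sniff_format(path):
    """Return 'gzip', 'hdf4', 'hdf5' or 'unknown' based on the file header."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(HDF4_MAGIC):
        return "hdf4"
    if head.startswith(HDF5_MAGIC):
        return "hdf5"
    return "unknown"
```

If this reports an HDF format for a file you named `*.hdf.gz`, the content was already decompressed during the download, and renaming it to `*.hdf` is enough.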
