Downloading a large file in Python fails with error: Compressed file ended before the end-of-stream marker was reached

Question

I am downloading a compressed file from the internet:

import lzma
import urllib.request

with lzma.open(urllib.request.urlopen(url)) as file:
    for line in file:
        ...

After having downloaded and processed a large part of the file, I eventually get the error:


File "/usr/lib/python3.4/lzma.py", line 225, in _fill_buffer
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached

I suspect this is caused by the internet connection dropping or the server not responding for some time. If that is the case, is there any way to make it keep trying until the connection is re-established, instead of throwing an exception?

I don't think the problem is with the file itself, as I have manually downloaded many similar files from the same website and decompressed them. I have also been able to download and decompress some smaller files with Python. The file I am trying to download has a compressed size of about 20 GB.
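The hypothesis above can be checked offline. The following is a minimal sketch, not part of the original question: it compresses some made-up lines in memory, cuts the compressed bytes in half to imitate a connection that drops mid-transfer, and reads both versions through lzma.open. The helper name count_lines and the sample data are assumptions made for illustration.

```python
import io
import lzma


def count_lines(raw_bytes):
    """Read an in-memory .xz payload line by line.

    Returns (lines_read, error_message); error_message is None when the
    stream ends cleanly. Hypothetical helper for illustration only.
    """
    count = 0
    try:
        with lzma.open(io.BytesIO(raw_bytes)) as fh:
            for _ in fh:
                count += 1
    except EOFError as exc:
        # Raised when the input stops before the end-of-stream marker.
        return count, str(exc)
    return count, None


# Made-up sample data: 10000 short lines, compressed in memory.
payload = lzma.compress(b"".join(b"line %d\n" % i for i in range(10000)))

complete = count_lines(payload)
# Keep only the first half of the compressed bytes, as a dropped
# connection might.
cut_short = count_lines(payload[: len(payload) // 2])
```

The complete payload is read to the end, while the truncated one fails partway with an EOFError about the missing end-of-stream marker. That fits the idea that the stream was cut off rather than that the file on the server is damaged.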

Recommended answer

Have you tried using the requests library? It offers a higher-level interface than urllib (under the hood it builds on urllib3).

The following solution should work for you, but it uses the requests library instead of urllib (but requests > urllib anyway!). Let me know if you prefer to continue using urllib.

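import os
import requests


def download(url, chunk_s=1024, fname=None):
    """Stream url to a local file and return the file's absolute path."""
    if not fname:
        # Default to the last path component of the URL.
        fname = url.split('/')[-1]
    with requests.get(url, stream=True) as req:
        req.raise_for_status()  # fail early on an HTTP error status
        with open(fname, 'wb') as fh:
            for chunk in req.iter_content(chunk_size=chunk_s):
                if chunk:  # skip empty chunks
                    fh.write(chunk)
    return os.path.abspath(fname)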
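The function above saves the file to disk but does not retry when the connection drops, which is what the question asks about. The following is a rough sketch of one way to do that, using only the standard library so it stays close to the asker's urllib code. It assumes the server honours HTTP Range requests and sends a Content-Length (so that a dropped connection surfaces as an exception rather than a silent short read), and that fname does not already hold unrelated data. The name download_with_resume and all of its parameters are hypothetical.

```python
import http.client
import os
import time
import urllib.error
import urllib.request


def download_with_resume(url, fname, max_attempts=5, wait_s=5.0,
                         chunk_s=1 << 20, timeout_s=60,
                         opener=urllib.request.urlopen):
    """Download url to fname, resuming from the bytes already on disk.

    Hypothetical helper: assumes the server supports Range requests.
    """
    last_exc = None
    for _ in range(max_attempts):
        # Bytes saved by earlier attempts; 0 on the first try.
        offset = os.path.getsize(fname) if os.path.exists(fname) else 0
        request = urllib.request.Request(url)
        if offset:
            request.add_header("Range", "bytes=%d-" % offset)
        try:
            with opener(request, timeout=timeout_s) as resp:
                # 206 means the server sent only the missing part, so
                # append; any other status means a full body, so restart.
                mode = "ab" if resp.status == 206 else "wb"
                with open(fname, mode) as fh:
                    while True:
                        chunk = resp.read(chunk_s)
                        if not chunk:
                            return os.path.abspath(fname)
                        fh.write(chunk)
        except urllib.error.HTTPError:
            # HTTP error statuses are re-raised, not retried, in this sketch.
            raise
        except (OSError, http.client.HTTPException) as exc:
            # Timeouts, resets and incomplete reads: wait, then resume.
            last_exc = exc
            time.sleep(wait_s)
    raise RuntimeError(
        "download did not finish after %d attempts" % max_attempts
    ) from last_exc
```

Once the file is complete on disk, it can be read line by line with lzma.open(path, "rt"), much as in the question, without depending on the network staying up for the whole run. The opener parameter exists only so the retry logic can be exercised without a real server.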
