How to resume file download in Python 3.5?

Question

I am using the requests module in Python 3.5 to download a file with the following code. How can I make this code "auto-resume" the download from a partially downloaded file?

response = requests.get(url, stream=True)

total_size = int(response.headers.get('content-length'))  

with open(file_path + file_name, "wb") as file:
    for data in tqdm(iterable = response.iter_content(chunk_size = 1024), total = total_size//1024, unit = 'KB'):
        file.write(data)

I would prefer to use only requests module to achieve this if possible.

Recommended answer

I don't think requests has this built in, but you can do it manually pretty easily (as long as the server supports it).

The key is Range requests. To fetch part of a resource starting at byte 12345, you add this header:

Range: bytes=12345-

And then you can just append the results onto your file.
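As a rough sketch of that step, the header can be built from the byte offset you want to start at. `range_header` is a hypothetical helper name used only for illustration; it is not part of requests.

```python
def range_header(start):
    """Return a headers dict asking for everything from byte `start` onward."""
    if start < 0:
        raise ValueError("start must be non-negative")
    return {"Range": "bytes={}-".format(start)}

# Usage (assumes `import requests` and a real `url` and `filename`):
# response = requests.get(url, headers=range_header(12345), stream=True)
# with open(filename, "ab") as f:   # "ab" appends instead of truncating
#     for chunk in response.iter_content(chunk_size=1024):
#         f.write(chunk)
```

Opening the file with "ab" rather than "wb" is what keeps the bytes you already have.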

Ideally, you should verify that you get back a 206 Partial Content instead of a 200, and that the headers include the range you wanted:

Content-Range: bytes 12345-123455/123456
Content-Length: 111111
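One possible shape for that check, as a sketch only: `validate_partial` is a hypothetical name, and it takes the status code and headers as plain arguments so it can be tried without a network. With requests you would pass `response.status_code` and `response.headers` (which looks keys up case-insensitively; a plain dict needs the exact key).

```python
import re

def validate_partial(pos, status_code, headers):
    """Raise ValueError unless the response is a 206 that starts at byte `pos`."""
    if status_code != 206:
        raise ValueError("expected 206 Partial Content, got {}".format(status_code))
    # Content-Range looks like "bytes <first>-<last>/<total or *>"
    match = re.match(r"bytes (\d+)-(\d+)/(\d+|\*)$", headers.get("Content-Range", ""))
    if match is None:
        raise ValueError("missing or malformed Content-Range")
    first, last = int(match.group(1)), int(match.group(2))
    if first != pos:
        raise ValueError("server started at byte {}, wanted {}".format(first, pos))
    return first, last
```

A 200 here usually means the server ignored the Range header and is sending the whole file again, so appending that body would corrupt the download.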

You may also want to pre-validate that the server handles ranges. You can do this by checking the headers of your initial response, or of a HEAD request, for this:

Accept-Ranges: bytes

If the header is missing entirely, or has none as a value, or has a list of values that doesn't include bytes, the server doesn't support resuming.
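Those three cases could be folded into a small check like the one below. This is a sketch; `supports_byte_ranges` is a hypothetical name, and it accepts any dict-like headers object.

```python
def supports_byte_ranges(headers):
    """True only if Accept-Ranges is present and lists the `bytes` unit."""
    value = headers.get("Accept-Ranges", "")
    units = [unit.strip().lower() for unit in value.split(",")]
    return "bytes" in units

# Usage (assumes `import requests` and a real `url`):
# if supports_byte_ranges(requests.head(url, allow_redirects=True).headers):
#     ...  # safe to send a Range header
```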

It may also be worth checking the Content-Length to confirm that you hadn't already finished the whole file right before being interrupted.
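A sketch of that completeness check, assuming you already know the full size (for example from the Content-Length of a HEAD response). `already_complete` is a hypothetical name. Without a check like this, asking for a range that starts at the end of the file will typically get a 416 Range Not Satisfiable back.

```python
import os

def already_complete(filename, total_size):
    """True if the local file already holds `total_size` bytes or more."""
    return os.path.exists(filename) and os.path.getsize(filename) >= total_size
```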

So, the code would look something like this:

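import requests
from tqdm import tqdm

def fetch_or_resume(url, filename):
    # Append mode: the position at open is the number of bytes already saved.
    with open(filename, 'ab') as f:
        headers = {}
        pos = f.tell()
        if pos:
            # str.format rather than an f-string, which needs Python 3.6+
            headers['Range'] = 'bytes={}-'.format(pos)
        response = requests.get(url, headers=headers, stream=True)
        if pos:
            # Placeholder you supply: check for 206 and a matching Content-Range.
            validate_as_paranoid_as_you_want_to_be_(pos, response)
        # On a resumed request, Content-Length covers only the remaining bytes.
        total_size = int(response.headers.get('content-length', 0))
        for data in tqdm(iterable=response.iter_content(chunk_size=1024), total=total_size//1024, unit='KB'):
            f.write(data)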

One common bug from people writing download-manager-type software is trying to keep track of how much has been read in previous requests. Don't do that; just use the file itself to tell you how much you have. After all, if you read 23456 bytes but only flushed 12345 to the file, then 12345 is where you want to start.
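In code, that amounts to asking the filesystem rather than a counter. A minimal sketch, with `resume_offset` as a hypothetical name:

```python
import os

def resume_offset(filename):
    """Return the byte offset to resume from: the size of the file on disk."""
    try:
        return os.path.getsize(filename)
    except OSError:
        # No file yet (or it is unreadable), so start from the beginning.
        return 0
```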
