Download file using urllib in Python with the wget -c feature
Problem description
I am writing a Python program that downloads PDF files over HTTP from a database. Sometimes the download stops with this message:
retrieval incomplete: got only 3617232 out of 10689634 bytes
How can I make the download resume from where it stopped, using the HTTP 206 Partial Content feature?
I can do it with wget -c and it works pretty well, but I would like to implement it directly in my Python software.
Any ideas?
Thank you
You can request a partial download by sending a GET with the Range header:
import urllib2  # Python 2 only; in Python 3 the equivalent module is urllib.request
req = urllib2.Request('http://www.python.org/')
#
# Here we request that bytes 18000-19000 be downloaded.
# The range is inclusive, and starts at 0.
#
req.headers['Range'] = 'bytes=%s-%s' % (18000, 19000)
f = urllib2.urlopen(req)
# This shows you the *actual* bytes that have been downloaded.
content_range = f.headers.get('Content-Range')
print(content_range)
# bytes 18000-18030/18031
print(repr(f.read()))
# ' </div>\n</body>\n</html>\n\n\n\n\n\n\n'
Be sure to check the Content-Range header to learn which bytes were actually returned: your range may be out of bounds, and not all servers honor the Range header. A server that ignores Range replies with 200 and the whole body instead of 206.
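To tie this back to the original question, here is a minimal sketch of a wget -c style resume built on the same Range idea. It is not part of the original answer: it assumes Python 3 (urllib.request instead of urllib2), and the names parse_content_range and resume_download are hypothetical helpers invented for illustration. It also assumes the remote file has not changed between attempts.

```python
import os
import re
import shutil
import urllib.error
import urllib.request


def parse_content_range(value):
    """Parse a header like 'bytes 18000-18030/18031' into (start, end, total).

    total is None when the server reports it as '*'.
    Returns None if the header is missing or not in this form.
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", (value or "").strip())
    if match is None:
        return None
    start, end, total = match.groups()
    return int(start), int(end), None if total == "*" else int(total)


def resume_download(url, path):
    """Hypothetical helper: continue a partial download of url into path."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    req = urllib.request.Request(url)
    if offset:
        # Open-ended range: everything from the first missing byte onward.
        req.add_header("Range", "bytes=%d-" % offset)
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as err:
        if err.code == 416 and offset:
            # Range not satisfiable. Assumption: the local file is already
            # complete, so there is nothing left to fetch.
            return
        raise
    with resp:
        content_range = parse_content_range(resp.headers.get("Content-Range"))
        if resp.status == 206 and content_range and content_range[0] == offset:
            mode = "ab"  # server honored the range: append the missing part
        else:
            mode = "wb"  # server sent the whole body: start over
        with open(path, mode) as out:
            shutil.copyfileobj(resp, out)
```

If the connection drops again partway through, calling resume_download with the same url and path picks up from the new file size. The append only happens when the server answers 206 and its Content-Range starts at the expected offset; otherwise the file is rewritten from scratch, which covers servers that ignore the Range header.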