Python seek on remote file using HTTP
Question
How do I seek to a particular position in a remote (HTTP) file so that I can download only that part?
Let's say the bytes in a remote file were: 1234567890
I want to seek to 4 and download 3 bytes from there, so I would have: 456
Also, how do I check whether a remote file exists? I tried os.path.isfile(), but it returns False when I pass it a remote file URL.
Recommended answer
If you are downloading the remote file through HTTP, you need to set the Range header.
Check in this example how it can be done. It looks like this:
myUrlclass.addheader("Range","bytes=%s-" % (existSize))
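As a rough illustration of the same idea on Python 3 (where urllib2 became urllib.request), the sketch below requests only a slice of a remote file. The helper names build_range_header and fetch_range are hypothetical, and offsets are assumed to be zero-based, so the "456" from the question corresponds to offset 3 and length 3. It also assumes the server may ignore the Range header and reply with a full 200 response, in which case the slice is taken locally.

```python
import urllib.request


def build_range_header(offset, length):
    """Return a Range header value for `length` bytes starting at zero-based `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends.
    return "bytes=%d-%d" % (offset, offset + length - 1)


def fetch_range(url, offset, length, timeout=10):
    """Download `length` bytes of `url` starting at zero-based `offset`."""
    request = urllib.request.Request(
        url, headers={"Range": build_range_header(offset, length)}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if response.status == 206:
            # 206 Partial Content: the body is exactly the requested slice.
            return response.read()
        # Any other success status (typically 200) means the server ignored
        # the Range header, so read up to the end of the slice and cut locally.
        return response.read(offset + length)[offset:]
```

With a file containing 1234567890, fetch_range(url, 3, 3) would ask for "bytes=3-5". An open-ended range such as "bytes=%s-" in the line above instead requests everything from the given offset to the end of the file, which is what a resumed download needs.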
EDIT: I just found a better implementation. This class is very simple to use, as can be seen in its docstring.
# Python 2 code (urllib2).
import urllib
import urllib2


# RangeError is not defined in the quoted snippet; this minimal
# stand-in is an assumption so that the handler below can run.
class RangeError(IOError):
    """Raised when the server cannot satisfy the requested byte range."""


class HTTPRangeHandler(urllib2.BaseHandler):
    """Handler that enables HTTP Range headers.

    This was extremely simple. The Range header is a HTTP feature to
    begin with so all this class does is tell urllib2 that the
    "206 Partial Content" response from the HTTP server is what we
    expected.

    Example:
        import urllib2
        import byterange

        range_handler = byterange.HTTPRangeHandler()
        opener = urllib2.build_opener(range_handler)

        # install it
        urllib2.install_opener(opener)

        # create Request and set Range header
        req = urllib2.Request('http://www.python.org/')
        req.add_header('Range', 'bytes=30-50')
        f = urllib2.urlopen(req)
    """

    def http_error_206(self, req, fp, code, msg, hdrs):
        # 206 Partial Content Response: wrap it as a normal response
        r = urllib.addinfourl(fp, hdrs, req.get_full_url())
        r.code = code
        r.msg = msg
        return r

    def http_error_416(self, req, fp, code, msg, hdrs):
        # HTTP's Range Not Satisfiable error
        raise RangeError('Requested Range Not Satisfiable')
Update: The "better implementation" has moved to github: excid3/urlgrabber, in the byterange.py file.
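On the second part of the question: os.path.isfile() only looks at the local filesystem, so it returns False for any URL. One possible way to check a remote file, sketched here for Python 3, is to send a HEAD request and look at the status code. The function name remote_file_exists is hypothetical, and the sketch assumes the server answers HEAD requests and reports a missing file as 404 or 410. Other HTTP errors and network failures are deliberately left to propagate, since they do not say whether the file exists.

```python
import urllib.error
import urllib.request


def remote_file_exists(url, timeout=10):
    """Return True if a HEAD request for `url` succeeds, False on 404 or 410."""
    # HEAD asks for the headers only, so the file body is not downloaded.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as error:
        if error.code in (404, 410):
            # Not Found / Gone: treat as "does not exist".
            return False
        # Anything else (403, 500, ...) is ambiguous, so re-raise it.
        raise
```

Some servers reject HEAD requests; in that case a ranged GET for the first byte (Range: bytes=0-0) is a common fallback.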