Python seek on remote file using HTTP


Question

How do I seek to a particular position in a remote (HTTP) file so that I download only that part?

Let's say the bytes in a remote file are: 1234567890

I want to seek to 4 and download 3 bytes from there, so I would have: 456

Also, how do I check whether a remote file exists? I tried os.path.isfile(), but it returns False when I pass it a remote file URL.

Recommended answer

If you are downloading the remote file through HTTP, you need to set the Range header.

See this example for how it can be done. It looks like this:

myUrlclass.addheader("Range","bytes=%s-" % (existSize))
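
For readers on Python 3, the same idea can be sketched with urllib.request. This is a minimal sketch, not the code from the linked example: the helper names range_header, build_range_request and fetch_range are made up for illustration, and it assumes the server honors Range requests. Byte offsets in a Range header are zero-based and both ends are inclusive, so the "456" in "1234567890" sits at offset 3 with length 3.

```python
import urllib.request


def range_header(offset, length):
    """Build a Range header value for `length` bytes starting at zero-based `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # Both ends of an HTTP byte range are inclusive.
    return "bytes=%d-%d" % (offset, offset + length - 1)


def build_range_request(url, offset, length):
    """Create a GET request that asks only for the given slice of the resource."""
    headers = {"Range": range_header(offset, length)}
    return urllib.request.Request(url, headers=headers)


def fetch_range(url, offset, length, timeout=10):
    """Download the requested slice (hypothetical helper; needs a Range-aware server)."""
    req = build_range_request(url, offset, length)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Calling fetch_range("http://example.com/file.bin", 3, 3) (a placeholder URL) would send "Range: bytes=3-5". A server that ignores the header may still return the whole file with status 200, so the result is only guaranteed to be the slice when the response status is 206.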

EDIT: I just found a better implementation. This class is very simple to use, as can be seen in the docstring.

# Python 2 code (urllib2). RangeError is not shown in the original
# snippet; a minimal definition is assumed here so the class is complete.
import urllib
import urllib2


class RangeError(IOError):
    """Raised when the server cannot satisfy the requested range."""


class HTTPRangeHandler(urllib2.BaseHandler):
    """Handler that enables HTTP Range headers.

    This was extremely simple. The Range header is a HTTP feature to
    begin with so all this class does is tell urllib2 that the
    "206 Partial Content" response from the HTTP server is what we
    expected.

    Example:
        import urllib2
        import byterange

        range_handler = byterange.HTTPRangeHandler()
        opener = urllib2.build_opener(range_handler)

        # install it
        urllib2.install_opener(opener)

        # create Request and set Range header
        req = urllib2.Request('http://www.python.org/')
        req.add_header('Range', 'bytes=30-50')
        f = urllib2.urlopen(req)
    """

    def http_error_206(self, req, fp, code, msg, hdrs):
        # 206 Partial Content Response: wrap it as a normal response object
        r = urllib.addinfourl(fp, hdrs, req.get_full_url())
        r.code = code
        r.msg = msg
        return r

    def http_error_416(self, req, fp, code, msg, hdrs):
        # HTTP's Range Not Satisfiable error
        raise RangeError('Requested Range Not Satisfiable')
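
The handler above targets Python 2's urllib2. As a rough Python 3 counterpart, the sketch below relies on urllib.request returning a 206 reply as an ordinary response and raising urllib.error.HTTPError for 416. The names extract_range and read_range are hypothetical, and RangeError is redefined here only so that the sketch is self-contained; none of this is taken from byterange.py.

```python
import urllib.error
import urllib.request


class RangeError(IOError):
    """Raised when the server answers 416 Range Not Satisfiable."""


def extract_range(status, body, first, last):
    """Return bytes first..last (inclusive) from a response body."""
    if status == 206:
        # Partial Content: the body should already be the requested slice.
        return body
    if status == 200:
        # The server ignored Range and sent the whole resource; slice locally.
        return body[first:last + 1]
    raise ValueError("unexpected HTTP status: %d" % status)


def read_range(url, first, last, timeout=10):
    """Request bytes first..last (inclusive) from url and return them."""
    headers = {"Range": "bytes=%d-%d" % (first, last)}
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return extract_range(resp.status, resp.read(), first, last)
    except urllib.error.HTTPError as err:
        if err.code == 416:
            raise RangeError("Requested Range Not Satisfiable") from err
        raise
```

Keeping the status handling in extract_range makes the 200 fallback easy to check without a network connection. Note that the fallback still downloads the whole file, so it only saves work when the server actually replies with 206.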

Update: The "better implementation" has moved to github: excid3/urlgrabber, in the byterange.py file.
