HTTP request with timeout, maximum size and connection pooling


Problem description

I'm looking for a way in Python (2.7) to do HTTP requests with 3 requirements:


  • Timeout (reliability)

  • Maximum content size (security)

  • Connection pooling (performance)

I've checked just about every Python HTTP library, but none of them meets my requirements. For instance:

urllib2: good, but no connection pooling

import urllib2
import json

# Cap the body at 100 bytes: anything larger is rejected
r = urllib2.urlopen('https://github.com/timeline.json', timeout=5)
content = r.read(100+1)
if len(content) > 100:
    print 'too large'
    r.close()
else:
    print json.loads(content)

# Same request, but with a 100 kB cap
r = urllib2.urlopen('https://github.com/timeline.json', timeout=5)
content = r.read(100000+1)
if len(content) > 100000:
    print 'too large'
    r.close()
else:
    print json.loads(content)

requests: no maximum size

import requests
import json

r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)
r.headers.get('content-length') # not present for this request, and not safe to trust anyway
content = r.raw.read(100000+1)
print content # ARF this is gzipped, so not the real size
print json.loads(content) # content is gzipped so pretty useless
print r.json() # does not work anymore since raw.read was used

urllib3: I never got the read method to work, even with a 50 MB file... (a sketch of the usual streaming-read pattern follows this list)

httplib: httplib.HTTPConnection is not a pool (only one connection)
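
For reference, here is a minimal sketch of how urllib3's streaming read is usually driven. This is not the asker's original attempt; it assumes urllib3's PoolManager with preload_content=False, plus the read(amt, decode_content=True) and release_conn() calls:

import urllib3
import json

# PoolManager provides the connection pooling; preload_content=False defers the body read
pool = urllib3.PoolManager(timeout=5.0)
r = pool.request('GET', 'https://github.com/timeline.json', preload_content=False)

# Read at most 100000+1 decoded bytes, then enforce the size limit
content = r.read(100000+1, decode_content=True)
r.release_conn()  # return the connection to the pool

if len(content) > 100000:
    raise ValueError('Response too large')
print json.loads(content)

The PoolManager covers the pooling and timeout requirements; the size cap still has to be enforced by hand, as above.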

I can hardly believe that urllib2 is the best HTTP library I can use! So if anyone knows which library can do this, or how to use one of the previous libraries...

Edit:

The best solution I found, thanks to Martijn Pieters (StringIO does not slow down even for huge files, whereas str concatenation does, a lot):

import requests
from StringIO import StringIO

maxsize = 100000  # maximum allowed response size in bytes

r = requests.get('https://github.com/timeline.json', stream=True)
size = 0
ctt = StringIO()

for chunk in r.iter_content(2048):
    size += len(chunk)
    ctt.write(chunk)
    if size > maxsize:
        r.close()
        raise ValueError('Response too large')

content = ctt.getvalue()


Answer

You can do it with requests just fine; but you need to know that the raw object is part of the urllib3 guts and make use of the extra arguments the HTTPResponse.read() call supports, which let you specify that you want to read decoded data:

import requests
import json

r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)

content = r.raw.read(100000+1, decode_content=True)
if len(content) > 100000:
    raise ValueError('Too large a response')
print content
print json.loads(content)

Alternatively, you can set the decode_content flag on the raw object before reading:

import requests
import json

r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)

r.raw.decode_content = True
content = r.raw.read(100000+1)
if len(content) > 100000:
    raise ValueError('Too large a response')
print content
print json.loads(content)

If you don't like reaching into the urllib3 guts like that, use response.iter_content() to iterate over the decoded content in chunks; this uses the underlying HTTPResponse too (via the .stream() generator version):

import requests
import json

r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)

maxsize = 100000
content = ''
for chunk in r.iter_content(2048):
    content += chunk
    if len(content) > maxsize:
        r.close()
        raise ValueError('Response too large')

print content
print json.loads(content)

There is a subtle difference here in how compressed data sizes are handled; r.raw.read(100000+1) will only ever read 100k bytes of compressed data, and the uncompressed data is tested against your max size. The iter_content() method will read more uncompressed data in the rare case the compressed stream is larger than the uncompressed data.
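
If you want to cap both counts at once, one possible sketch (assuming a urllib3 version recent enough to expose HTTPResponse.tell(), which reports how many bytes have been pulled over the wire) is to check the compressed and the decoded sizes inside the iter_content() loop:

import requests
import json

maxsize = 100000  # cap applied to both wire bytes and decoded bytes

r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)

content = ''
for chunk in r.iter_content(2048):
    content += chunk
    # len(content) counts decoded bytes; r.raw.tell() counts bytes read off the wire
    if len(content) > maxsize or r.raw.tell() > maxsize:
        r.close()
        raise ValueError('Response too large')

print json.loads(content)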

Neither method allows r.json() to work; the response._content attribute isn't set by these; you can do so manually of course. But since the .raw.read() and .iter_content() calls already give you access to the content in question, there is really no need.
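
If you do want r.json() afterwards, the manual route hinted at above could look like the following sketch; it assigns the private _content attribute, so treat it as a workaround rather than a supported API:

import requests

maxsize = 100000
r = requests.get('https://github.com/timeline.json', timeout=5, stream=True)

content = ''
for chunk in r.iter_content(2048):
    content += chunk
    if len(content) > maxsize:
        r.close()
        raise ValueError('Response too large')

# _content is private to requests; setting it lets r.json() parse the data
# we already collected instead of trying to re-read the exhausted stream.
r._content = content
print r.json()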
