Read timeout using either urllib2 or any other http library


Problem Description

I have code for reading a URL like this:

from urllib2 import Request, urlopen
req = Request(url)
for key, val in headers.items():
    req.add_header(key, val)
res = urlopen(req, timeout = timeout)
# This line blocks
content = res.read()

The timeout works for the urlopen() call. But then the code gets to the res.read() call, where I want to read the response data, and the timeout isn't applied there. So the read call may hang almost forever waiting for data from the server. The only solution I've found is to use a signal to interrupt the read(), which is not suitable for me since I'm using threads.
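
For reference, the signal-based workaround mentioned above looks roughly like this (a sketch only; SIGALRM is Unix-only and can be armed only in the main thread, which is exactly why it doesn't fit a threaded program):

import signal
from urllib2 import urlopen

def _timeout_handler(signum, frame):
    raise IOError('read timed out')

# SIGALRM is Unix-only and must be handled in the main thread
signal.signal(signal.SIGALRM, _timeout_handler)
signal.alarm(10)  # deliver SIGALRM after 10 seconds
try:
    content = urlopen('http://example.com/').read()
finally:
    signal.alarm(0)  # disarm the alarm once the read is done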

What other options are there? Is there an HTTP library for Python that handles read timeouts? I've looked at httplib2 and requests, and they seem to suffer from the same issue. I don't want to write my own non-blocking network code using the socket module, because I think there should already be a library for this.
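
For example, the timeout parameter in requests has the same semantics (a sketch; the timeout bounds the connection attempt and each individual wait for data, not the total download time):

import requests

url = 'http://iso.linuxquestions.org/download/388/7163/http/se.releases.ubuntu.com/ubuntu-12.04.3-desktop-i386.iso'
# timeout=10 limits the connect and each wait between received
# chunks, so a large or slow-dripping download can still take
# arbitrarily long overall
r = requests.get(url, timeout=10)
content = r.content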

Update: None of the solutions below are doing it for me. You can see for yourself that setting the socket or urlopen timeout has no effect when downloading a large file:

from urllib2 import urlopen
url = 'http://iso.linuxquestions.org/download/388/7163/http/se.releases.ubuntu.com/ubuntu-12.04.3-desktop-i386.iso'
c = urlopen(url)  # passing timeout=... here makes no difference to the read below
c.read()

At least on Windows with Python 2.7.3, the timeouts are being completely ignored.

Answer

It's not possible for any library to do this without using some kind of asynchronous timer, through threads or otherwise. The reason is that the timeout parameter used in httplib, urllib2 and other libraries sets the timeout on the underlying socket. What this actually does is explained in the documentation:

SO_RCVTIMEO

Sets the timeout value that specifies the maximum amount of time an input function waits until it completes. It accepts a timeval structure with the number of seconds and microseconds specifying the limit on how long to wait for an input operation to complete. If a receive operation has blocked for this much time without receiving additional data, it shall return with a partial count or errno set to [EAGAIN] or [EWOULDBLOCK] if no data is received.

The part about blocking without receiving additional data is key: a socket.timeout is only raised if not a single byte has been received for the duration of the timeout window. In other words, this is a timeout between received bytes.
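
To make this concrete, here is a minimal sketch at the raw socket level (assuming example.com is reachable) showing that the timeout is re-armed on every successful recv(); a server that drips a byte every few seconds would keep this loop alive indefinitely:

import socket

s = socket.create_connection(('example.com', 80))
s.settimeout(5.0)  # applies to each individual recv(), not the whole transfer
s.sendall('GET / HTTP/1.0\r\nHost: example.com\r\n\r\n')

chunks = []
while True:
    try:
        chunk = s.recv(4096)  # raises socket.timeout only if no bytes arrive for 5s
    except socket.timeout:
        break  # nothing at all received for 5 seconds
    if not chunk:
        break  # server closed the connection
    chunks.append(chunk)
s.close()
body = ''.join(chunks)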

A simple function using threading.Timer could be as follows.

import httplib
import socket
import threading

def download(host, path, timeout = 10):
    content = None

    http = httplib.HTTPConnection(host)
    http.request('GET', path)
    response = http.getresponse()

    # Arm a timer that shuts down the read side of the socket after
    # `timeout` seconds, which forces the blocking read() to return
    timer = threading.Timer(timeout, http.sock.shutdown, [socket.SHUT_RD])
    timer.start()

    try:
        content = response.read()
    except httplib.IncompleteRead:
        # Raised when the shutdown truncated the body short of Content-Length
        pass

    timer.cancel() # cancel on triggered Timer is safe
    http.close()

    return content

>>> host = 'releases.ubuntu.com'
>>> content = download(host, '/15.04/ubuntu-15.04-desktop-amd64.iso', 1)
>>> print content is None
True
>>> content = download(host, '/15.04/MD5SUMS', 1)
>>> print content is None
False

Other than checking for None, it's also possible to catch the httplib.IncompleteRead exception outside the function instead of inside it. That latter approach won't work, though, if the HTTP response doesn't have a Content-Length header.
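
A hypothetical restructuring of the download() function above along those lines, letting httplib.IncompleteRead propagate to the caller (again, this only signals a timeout when the response carries a Content-Length header; without one, the truncated read() just returns short data silently):

import httplib
import socket
import threading

def download_strict(host, path, timeout = 10):
    # Variant of download() above without the try/except, so
    # httplib.IncompleteRead propagates to the caller
    http = httplib.HTTPConnection(host)
    http.request('GET', path)
    response = http.getresponse()

    timer = threading.Timer(timeout, http.sock.shutdown, [socket.SHUT_RD])
    timer.start()
    try:
        return response.read()
    finally:
        timer.cancel()
        http.close()

try:
    content = download_strict('releases.ubuntu.com', '/15.04/ubuntu-15.04-desktop-amd64.iso', 1)
except httplib.IncompleteRead:
    content = None  # the read timed out mid-body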
