Read timeout using either urllib2 or any other http library


Question

I have code for reading a URL like this:

from urllib2 import Request, urlopen
req = Request(url)
for key, val in headers.items():
    req.add_header(key, val)
res = urlopen(req, timeout = timeout)
# This line blocks
content = res.read()

The timeout works for the urlopen() call. But then the code gets to the res.read() call, where I want to read the response data, and the timeout isn't applied there. So the read call may hang almost forever waiting for data from the server. The only solution I've found is to use a signal to interrupt read(), which is not suitable for me since I'm using threads.

What other options are there? Is there an HTTP library for Python that handles read timeouts? I've looked at httplib2 and requests, and they seem to suffer from the same issue. I don't want to write my own non-blocking network code using the socket module, because I think there should already be a library for this.

Update: None of the solutions below are doing it for me. You can see for yourself that setting the socket or urlopen timeout has no effect when downloading a large file:

from urllib2 import urlopen
url = 'http://iso.linuxquestions.org/download/388/7163/http/se.releases.ubuntu.com/ubuntu-12.04.3-desktop-i386.iso'
c = urlopen(url, timeout=5)  # the timeout is honoured while connecting...
c.read()                     # ...but this read can still block indefinitely

At least on Windows with Python 2.7.3, the timeouts are being completely ignored.

Solution

It's not possible for any library to do this without using some kind of asynchronous timer, through threads or otherwise. The reason is that the timeout parameter used in httplib, urllib2 and other libraries sets the timeout on the underlying socket. What that actually does is explained in the documentation:


SO_RCVTIMEO

Sets the timeout value that specifies the maximum amount of time an input function waits until it completes. It accepts a timeval structure with the number of seconds and microseconds specifying the limit on how long to wait for an input operation to complete. If a receive operation has blocked for this much time without receiving additional data, it shall return with a partial count or errno set to [EAGAIN] or [EWOULDBLOCK] if no data is received.

The last part of that quote is key: a socket.timeout is only raised if not a single byte has been received for the duration of the timeout window. In other words, this is a timeout between received bytes.
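This per-recv behaviour is easy to demonstrate without any HTTP involved, using a plain socket pair (a minimal sketch; the socketpair setup and the 0.4s/1.0s figures are illustrative choices, not from the original answer):

```python
import socket
import threading
import time

a, b = socket.socketpair()
a.settimeout(1.0)  # per-recv timeout, not a total deadline

def trickle():
    # Feed one byte at a time, with gaps shorter than the timeout.
    for _ in range(5):
        time.sleep(0.4)
        b.sendall(b'x')
    b.close()

threading.Thread(target=trickle).start()

start = time.monotonic()
data = b''
while True:
    chunk = a.recv(1)  # each individual wait is ~0.4s, under the 1.0s limit
    if not chunk:      # b'' means the peer closed the connection
        break
    data += chunk
elapsed = time.monotonic() - start

# The whole read takes roughly 2 seconds, well past the 1-second "timeout",
# yet socket.timeout is never raised: no single gap exceeded 1 second.
print(data, round(elapsed, 1))
```

Replace the sender with one that goes silent for more than a second and the very same recv() call raises socket.timeout, which is exactly the behaviour the quoted documentation describes.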

A simple function using threading.Timer could be as follows.

import httplib
import socket
import threading

def download(host, path, timeout = 10):
    content = None

    http = httplib.HTTPConnection(host)
    http.request('GET', path)
    response = http.getresponse()

    timer = threading.Timer(timeout, http.sock.shutdown, [socket.SHUT_RD])
    timer.start()

    try:
        content = response.read()
    except httplib.IncompleteRead:
        pass

    timer.cancel() # cancel on triggered Timer is safe
    http.close()

    return content

>>> host = 'releases.ubuntu.com'
>>> content = download(host, '/15.04/ubuntu-15.04-desktop-amd64.iso', 1)
>>> print content is None
True
>>> content = download(host, '/15.04/MD5SUMS', 1)
>>> print content is None
False

Other than checking for None, it's also possible to catch the httplib.IncompleteRead exception not inside the function, but outside of it. The latter approach will not work, though, if the HTTP response doesn't have a Content-Length header.
