Python grequests takes a long time to finish
Question
I am trying to unshorten a lot of URLs that I have in a urlSet. The following code works most of the time, but sometimes it takes a very long time to finish. For example, I have 2950 URLs in urlSet; stderr tells me that 2900 are done, but getUrlMapping never finishes.
import sys

import grequests


def getUrlMapping(urlSet):
    # Map each short URL to the URL it finally resolves to.
    urlMapping = {}
    # rs = (grequests.get(u) for u in urlSet)
    rs = (grequests.head(u) for u in urlSet)
    res = grequests.imap(rs, size=100)
    counter = 0
    for x in res:
        counter += 1
        if counter % 50 == 0:
            sys.stderr.write('Doing %d url_mapping length %d\n' % (counter, len(urlMapping)))
        urlMapping[getOriginalUrl(x)] = getGoalUrl(x)
    return urlMapping


def getGoalUrl(resp):
    # Final URL after any redirects.
    try:
        url = resp.url
    except Exception:
        url = 'NULL'
    return url


def getOriginalUrl(resp):
    # URL before the first redirect, or resp.url if there was no redirect.
    try:
        url = resp.history[0].url
    except IndexError:
        url = resp.url
    except Exception:
        url = 'NULL'
    return url
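If the hang comes from a few servers that simply never answer (an assumption, not something the question confirms), a per-request timeout keeps one stalled URL from holding up the whole batch. With grequests the analogous step would be passing timeout=... to grequests.head, which forwards it to Requests. The sketch below swaps grequests for the standard library (ThreadPoolExecutor plus urllib.request) so it runs without extra packages; resolve_url, get_url_mapping and the 10-second default are hypothetical choices.

```python
import concurrent.futures
import http.client
import sys
import urllib.request


def resolve_url(short_url, timeout):
    """Return (short_url, final_url) for one URL, or (short_url, 'NULL') on failure.

    Sends a HEAD request; urllib.request follows redirects automatically.
    """
    request = urllib.request.Request(short_url, method="HEAD")
    try:
        # Assumption: stalled servers are the bottleneck. `timeout` applies to
        # each blocking socket operation (connect, each read), so a server that
        # never answers fails after ~timeout instead of hanging the batch.
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return short_url, resp.geturl()
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError/HTTPError and socket timeouts; ValueError
        # covers malformed URLs. Mirrors the question's 'NULL' fallback.
        return short_url, "NULL"


def get_url_mapping(url_set, max_workers=100, timeout=10.0):
    """Resolve all URLs concurrently with a bounded thread pool."""
    url_mapping = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(resolve_url, u, timeout) for u in url_set]
        for done, future in enumerate(concurrent.futures.as_completed(futures), 1):
            short_url, final_url = future.result()
            url_mapping[short_url] = final_url
            if done % 50 == 0:
                print("done %d of %d" % (done, len(futures)), file=sys.stderr)
    return url_mapping
```

Keying the dict by the input URL sidesteps the resp.history lookup, since each future already knows which URL it was given.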
Recommended answer
This probably won't help you anymore, since a long time has passed, but still...
I was having issues with Requests similar to yours. In my case, Requests took ages to download some pages, while any other software (browsers, curl, wget, Python's urllib) fetched them without any problem.
After a LOT of wasted time, I noticed that the server was sending some invalid headers. For example, in one of the "slow" pages, after Content-type: text/html it began to send headers in the form Header-name : header-value (notice the space before the colon). This somehow breaks Python's email header parsing, which is used (through http.client) to parse HTTP headers for Requests, so the Transfer-encoding: chunked header wasn't being parsed. Without that header, and with no Content-Length, the client cannot tell where the body ends, so it most likely just kept reading until the server closed the connection.
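To see the failure in isolation, the following sketch feeds two raw header blocks to http.client.parse_headers, the helper that http.client's HTTPResponse uses to parse response headers (Requests reads responses through urllib3, which builds on http.client). parse_raw_headers is a hypothetical wrapper, and the header lines are the ones described above. In the malformed block, the parser treats the line with the space before the colon as the start of the body, so every header after it, including Transfer-Encoding, is lost.

```python
import http.client
import io


def parse_raw_headers(raw: bytes) -> http.client.HTTPMessage:
    """Parse a raw CRLF-delimited header block with the stdlib HTTP parser."""
    return http.client.parse_headers(io.BytesIO(raw))


good = parse_raw_headers(
    b"Content-Type: text/html\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
)
bad = parse_raw_headers(
    b"Content-Type: text/html\r\n"
    b"Header-name : header-value\r\n"  # space before the colon
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
)

print(good["Transfer-Encoding"])  # chunked
print(bad["Content-Type"])        # text/html
print(bad["Transfer-Encoding"])   # None: header parsing stopped at the malformed line
```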
Long story short: manually setting the chunked attribute of the response's raw object (response.raw.chunked) to True before asking for the content solved the issue. For example:
response = requests.get('http://my-slow-url')
print(response.text)
took a very long time, but
response = requests.get('http://my-slow-url')
response.raw.chunked = True
print(response.text)
works fine!
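Why the flag matters: with chunked transfer encoding, the body is a series of chunks, each prefixed by its length in hex, and it ends with a zero-length chunk. A reader that knows the body is chunked can stop at that marker instead of waiting for the connection to close. The sketch below is a hypothetical, simplified decoder (read_chunked_body is not part of Requests or urllib3) that ignores chunk extensions and discards trailers; it only illustrates the framing. Also, as far as I know, current Requests versions read the whole body inside requests.get() unless stream=True is passed, so on those versions the flag would need to be set on a response obtained with requests.get(url, stream=True).

```python
import io
from typing import BinaryIO


def read_chunked_body(fp: BinaryIO) -> bytes:
    """Decode an HTTP/1.1 chunked body from a binary file-like object.

    Stops right after the terminating zero-length chunk, so it never needs
    the peer to close the connection to know the body is complete.
    Chunk extensions are ignored; trailer headers are read and discarded.
    """
    parts = []
    while True:
        size_line = fp.readline()
        if not size_line:
            raise EOFError("connection closed before the last chunk")
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        data = fp.read(size)
        if len(data) != size:
            raise EOFError("truncated chunk")
        parts.append(data)
        fp.readline()  # consume the CRLF that ends each chunk
    # Skip optional trailer headers up to the blank line.
    while True:
        line = fp.readline()
        if line in (b"\r\n", b"\n", b""):
            break
    return b"".join(parts)


stream = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\nNEXT RESPONSE")
print(read_chunked_body(stream))  # b'hello world'
print(stream.read())              # b'NEXT RESPONSE' (left untouched)
```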