Using grequests to make several thousand get requests to sourceforge, get "Max retries exceeded with url"

Question

I am very new to all of this; I need to obtain data on several thousand sourceforge projects for a paper I am writing. The data is all freely available in JSON format at the URL http://sourceforge.net/api/project/name/[project name]/json. I have a list of several thousand of these URLs and I am using the following code.

import grequests

# build one lazy AsyncRequest per URL, then send them all concurrently
rs = (grequests.get(u) for u in ulist)
answers = grequests.map(rs)
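
For reference, a single record of this form can be fetched synchronously with plain requests. This is only a hedged sanity-check sketch; the p2p-fs project name is borrowed from the traceback below, and any project name can be substituted:

import requests

# Fetch one project's JSON record from the SourceForge API described above.
# "p2p-fs" is just an example name taken from the error message further down.
r = requests.get("http://sourceforge.net/api/project/name/p2p-fs/json")
r.raise_for_status()  # surface HTTP errors early
data = r.json()       # parse the JSON payload into a dict
print(data)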

Using this code I am able to obtain the data for any 200 or so projects I like, i.e. rs = (grequests.get(u) for u in ulist[0:199]) works, but as soon as I go over that, all attempts are met with

ConnectionError: HTTPConnectionPool(host='sourceforge.net', port=80): Max retries exceeded with url: /api/project/name/p2p-fs/json (Caused by <class 'socket.gaierror'>: [Errno 8] nodename nor servname provided, or not known)
<Greenlet at 0x109b790f0: <bound method AsyncRequest.send of <grequests.AsyncRequest object at 0x10999ef50>>(stream=False)> failed with ConnectionError

I am then unable to make any more requests until I quit python, but as soon as I restart python I can make another 200 requests.

I've tried using grequests.map(rs,size=200) but this seems to do nothing.

Answer

In my case, it was not rate limiting by the destination server, but something much simpler: I didn't explicitly close the responses, so they were keeping the socket open, and the python process ran out of file handles.
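
One way to sanity-check this diagnosis on Linux or macOS is to inspect the process's open-file limit. This sketch is an illustrative addition, not part of the original answer (the resource module is Unix-only):

import resource

# Soft/hard limits on open file descriptors for the current process.
# The common macOS default soft limit of 256 lines up with the ~200-request
# ceiling described in the question: each unclosed response holds a socket.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open file descriptor limit:", soft, hard)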

My solution (I don't know for sure which one fixed the issue - theoretically either of them should have) was to do the following (a combined sketch appears after the list):

  • Set stream=False in grequests.get:

 rs = (grequests.get(u, stream=False) for u in urls)

  • Explicitly call response.close() after reading response.content:

     responses = grequests.map(rs)
     for response in responses:
         make_use_of(response.content)
         response.close()
    

  • Note: simply destroying the response object (assigning None to it, calling gc.collect()) was not enough - this did not close the file handles.
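
Putting both fixes together, a minimal sketch might look like the following. make_use_of is the placeholder from the bullets above, the single example URL stands in for the question's list of several thousand, and size=20 is an arbitrary concurrency cap rather than a value from the original answer:

import grequests

# Stand-in for the question's list of several thousand API URLs.
urls = ["http://sourceforge.net/api/project/name/p2p-fs/json"]

# stream=False lets requests read the body eagerly and release the connection.
rs = (grequests.get(u, stream=False) for u in urls)

# size bounds how many requests are in flight at once.
responses = grequests.map(rs, size=20)

for response in responses:
    if response is None:  # grequests yields None for requests that failed
        continue
    make_use_of(response.content)  # process the body...
    response.close()               # ...then release the socket/file handle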
