cURL multi hanging/ignoring timeout

Question

I'm using a 'rolling' cURL multi implementation (like this SO post, based on this cURL code). It works fine processing thousands of URLs, using up to 100 requests at a time, with 5 instances of the script running as daemons (yeah, I know, this should be written in C or something).

Here's the problem: after processing ~200,000 URLs (across the 5 instances), curl_multi_exec() seems to break for all instances of the script. I've tried shutting the scripts down and restarting, and the same thing happens (not after another 200,000 URLs, but right on restart): the script hangs calling curl_multi_exec().

I put the script into 'single' mode, processing one regular cURL handle at a time, and that works fine (but it's not quite the speed I need). My logging leads me to suspect that it may have hit a patch of slow/problematic connections (since every so often it seems to process one URL and then hang again), but that would mean my CURLOPT_TIMEOUT is being ignored for the individual handles. Or maybe it's just something about running that many requests through cURL.

Anyone heard of anything like this?

Sample code:

//some logging shows it hangs right here, only looping a time or two,
//so the hang seems to be in the curl call
while (($execrun = curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM);

//code to check for error or process whatever returned
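For context, a typical rolling-multi drive loop around that call looks roughly like the sketch below. This is illustrative, not the poster's actual code: `$master` stands for the multi handle, and the places where easy handles would be added and reaped are marked with comments. The key point is that `curl_multi_select()` blocks with a timeout instead of busy-spinning between `curl_multi_exec()` calls.

```php
<?php
// Illustrative rolling-multi loop (names like $master are assumptions,
// not the poster's code). With no easy handles added, this runs once
// and exits cleanly.
$master = curl_multi_init();

// ... curl_multi_add_handle($master, $ch) for each URL in the window ...

$running = null;
do {
    // Drive transfers; loop while cURL asks to be called again immediately.
    while (($execrun = curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM);
    if ($execrun != CURLM_OK) {
        break; // a real multi-level error, not just "call me again"
    }
    // Reap completed transfers and roll the next URLs into the window.
    while ($done = curl_multi_info_read($master)) {
        // check curl_errno()/curl_error() on $done['handle'], then
        // curl_multi_remove_handle($master, $done['handle']) and add a new URL
    }
    // Wait (with a timeout) for socket activity instead of busy-spinning.
    if ($running) {
        curl_multi_select($master, 1.0);
    }
} while ($running);

curl_multi_close($master);
```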

I have CURLOPT_TIMEOUT set to 120, but in the cases where curl_multi_exec() finally returns some data, it's after 10 minutes of waiting.

I have a bunch of testing/checking yet to do, but thought maybe this might ring a bell with someone.

Answer

After much testing, I believe I've found what is causing this particular problem. I'm not saying the other answer is incorrect, just that in this case it isn't the issue I'm having.

From what I can tell, curl_multi_exec() does not return until all DNS resolution (failure or success) has completed. If there are a bunch of URLs with bad domains, curl_multi_exec() doesn't return for at least:

(time it takes to get resolve error) * (number of urls with bad domain)
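One way to take DNS out of the picture entirely, in the same spirit as editing a hosts file, is to pin hostnames to addresses on the handle itself. This is a minimal sketch, not from the original post: it assumes you have resolved the IPs out of band, and it uses `CURLOPT_RESOLVE` (libcurl >= 7.21.3, PHP >= 5.5). The hostname and IP here are placeholders (192.0.2.1 is a reserved documentation address).

```php
<?php
// Sketch: pre-pin DNS so curl never blocks on a resolver.
// 'example.com' and '192.0.2.1' are hypothetical placeholders; in practice
// the IPs would come from an out-of-band (e.g. batched, async) lookup.
$ch = curl_init('http://example.com/');
$pinned = curl_setopt($ch, CURLOPT_RESOLVE, array('example.com:80:192.0.2.1'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// With the address pinned, a dead domain can no longer stall the whole
// multi batch on a slow DNS failure.
```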

Here is someone else who discovered this:


Just a note on the asynchronous nature of cURL’s multi functions: the DNS lookups are not (as far as I know today) asynchronous. So if one DNS lookup of your group fails, everything in the list of URLs after that fails also. We actually update our hosts.conf (I think?) file on our server daily in order to get around this. It gets the IP addresses there instead of looking them up. I believe it’s being worked on, but not sure if it’s changed in cURL yet.

Also, testing shows that cURL (at least my version) does honor the CURLOPT_CONNECTTIMEOUT setting. Of course, the first step of a multi cycle may still take a long time, since cURL waits for every URL to resolve or time out.
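For completeness, here is a sketch of how the two timeouts sit on each easy handle before it goes into the multi handle. The values and URL are illustrative, not taken from the original post: CURLOPT_CONNECTTIMEOUT caps the resolve-plus-connect phase (the part the multi loop was blocking on), while CURLOPT_TIMEOUT caps the whole transfer.

```php
<?php
// Illustrative per-handle setup ('example.com' and the values are placeholders).
$ch = curl_init('http://example.com/');
$ok = curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_CONNECTTIMEOUT => 10,  // give up on resolve/connect after 10s
    CURLOPT_TIMEOUT        => 120, // overall cap on the entire transfer
));
// curl_multi_add_handle($master, $ch); // then drive with curl_multi_exec()
```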
