PHP - Is there a safe way to perform deep recursion?

Question

I'm talking about performing deep recursion for around 5+ minutes, something you might have a crawler perform in order to extract the URL links and sub-URL links of pages.

It seems that deep recursion in PHP is not realistic.

For example:

getInfo("www.example.com");

// assumes the Simple HTML DOM parser (file_get_html / find)
function getInfo($link){
   $content = file_get_html($link);

   if($con = $content->find('.subCategories', 0)){
      echo "go deeper<br>";
      getInfo($con->find('a', 0)->href);
   }
   else{
      echo "reached deepest<br>";
   }
}

Answer

Doing something like this with recursion is actually a bad idea in any language. You cannot know how deep the crawler will go, so it might lead to a stack overflow. Even if it does not, it still wastes a lot of memory on the huge call stack, since PHP has no tail-call optimization (which would discard stack frames that are no longer needed).

Instead, push the found URLs into a "to crawl" queue which is processed iteratively:

// assumes the Simple HTML DOM parser (file_get_html / find)
$queue = array('www.example.com');
$done = array();
while($queue) {
    $link = array_shift($queue);
    $done[] = $link;
    $content = file_get_html($link);
    if($con = $content->find('.subCategories', 0)) {
        $sublink = $con->find('a', 0)->href;
        if(!in_array($sublink, $done) && !in_array($sublink, $queue)) {
            $queue[] = $sublink;
        }
    }
}
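As a side note, `in_array($sublink, $done)` scans the whole visited list on every check, which gets slow on a large crawl; keying the `$done` array by URL gives O(1) `isset` lookups instead. Below is a minimal, self-contained sketch of the same queue loop with that change. The `$links` map is hypothetical stand-in data replacing the network fetch and `->find()` call, so the loop can run as-is:

```php
<?php
// Hypothetical in-memory link graph standing in for fetched pages:
// each URL maps to its single sub-category link (null = deepest page).
$links = array(
    'www.example.com'         => 'www.example.com/cat',
    'www.example.com/cat'     => 'www.example.com/cat/sub',
    'www.example.com/cat/sub' => null,
);

$queue = array('www.example.com');
$done  = array();                  // visited set keyed by URL: O(1) lookups
while ($queue) {
    $link = array_shift($queue);
    $done[$link] = true;
    $sublink = $links[$link];      // stands in for file_get_html + find()
    if ($sublink !== null && !isset($done[$sublink]) && !in_array($sublink, $queue)) {
        $queue[] = $sublink;
    }
}

echo count($done);                 // 3 — all pages visited, at any depth
```

Because the loop's memory use is bounded by the queue and the visited set rather than the call stack, the crawl depth no longer matters.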
