PhantomJS using too many threads


Problem description

I wrote a PhantomJS app to crawl over a site I built and check that a JavaScript file is included. The JavaScript is similar to Google's, where some inline code loads another JS file. The app looks for that other JS file, which is why I used Phantom.

What is the expected result?

The console output should read through a ton of URLs and then tell whether the script is loaded or not.

What actually happened?

The console output reads as expected for about 50 requests and then just starts spitting out this error:

2013-02-21T10:01:23 [FATAL] QEventDispatcherUNIXPrivate(): Can not continue without a thread pipe
QEventDispatcherUNIXPrivate(): Unable to create thread pipe: Too many open files
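The second line of the error points at the process's file-descriptor limit rather than threads as such: each open PhantomJS page holds descriptors (sockets, thread pipes). As a side check, not part of the original question, the limit can be inspected from the shell on Linux/macOS:

```shell
# "Too many open files" means the process hit its file-descriptor limit.
ulimit -n   # prints the current soft limit for this shell, commonly 1024
```

Raising the limit (e.g. `ulimit -n 4096`) only postpones the failure; closing pages is the real fix, as the answer below explains.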

This is the block of code that opens a page and searches for the script include:

page.open(url, function (status) {
  console.log(YELLOW, url, status, CLEAR);
  var found = page.evaluate(function () {
    return document.querySelectorAll("script[src='***']").length > 0;
  });

  if (found) {
    console.log(GREEN, 'JavaScript found on', url, CLEAR);
  } else {
    console.log(RED, 'JavaScript not found on', url, CLEAR);
  }
  self.crawledURLs[url] = true;
  self.crawlURLs(self.getAllLinks(page), depth - 1);
});

The crawledURLs object is just a map of the URLs I've already crawled. The crawlURLs function goes through the links from the getAllLinks function and calls the open function on all links that share the base domain the crawler started on.
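The same-domain, not-yet-crawled filtering described above can be sketched in plain JavaScript. This is a hypothetical illustration, not the question's actual code; `baseDomain` is assumed here, while `crawledURLs` mirrors the crawler's own map:

```javascript
// Hypothetical sketch: keep only links on the starting domain that
// have not been crawled yet (crawledURLs maps url -> true).
function filterCrawlable(links, baseDomain, crawledURLs) {
  return links.filter(function (url) {
    var sameDomain = url.indexOf('http://' + baseDomain) === 0 ||
                     url.indexOf('https://' + baseDomain) === 0;
    return sameDomain && !crawledURLs[url];
  });
}
```

crawlURLs would then only call page.open on the URLs this filter returns.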

Edit

I modified the last block of the code as follows, but I still have the same issue. I have added page.close() to the file.

if (!found) {
  console.log(RED, 'JavaScript not found on', url, CLEAR);
}
self.crawledURLs[url] = true;
var links = self.getAllLinks(page);
page.close();
self.crawlURLs(links, depth-1);


Recommended answer

From the documentation:


Due to some technical limitations, the web page object might not be completely garbage collected. This is often encountered when the same object is used over and over again.

The solution is to explicitly call close() on the web page object (i.e. page in many cases) at the right time.

Some included examples, such as follow.js, demonstrate multiple page objects with explicit close.
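One way to guarantee only a single page is ever open, so descriptors cannot pile up, is to drain the URL queue strictly one at a time. A minimal sketch, assuming a `visit(url, done)` callback that stands in for the PhantomJS-specific work (webpage.create(), page.open(), page.close()):

```javascript
// Sequential crawl loop: the next URL is visited only after the previous
// visit() calls back, i.e. after its page has been explicitly closed.
function crawlSequentially(urls, visit, onFinished) {
  var queue = urls.slice(); // copy, so the caller's array is untouched
  function next() {
    if (queue.length === 0) { onFinished(); return; }
    var url = queue.shift();
    visit(url, next); // visit must call back only after page.close()
  }
  next();
}
```

In the actual crawler, visit would create a page with require('webpage').create(), call page.open(url, ...), collect the links, call page.close(), and then invoke the callback, instead of recursively opening every discovered link at once.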
