Can't pass array items to function in PhantomJS

Question

I am trying to pull the source code of several webpages at once. The links are fed into the array from a source text file. I am able to iterate through the array, print out the links, and confirm they are there, but when I try to pass them through a function, they become undefined after the first iteration.

My ultimate goal is to have it save the source of each page to its own document. It does the first page correctly, but subsequent attempts are undefined. I've searched for hours but would appreciate it if someone could point me in the right direction.

var fs = require('fs');
var pageContent = fs.read('input.txt');
var arrdata = pageContent.split(/[\n]/);
var system = require('system');
var page = require('webpage').create();
var args = system.args;
var imagelink;
var content = " ";

function handle_page(file, imagelink){
    page.open(file,function(){
        var js = page.evaluate(function (){
            return document;
        });
        fs.write(imagelink, page.content, 'w');
        setTimeout(next_page(),500);
    });
}
function next_page(imagelink){
    var file = imagelink;
    if(!file){phantom.exit(0);}
    handle_page(file, imagelink);
}

for(var i in arrdata){
    next_page(arrdata[i]);
}

I realize now that the for loop will only iterate once, and that the other two functions then form their own loop, so that makes sense, but I am still having issues getting it running.

Recommended answer

PhantomJS's page.open() is asynchronous (that's why there is a callback). The other thing is that page.open() is a long-running operation. If two such calls are made, the second will overwrite the first one, because you're operating on the same page object.

The best way would be to use recursion:

function handle_page(i){
    if (arrdata.length === i) {
        phantom.exit();
        return;
    }
    var imageLink = arrdata[i];
    page.open(imageLink, function(){
        fs.write("file_"+i+".html", page.content, 'w');
        handle_page(i+1);
    });
}
handle_page(0);
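
One thing the recursion above depends on is that every entry of arrdata is a usable URL. Splitting a text file on "\n" usually leaves an empty string at the end if the file finishes with a newline, and leaves a trailing "\r" on each entry if the file has Windows line endings. A minimal sketch of cleaning the list first, assuming input.txt holds one URL per line (parseLinks is a hypothetical helper name, not part of the original answer):

```javascript
// Hypothetical helper (not from the original answer): turn the raw text of
// the input file into a clean list of links, one per non-empty line.
function parseLinks(text) {
    var lines = text.split(/\r?\n/);   // accept both LF and CRLF line endings
    var links = [];
    for (var i = 0; i < lines.length; i++) {
        // Trim with a regex so the sketch does not depend on String.prototype.trim.
        var line = lines[i].replace(/^\s+|\s+$/g, '');
        if (line.length > 0) {
            links.push(line);          // keep only lines that contain something
        }
    }
    return links;
}
```

In the script from the question this would replace the split call, for example var arrdata = parseLinks(fs.read('input.txt'));, so that handle_page(i) never tries to open an empty URL.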

A couple of other things:


  • setTimeout(next_page(),500); immediately invokes next_page() without waiting. You wanted setTimeout(next_page, 500);, but that wouldn't work either, because without an argument next_page simply exits.
  • In fs.write(imagelink, page.content, 'w'), imagelink is probably a URL, in which case you probably want another way to derive a filename.
  • While for(var i in arrdata){ next_page(arrdata[i]); } works here, be aware that for...in doesn't work reliably on all arrays and array-like objects. Use plain for loops like for(var i = 0; i < length; i++), or array.forEach(function(item, index){...}) if it is available.
  • page.evaluate() is sandboxed and provides access to the DOM, but anything that is not JSON-serializable cannot be passed out of it, so returning document as in the question will not give you a usable object. You will have to convert the value into a serializable format before returning it from evaluate().
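
For the filename point, here is a minimal sketch of one way to derive a filename from a URL. The helper name urlToFilename, the "index_name.html" pattern, and the choice of "_" as a replacement character are all assumptions made for illustration; the original answer simply uses "file_"+i+".html".

```javascript
// Hypothetical helper: build a filesystem-safe name from a URL and its index.
// The index prefix keeps names unique even if two URLs sanitize to the same text.
function urlToFilename(url, index) {
    var name = url
        .replace(/^[a-z][a-z0-9+.\-]*:\/\//i, '')  // drop the scheme, e.g. "http://"
        .replace(/[^a-z0-9]+/gi, '_')              // collapse every run of other characters into "_"
        .replace(/^_+|_+$/g, '');                  // trim leading and trailing "_"
    // Fall back to a generic name if nothing usable is left.
    return index + '_' + (name || 'page') + '.html';
}
```

Inside the page.open() callback this could be used as fs.write(urlToFilename(imageLink, i), page.content, 'w');. Very long URLs would still produce very long names, so truncating the result may be worth considering.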
