How to tell CasperJS to loop through a series of pages


Question

I am trying to get CasperJS to do the following:

  • Go through a series of pages that are named sequentially by date.
  • On each page, locate a PDF link.
  • Download the PDF.

I have some working code, but I don't understand how CasperJS goes through the sequence of events.

For instance, in the code sample below, CasperJS tries to process step 2 and throws "ReferenceError: Can't find variable: formDate", while step 1 isn't executed at all for some reason.

What is wrong with my reasoning?

It seems to me that the while loop is executed at a different speed than the casper.then methods.

casper.start();

casper.thenOpen('http://www.example.com', function() {
    this.echo(this.getTitle());
});

casper.then(function() {

    var start = new Date('2013-01-01T00:00:00');
    var end = new Date('2013-01-31T00:00:00');

    while(start < end) {

          // step 1: define formDate  
          casper.then(function() {
            var formDate = start.getFullYear()+"-"+("0" + (start.getMonth() + 1)).slice(-2) +"-"+("0" + start.getDate()).slice(-2) ;
            casper.echo(formDate);

          });

          // Step 2: open the page and download the file
          casper.thenOpen('http://www.example.com/' + formDate, function() {

                        var url = this.getElementAttribute('div#pdffulllink a.pdf', 'href');
                        this.echo(url);
                        this.download(url, 'Downloaded_' + formDate + '.pdf');

          });

          casper.then(function() {
          // Step 3: redefine start
            var newDate = start.setDate(start.getDate() + 1);
            start = new Date(newDate);

          });

    }

});


casper.run(function() {
    this.echo('Done.').exit();
});

Recommended answer

After some research, I found a solution to this problem.

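The issue is that casper.then and casper.thenOpen only schedule steps, which run asynchronously later, while the rest of the JavaScript (the while loop and the URL string concatenation) runs synchronously right away. In the question's code, 'http://www.example.com/' + formDate is evaluated while the loop is still queuing steps, before step 1 has run. formDate is also local to step 1's callback, hence the ReferenceError.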
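To make the ordering concrete, here is a minimal sketch in plain JavaScript that runs without CasperJS. The then and run functions below are hypothetical toy stand-ins for a step queue, not CasperJS's actual implementation; they only illustrate that registering a step stores a callback, while the surrounding loop keeps running.

```javascript
// Toy stand-in for a step queue (hypothetical, not the real CasperJS code).
var steps = [];
var log = [];

function then(fn) {
    // Registering a step only stores the callback; nothing runs yet.
    steps.push(fn);
}

function run() {
    // The stored callbacks execute later, in the order they were queued.
    for (var i = 0; i < steps.length; i++) {
        steps[i]();
    }
}

var formDate; // declared out here so the read below yields undefined instead of throwing

for (var day = 1; day <= 3; day++) {
    then(function () { formDate = '2013-01-0' + day; });
    // This line runs immediately, before any queued step has executed.
    log.push('loop saw formDate = ' + formDate);
}
log.push('loop finished, ' + steps.length + ' steps queued');

run();
// Every callback shares the same `day` variable, which is 4 once the loop ends.
log.push('after run, formDate = ' + formDate);

console.log(log.join('\n'));
```

The loop reads formDate three times before any step has run, and all three callbacks see the final value of day because they share one variable. Both effects are what the wrapper function in the answer's code is meant to avoid.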

I applied an elegant method found in this thread (Asynchronous Process inside a javascript for loop).

Following that method, here is an example that works with CasperJS:

var casper = require('casper').create({
    pageSettings: {
        webSecurityEnabled: false
    }
});

casper.start();

casper.then(function() {
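    var end = 4;

    // The loop itself runs synchronously and only queues steps.
    for (var current = 1; current < end; current++) {

      // The immediately invoked function gives each queued step its own
      // copy of the counter (cntr); without it, every callback would see
      // the final value of current.
      (function(cntr) {
        casper.thenOpen('http://example.com/page-' + cntr +'.html', function() {
              this.echo('casper.async: '+cntr);
              // here we can download stuff
        });
      })(current);

    }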

});

casper.run(function() {
    this.echo('Done.').exit();
});

This example will output the following:

casper.async: 1
casper.async: 2
casper.async: 3
Done.

The loop works as expected! :)
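The same idea can be carried over to the date-based pages from the question. The following is only a sketch under assumptions: pad2, formatDate and dateRange are hypothetical helper names introduced here, the URL pattern and selector are copied from the question, and the typeof guard exists only so the date logic can also be run on its own outside CasperJS. In a real script the guarded part would sit between casper.start() and casper.run().

```javascript
// Hypothetical helpers (not part of CasperJS): build the list of dates up front.
function pad2(n) {
    return ('0' + n).slice(-2);
}

function formatDate(d) {
    // Formats a Date as YYYY-MM-DD using local time, as in the question.
    return d.getFullYear() + '-' + pad2(d.getMonth() + 1) + '-' + pad2(d.getDate());
}

function dateRange(start, end) {
    var dates = [];
    // Copy the start date so the caller's object is not mutated.
    for (var d = new Date(start.getTime()); d < end; d.setDate(d.getDate() + 1)) {
        dates.push(formatDate(d));
    }
    return dates;
}

// Same range as the question: 1 January 2013 up to, but not including, 31 January.
var dates = dateRange(new Date(2013, 0, 1), new Date(2013, 0, 31));
console.log(dates.length + ' dates, from ' + dates[0] + ' to ' + dates[dates.length - 1]);

// Queue one step per date only when a CasperJS instance named `casper` exists.
if (typeof casper !== 'undefined') {
    dates.forEach(function (formDate) {
        // Each forEach call has its own formDate, so no extra wrapper function is needed.
        casper.thenOpen('http://www.example.com/' + formDate, function () {
            var url = this.getElementAttribute('div#pdffulllink a.pdf', 'href');
            this.echo(url);
            this.download(url, 'Downloaded_' + formDate + '.pdf');
        });
    });
}
```

Computing every date synchronously before queuing any step avoids updating the loop variable inside a deferred step, which is the part of the question's code that could never take effect while the while loop was still running.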
