CasperJS: Iterating through URLs


Question

I'm pretty new to CasperJS, but isn't there a way to open a URL and execute CasperJS commands inside a for loop? For example, this code doesn't work as I expected it to:

casper.then(function() {
    var counter = 2013;
    for (i = counter; i < 2014; i++) {
        var file_name = "./Draws/wimbledon_draw_" + counter + ".json";
        // getting some local json files
        var json = require(file_name);
        var first_round = json["1"];
        for (var key in first_round) {
            var name = first_round[key].player_1.replace(/\s+/g, '-');
            var normal_url = "http://www.atpworldtour.com/Tennis/Players/" + name;
            // the casper command below only executes AFTER the for loop is done
            casper.thenOpen(normal_url, function() {
                this.echo(normal_url);
            });
        }
    }
});

Instead of Casper calling thenOpen with each new URL per iteration, thenOpen only gets called AFTER the for loop has finished. It then gets called with the last value that normal_url was set to. Is there no Casper command that works on each iteration within the for loop?

Follow-up: how can I make casper.thenOpen return a value within the current iteration of the for loop?

Say, for example, I needed a return value from that thenOpen (maybe if the HTTP status is 404 I need to evaluate another URL, so I want to return false). Is this possible?

Edited version of the casper.thenOpen call above:

    var status;
    // thenOpen() only executes after the console.log statement directly below
    casper.thenOpen(normal_url, function() {
        status = this.status(false)['currentHTTPStatus'];
        if (status == 200) {
            return true;
        } else {
            return false;
        }
    });
    console.log(status); // This prints UNDEFINED the same number of times as iterations.


Recommended answer

As Fanch and Darren Cook stated, you could use an IIFE (immediately invoked function expression) to fix the url value inside of the thenOpen step.
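The effect of the IIFE can be seen without CasperJS at all. The following is a minimal sketch that uses a plain array, steps, as a hypothetical stand-in for Casper's step queue; the example.com URLs and all variable names are assumptions made for illustration, not part of the CasperJS API.

```javascript
// Toy stand-in for Casper's step queue (hypothetical, not the CasperJS API):
// callbacks are stored now and executed only later, like then*/wait* steps.
var steps = [];
var urls = ["http://example.com/a", "http://example.com/b"];

// Without an IIFE: every callback shares the same function-scoped `url`.
var seenWithoutIife = [];
for (var i = 0; i < urls.length; i++) {
    var url = urls[i];
    steps.push(function () {
        seenWithoutIife.push(url);
    });
}

// With an IIFE: each iteration gets its own `fixedUrl` binding.
var seenWithIife = [];
for (var j = 0; j < urls.length; j++) {
    (function (fixedUrl) {
        steps.push(function () {
            seenWithIife.push(fixedUrl);
        });
    })(urls[j]);
}

// Run the queued callbacks only after both loops have finished.
steps.forEach(function (step) { step(); });

console.log(seenWithoutIife); // both entries are the last URL
console.log(seenWithIife);    // one entry per URL, in order
```

Applied to the question's loop, the same pattern would mean wrapping the casper.thenOpen call in a function that receives normal_url as its argument and is invoked immediately.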

An alternative would be to use getCurrentUrl to check the url. So change the line

this.echo(normal_url);

to

this.echo(this.getCurrentUrl());

The problem is that, inside the callback, normal_url refers to the last value that was set rather than the current one, because the callback is executed later. This does not happen with the first argument of casper.thenOpen(normal_url, function(){...});, because the current value is passed to the function at the time of the call. You just see the wrong url echoed, but the correct url is actually opened.

Regarding your updated question:

All then* and wait* functions in the CasperJS API are step functions. The function that you pass into them is only scheduled at that point and is executed later (triggered by casper.run()). You shouldn't rely on variables outside of steps, because code outside a step runs before the steps that set them. Just add further steps inside of the thenOpen call. They will be scheduled in the correct order. Also, you cannot return anything from thenOpen.

var somethingDone = false;
var status;
casper.thenOpen(normal_url, function() {
    status = this.status(false)['currentHTTPStatus'];
    if (status != 200) {
        this.thenOpen(alternativeURL, function(){
            // do something
            somethingDone = true;
        });
    }
});
casper.then(function(){
    console.log("status: " + status);
    if (somethingDone) {
        // something has been done
        somethingDone = false;
    }
});

In this example, this.thenOpen will be scheduled after casper.thenOpen, and, if that nested step ran, somethingDone will be true inside casper.then because casper.then comes after it.
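The ordering described above can be illustrated with a toy model. This is only a sketch under assumptions: MiniCasper is a hypothetical class written for this example and is not how CasperJS is implemented internally. It just mimics two behaviors described in the answer: steps run later when run() is called, and a step added from inside a running step is placed directly after it.

```javascript
// Toy model of a step queue (hypothetical; not CasperJS's real implementation).
function MiniCasper() {
    this.steps = [];
    this.current = -1;  // index of the step being executed, -1 before run()
    this.inserted = 0;  // number of steps added by the currently running step
}

// Schedule a step. While a step is running, new steps go right after it.
MiniCasper.prototype.then = function (fn) {
    if (this.current < 0) {
        this.steps.push(fn);
    } else {
        this.inserted += 1;
        this.steps.splice(this.current + this.inserted, 0, fn);
    }
    return this;
};

// Execute all scheduled steps in order, including ones added along the way.
MiniCasper.prototype.run = function () {
    for (this.current = 0; this.current < this.steps.length; this.current++) {
        this.inserted = 0;
        this.steps[this.current].call(this);
    }
};

var log = [];
var status;
var mini = new MiniCasper();
mini.then(function () {
    status = 404; // pretend the page answered with a 404
    log.push("outer");
    this.then(function () { log.push("nested"); });
});
mini.then(function () { log.push("after, status " + status); });
log.push("scheduled, status " + status); // status is still undefined here
mini.run();
console.log(log);
// → [ 'scheduled, status undefined', 'outer', 'nested', 'after, status 404' ]
```

The first log entry corresponds to the console.log(status) call in the question: it runs while the steps are merely scheduled, so the variable has not been assigned yet.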

There are some things that you need to fix:


  • You don't use your loop variable i: you probably mean "./Draws/wimbledon_draw_" + i + ".json", not "./Draws/wimbledon_draw_" + counter + ".json".
  • You cannot require a JSON string. Interestingly, you can require a JSON file. I would still use fs.read to read the file and parse the JSON inside it (JSON.parse).
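Both fixes can be sketched together as follows. The helper names (drawFileName, loadFirstRound, playerUrl, fakeRead) and the sample data are hypothetical and exist only for this illustration. The reader function is passed in as a parameter so the sketch runs without any files on disk; in a CasperJS script, the fs.read function mentioned above would presumably take its place.

```javascript
// Build the draw file name from the loop variable rather than a fixed counter.
function drawFileName(year) {
    return "./Draws/wimbledon_draw_" + year + ".json";
}

// `readFile` is a hypothetical parameter: any function that returns the
// text content of the file at the given path.
function loadFirstRound(readFile, year) {
    var json = JSON.parse(readFile(drawFileName(year)));
    return json["1"];
}

// Turn a player name into the URL format used in the question.
function playerUrl(name) {
    return "http://www.atpworldtour.com/Tennis/Players/" + name.replace(/\s+/g, "-");
}

// Stand-in reader with made-up data so the sketch is self-contained.
function fakeRead(path) {
    return '{"1": {"m1": {"player_1": "Example Player"}}}';
}

var firstRound = loadFirstRound(fakeRead, 2013);
var playerUrls = Object.keys(firstRound).map(function (key) {
    return playerUrl(firstRound[key].player_1);
});
console.log(playerUrls);
```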

Regarding your question...

You didn't schedule any commands. Just add steps (then* or wait*) after or inside of thenOpen.
