casperjs: import json and cycle export result json


Question



I need to import a list of links into CasperJS and export the resulting HTML to JSON (or any other format). With 1 million links in one file, I need something automatic: a loop that takes each link from a JSON file (for example), extracts the HTML with CasperJS, and then writes it to a JSON file or similar. This is my script, but it only writes a single file and does not read the links from a file, database, JSON, or CSV. How can I modify this script for my needs?

var casper = require('casper').create({
    // verbose and logLevel are Casper options, not page settings
    verbose: true,
    logLevel: "debug",
    pageSettings: {
        loadImages: true,
        loadPlugins: false,
        userAgent: 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36',
        javascriptEnabled: true,
        cookiesEnabled: true
    }
});

var fs = require('fs');
var x = require("casper").selectXPath;
phantom.cookiesEnabled = true;
phantom.javascriptEnabled = true;

casper.start().thenOpen("LINK_LOGIN", function() {
    console.log("Link opened...");
    });
casper.then(
    function() {
        casper.echo("clicking..");
        casper.click(x("/html/body/div[@id='whais']/ul[@id='undest']/li[@id='login-you']/a"));
    });

casper.then(function(){
    console.log("Login...");
    this.sendKeys('input[id="login"]', 'USER');
    this.sendKeys('input[id="password"]', 'PASSWORD');
    casper.echo("click");
    casper.click('input[type="submit"][name="form"]');  
    this.evaluate(function(){
        document.getElementById("button-send").click();
    });

});

casper.thenOpen("OTHER_LINK_SAME_DOMAIN", function() {    
    console.log("page loading...");
    console.log("...write html");
    var html = this.getHTML();
    var f = fs.open('my.html', 'w');
    f.write(html);
    f.close();

}).waitForText("how are you?", function() {
    this.echo('Found the answer.');
},
function() {
    this.echo('not found answer, time out!');
},60000
);

casper.run();

Thanks!

Recommended answer


"I have 1 million links in one file."

First, read the content of that file. Take a look at fs.read.
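
A minimal sketch of that step, assuming the links live in a hypothetical links.json that holds a JSON array of URL strings, or alternatively in a plain text file with one URL per line. The helper name parseUrlList and the file name are illustrative and not part of CasperJS. Inside a CasperJS script, require('fs') gives PhantomJS's file system module, whose fs.read(path) returns the file content as a string.

```javascript
// Hypothetical helper: turn the raw file content into an array of URLs.
// Accepts either a JSON array of strings or plain text with one URL per line.
function parseUrlList(text) {
    var trimmed = text.replace(/^\s+|\s+$/g, '');
    if (trimmed.charAt(0) === '[') {
        return JSON.parse(trimmed);
    }
    var urls = [];
    var lines = trimmed.split(/\r?\n/);
    for (var i = 0; i < lines.length; i++) {
        var line = lines[i].replace(/^\s+|\s+$/g, '');
        if (line !== '') {
            urls.push(line); // skip blank lines
        }
    }
    return urls;
}

// Usage inside the CasperJS script (the file name is an assumption):
//   var fs = require('fs');
//   var url_list = parseUrlList(fs.read('links.json'));
```

Keeping the parsing in a small function makes it easy to swap the input format later (for example CSV) without touching the navigation steps.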


"I need something automatic, like a loop that calls each link from JSON (for example), then extracts the HTML with CasperJS, then writes it to a JSON file or similar."

Use a loop to open every URL, then save it. Example code:

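// Assumes `casper` was created as in the question's script.
var fs = require('fs');
var url_list = [...]; // contains the URLs read from the local file

casper.start();
var index = 0;
casper.then(function () {
    // Queue one open step per URL; the steps run one after another.
    for (var i = 0; i < url_list.length; i++) {
        casper.thenOpen(url_list[i], function () {
            // index counts the pages saved so far: 0.html, 1.html, ...
            fs.write(index + '.html', this.getHTML(), 'w');
            index++;
        });
    }
});

casper.run();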
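
Since the question asks for the results in JSON, one option is to append one JSON object per page to a single file instead of writing a separate .html file per link. A minimal sketch, assuming a hypothetical output file results.jsonl and an illustrative helper toJsonLine (neither comes from CasperJS); it relies on PhantomJS's fs.write accepting 'a' as the append mode.

```javascript
// Hypothetical helper: serialise one scraped page as a single JSON line.
// JSON.stringify escapes newlines inside the HTML, so each record stays on one line.
function toJsonLine(url, html) {
    return JSON.stringify({ url: url, html: html }) + '\n';
}

// Usage inside the loop from the answer (the output file name is an assumption):
//   casper.thenOpen(url_list[i], function () {
//       // getCurrentUrl() is used because `i` has already advanced
//       // by the time this callback runs.
//       fs.write('results.jsonl', toJsonLine(this.getCurrentUrl(), this.getHTML()), 'a');
//   });
```

Appending line by line means the script never has to hold every page in memory at once, which matters with a million links, and each line of the output can later be parsed on its own with JSON.parse.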
