Scraping text from a lightbox using CasperJS

Question

I'm using CasperJS to scrape text from a website, and so far it works fine. However, the page I'm scraping has hundreds of products on it, and some of these products have an orange button next to them.

The orange button has the class attribute "button small orange". Clicking it brings up a lightbox with a description of the product.

How would I have Casper click the orange button if it's there, scrape the description, close the lightbox, and then keep iterating through the hundreds of products?

Answer

You need to determine the elements that are involved in each step. You can do that with the developer tools in Firefox or Chrome.

You can find the number of orange buttons like this:

var buttonNumber = casper.getElementsInfo(".button.small.orange").length;
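
One caveat: as far as I can tell, getElementsInfo raises an error instead of returning an empty list when nothing matches the selector, which would matter on a page where no product has an orange button. Below is a minimal sketch of a guarded count. countMatches is a hypothetical helper name, not part of CasperJS; it relies only on casper.exists and casper.getElementsInfo.

```javascript
// Hypothetical helper (not part of CasperJS): returns 0 instead of failing
// when the selector matches nothing on the current page.
function countMatches(casper, selector) {
    if (!casper.exists(selector)) {
        return 0;
    }
    return casper.getElementsInfo(selector).length;
}
```

Inside a step, var buttonNumber = countMatches(casper, ".button.small.orange"); could then stand in for the direct call.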

You then iterate over the buttons, using that count as the upper bound:

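// selectXPath wraps an XPath expression so Casper functions accept it as a selector
var x = require('casper').selectXPath;
for (var i = 0; i < buttonNumber; i++) {
    // XPath positions are 1-based, hence i + 1
    casper.thenClick(x("(//*[contains(@class,'button') and contains(@class,'small') and contains(@class,'orange')])[" + (i + 1) + "]"));
    scheduleScrapeAndClose();
}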

The //*[contains(@class,'button') and ...] part of the XPath expression is roughly the equivalent of the .button.small.orange CSS selector. It selects a node set, and the index after it picks the button for the current iteration, for example (//*[...])[1].
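
The equivalence is only approximate: contains(@class,'button') is a substring test, so it would also match a class such as "buttons". If that matters on your page, the usual XPath 1.0 idiom pads the class attribute with spaces and matches whole class names. The sketch below builds such an expression; nthByClasses is a hypothetical helper name introduced here for illustration.

```javascript
// Hypothetical helper: builds an XPath that selects the n-th element (1-based)
// carrying every class in `classes`, matching whole class names only.
function nthByClasses(classes, n) {
    var conditions = classes.map(function (c) {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + c + " ')";
    });
    return "(//*[" + conditions.join(" and ") + "])[" + n + "]";
}
```

In the loop, casper.thenClick(x(nthByClasses(["button", "small", "orange"], i + 1))) would then replace the hand-written string.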

The only thing left to do is to define the scheduleScrapeAndClose function. It will probably look something like this:

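function scheduleScrapeAndClose(){
    // wait for the lightbox opened by the preceding click
    casper.waitUntilVisible("your light box selector");
    casper.then(function(){
        // scrape the description and print it
        var descr = this.fetchText("your description selector");
        this.echo(descr);
        this.click("your light box close selector");
    });
    // wait until the lightbox is gone before the next button is clicked
    casper.waitWhileVisible("again, your light box selector");
}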

I assume that each button click opens only one lightbox.
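
If you want to keep the descriptions instead of only reading them, one possible variation is to pass in an array that each step appends to. This is a sketch under assumptions, not the only way to do it: makeScraper and the keys of the selectors object are hypothetical names, and the actual selectors still have to come from your page.

```javascript
// Hypothetical factory: returns a scheduleScrapeAndClose function that
// stores each scraped description in the `results` array.
function makeScraper(casper, selectors, results) {
    return function scheduleScrapeAndClose() {
        casper.waitUntilVisible(selectors.lightbox);
        casper.then(function () {
            // inside a step, `this` is the casper instance
            results.push(this.fetchText(selectors.description).trim());
            this.click(selectors.close);
        });
        casper.waitWhileVisible(selectors.lightbox);
    };
}
```

After casper.run finishes its steps, results would hold one entry per lightbox that was opened.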

Putting it all together, it would look like this:

var x = require('casper').selectXPath,
    casper = require('casper').create();

function scheduleScrapeAndClose(){
    // stuff from above
}
// url is the address of the product page; define it before this call
casper.start(url);
casper.then(function(){
    var buttonNumber = casper.getElementsInfo(".button.small.orange").length;
    for(var i = 0; i < buttonNumber; i++) {
        casper.thenClick(x("(//*[contains(@class,'button') and contains(@class,'small') and contains(@class,'orange')])["+(i+1)+"]"));
        scheduleScrapeAndClose();
    }
});
casper.run(function(){this.exit();});
