Scraping text from lightbox using casperjs
Question
I'm using casperjs to scrape text from a website and so far it works fine. However, this page that I'm scraping from has hundreds of products on it and some of these products have an orange button next to them.
The orange button has the class attribute "button small orange". If you click on this orange button, it brings up a lightbox with a description of the product.
How would I have casper click on the orange button if it's there, scrape the description, close the lightbox, and then keep iterating through the hundreds of products?
Recommended answer
You would need to determine the elements that are involved in each step. You can do that with the developer tools in Firefox or Chrome.
You can find the number of elements like this:
var buttonNumber = casper.getElementsInfo(".button.small.orange").length;
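One caveat, since the question says the button is only sometimes present: as far as I recall, getElementsInfo raises an error rather than returning an empty array when the selector matches nothing. A minimal sketch of a guarded count follows. countMatches is a hypothetical helper, not part of CasperJS, and it relies only on casper.exists and casper.getElementsInfo.

```javascript
// Hypothetical helper (not part of CasperJS): count the elements matching a
// CSS selector, returning 0 when nothing matches instead of letting
// getElementsInfo fail on an empty match.
function countMatches(casper, selector) {
    if (!casper.exists(selector)) {
        return 0;
    }
    return casper.getElementsInfo(selector).length;
}
```

Inside a step, calling countMatches(casper, ".button.small.orange") could then stand in for the direct getElementsInfo call, so a page without any orange buttons simply yields zero iterations.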
You then iterate over the buttons, using that count as the upper bound:
var x = require('casper').selectXPath;
for (var i = 0; i < buttonNumber; i++) {
    // XPath positions are 1-based, hence i + 1
    casper.thenClick(x("(//*[contains(@class,'button') and contains(@class,'small') and contains(@class,'orange')])[" + (i + 1) + "]"));
    scheduleScrapeAndClose();
}
The //*[contains(@class,'button') and ...] part of the XPath expression is basically the equivalent of the .button.small.orange CSS selector. It returns a node list, and the index after it selects the button for the current iteration, for example (//*[...])[1].
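One difference from the CSS selector is worth noting: contains(@class,'button') is a substring test, so it would also match a class such as buttons. A common stricter XPath idiom pads the normalized class attribute with spaces and looks for the padded token. The sketch below builds such an expression for the n-th match. nthByClasses is a hypothetical helper name, and its result is still meant to be wrapped in x(...) before being passed to thenClick.

```javascript
// Hypothetical helper: build an XPath expression selecting the n-th element
// (1-based, in document order) that carries every class in `classes` as a
// whole token, not merely as a substring of the class attribute.
function nthByClasses(classes, n) {
    var tests = classes.map(function (cls) {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + cls + " ')";
    });
    return "(//*[" + tests.join(" and ") + "])[" + n + "]";
}

// Example: the second orange button on the page.
var secondButton = nthByClasses(["button", "small", "orange"], 2);
```

Whether the extra strictness matters depends on the class names the site actually uses; for a page with no look-alike classes, the simpler contains form behaves the same.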
The only thing left to do is to define the scheduleScrapeAndClose function. It will probably look something like this:
function scheduleScrapeAndClose() {
    // wait for the lightbox to open after the click
    casper.waitUntilVisible("your light box selector");
    casper.then(function() {
        // scrape the description
        var descr = this.fetchText("your description selector");
        this.click("your light box close selector");
    });
    // wait for the lightbox to close before the next click
    casper.waitWhileVisible("again, your light box selector");
}
I assume that there exists only one lightbox for every button click.
Putting it all together, it would look like this:
var x = require('casper').selectXPath,
    casper = require('casper').create();

function scheduleScrapeAndClose() {
    // stuff from above
}

// url is the address of the product page (not given in the question)
casper.start(url);

casper.then(function() {
    // count the buttons only once the page has loaded
    var buttonNumber = casper.getElementsInfo(".button.small.orange").length;
    for (var i = 0; i < buttonNumber; i++) {
        casper.thenClick(x("(//*[contains(@class,'button') and contains(@class,'small') and contains(@class,'orange')])[" + (i + 1) + "]"));
        scheduleScrapeAndClose();
    }
});

casper.run(function() { this.exit(); });
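The script above fetches each description but does not keep it anywhere. A minimal sketch of one way to collect the results is shown below, under the same placeholder-selector assumption. scheduleScrapeInto and the selectors object are hypothetical names, and casper is passed in as a parameter so the function does not depend on a global.

```javascript
// Hypothetical variant of scheduleScrapeAndClose: the scraped text is pushed
// onto `results` so it is still available once all steps have run.
// `selectors` holds the site-specific selectors (placeholders, not real ones).
function scheduleScrapeInto(casper, selectors, results) {
    casper.waitUntilVisible(selectors.lightbox);
    casper.then(function () {
        // fetchText returns the text content of the matched element(s)
        results.push(casper.fetchText(selectors.description).trim());
        casper.click(selectors.close);
    });
    casper.waitWhileVisible(selectors.lightbox);
}
```

Called in the loop in place of scheduleScrapeAndClose(), this would fill one array across all buttons; the array could then be printed in the run callback, for example with this.echo(JSON.stringify(results)).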