How to Get Crawled Content in Crawljax
Question
I have crawled a dynamic web page using Crawljax. I am able to get the current state's ID, status, and DOM, but I cannot get the website content. Can anyone help me?
CrawljaxConfigurationBuilder builder =
        CrawljaxConfiguration.builderFor("http://demo.crawljax.com/");

// Register a plugin that is called each time the crawler finds a new state.
builder.addPlugin(new OnNewStatePlugin() {

    @Override
    public String toString() {
        return "Our example plugin";
    }

    @Override
    public void onNewState(CrawlerContext cc, StateVertex sv) {
        // LOG is assumed to be an SLF4J logger declared in the enclosing class.
        LOG.info("Found a new dom! Here it is:\n{}", cc.getBrowser().getStrippedDom());
        String name = cc.getCurrentState().getName();
        String url = cc.getBrowser().getCurrentUrl();
        // Print the DOM of the current state, then its name and URL.
        System.out.println(cc.getCurrentState().getDom());
        System.out.println("New State: " + name + "; url: " + url);
    }
});

// Build the configuration and run the crawl.
CrawljaxRunner crawljax = new CrawljaxRunner(builder.build());
crawljax.call();
How can I get the content of a dynamic (JavaScript-driven) web page?
Recommended answer
You can get the website's source code with cc.getBrowser().getStrippedDom() or cc.getCurrentState().getDocument(). These calls return the page source (including its CSS/JavaScript), not the content alone.
Getting only the content is not possible with Crawljax itself, because it is a testing tool. It only checks whether text is present and assigns temporary data to form fields.
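As a possible workaround outside of Crawljax's own API, the DOM it returns can be post-processed with the standard org.w3c.dom interfaces to keep only the text. The sketch below is an illustration, not part of the original answer: the class VisibleTextExtractor and its methods are hypothetical names, and the parse helper is only a stand-in that reads well-formed markup with the JDK's XML parser so the example runs without Crawljax.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: not a Crawljax class.
public class VisibleTextExtractor {

    /** Collects all text under root, skipping script and style subtrees. */
    public static String extractText(Node root) {
        StringBuilder out = new StringBuilder();
        collect(root, out);
        // Collapse runs of whitespace so the result reads as plain text.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    private static void collect(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String tag = node.getNodeName();
            if (tag.equalsIgnoreCase("script") || tag.equalsIgnoreCase("style")) {
                return; // source code, not page content
            }
        }
        if (node.getNodeType() == Node.TEXT_NODE
                || node.getNodeType() == Node.CDATA_SECTION_NODE) {
            out.append(node.getNodeValue()).append(' ');
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    /** Stand-in for a crawled DOM: parses well-formed markup with the JDK parser. */
    public static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<html><head><title>Demo</title><style>p{color:red}</style></head>"
                + "<body><h1>Hello</h1><script>var x = 1;</script>"
                + "<p>Dynamic   content</p></body></html>";
        System.out.println(extractText(parse(sample)));
    }
}
```

Inside onNewState, the same helper could presumably be called as VisibleTextExtractor.extractText(cc.getCurrentState().getDocument()), assuming getDocument() returns an org.w3c.dom.Document and that the IOException it may throw is handled. Note that the page title is included in the result, since it is an ordinary text node.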