Accessing HTML generated by JavaScript with HtmlUnit (Java)
Question
I am trying to test a website that uses JavaScript to render most of its HTML. With the HtmlUnit browser, how can I access the HTML generated by the JavaScript? I looked through the documentation but wasn't sure what the best approach would be.
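import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

WebClient webClient = new WebClient();
HtmlPage currentPage = webClient.getPage("some url");
String source = currentPage.asXml(); // serialize the current DOM as XML
System.out.println(source);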
This is an easy way to get the page's HTML back, but should I use DomNode or some other approach to access the HTML generated by the JavaScript?
Recommended answer
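You have to give the JavaScript some time to execute. HtmlUnit runs timer and AJAX callbacks as background jobs, so content they add may not be in the DOM yet when getPage() returns.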
Check the working sample code below. The 'bucket' divs aren't in the original page source; they only appear after the JavaScript has run.
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;
import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class GetPageSourceAfterJS {
    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        /* turns off the noisy HtmlUnit warnings; comment this line out to see them again */
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);

        WebClient webClient = new WebClient();
        String url = "http://www.futurebazaar.com/categories/Home--Living-Luggage--Travel-Airbags--Duffel-bags/cid-CU00089575.aspx";
        System.out.println("Loading page now: " + url);
        HtmlPage page = webClient.getPage(url);
        webClient.waitForBackgroundJavaScript(30 * 1000); /* waits up to 30 s for background JavaScript to finish */

        String pageAsXml = page.asXml();
        System.out.println("Contains bucket? --> " + pageAsXml.contains("bucket"));

        // get divs which have a 'class' attribute of 'bucket'
        List<?> buckets = page.getByXPath("//div[@class='bucket']");
        System.out.println("Found " + buckets.size() + " 'bucket' divs.");

        //System.out.println("#FULL source after JavaScript execution:\n " + pageAsXml);
    }
}
Output:
Loading page now: http://www.futurebazaar.com/categories/Mobiles-Mobile-Phones/cid-CU00089697.aspx?Rfs=brandZZFly001PYXQcurtrayZZBrand
Contains bucket? --> true
Found 3 'bucket' divs.
HtmlUnit version used:
<dependency>
<groupId>net.sourceforge.htmlunit</groupId>
<artifactId>htmlunit</artifactId>
<version>2.12</version>
</dependency>
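One possible refinement, sketched under stated assumptions: waitForBackgroundJavaScript(30 * 1000) blocks until the background JavaScript jobs have finished or the timeout expires, so on a page with recurring timers it can use up the whole 30 seconds. An alternative is to poll for the element you actually need and stop as soon as it appears. The helper below is plain Java (8 or later, for BooleanSupplier and lambdas) and does not depend on HtmlUnit; the class WaitUtil, the method waitUntil and its parameters are hypothetical names chosen for this example, not part of HtmlUnit's API.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper: waits until a condition holds or a timeout expires. */
public final class WaitUtil {

    private WaitUtil() {
    }

    /**
     * Evaluates condition every pollMillis milliseconds until it returns true
     * or timeoutMillis milliseconds have elapsed.
     *
     * @return true if the condition became true before the deadline, false on
     *         timeout or if the waiting thread was interrupted
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        if (pollMillis <= 0) {
            throw new IllegalArgumentException("pollMillis must be positive");
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Sleep one poll interval, but not (much) past the deadline.
            long sleepMillis = Math.min(pollMillis, TimeUnit.NANOSECONDS.toMillis(remainingNanos) + 1);
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status for callers
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in condition that becomes true after about 200 ms. With HtmlUnit you
        // would check the DOM instead (hypothetical usage), e.g.
        //   () -> !page.getByXPath("//div[@class='bucket']").isEmpty()
        final long start = System.currentTimeMillis();
        boolean met = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2000, 50);
        System.out.println("Condition met before timeout? --> " + met);
    }
}
```

In the answer's main method, the waitForBackgroundJavaScript line could then be replaced by something like WaitUtil.waitUntil(() -> !page.getByXPath("//div[@class='bucket']").isEmpty(), 30 * 1000, 500); this compiles because page is assigned only once and is therefore effectively final. The trade-off is that polling returns as soon as the first matching div exists, so if the page keeps rendering after that point you may read a partially built DOM.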