Accessing HTML generated by JavaScript with HtmlUnit (Java)


Question

I am trying to test a website that uses JavaScript to render most of its HTML. With the HtmlUnit browser, how can I access the HTML generated by the JavaScript? I have looked through the documentation but am not sure what the best approach is.

WebClient webClient = new WebClient();
HtmlPage currentPage = webClient.getPage("some url");
String source = currentPage.asXml();
System.out.println(source);

This is an easy way to get the page's HTML, but would you use a DomNode or some other way to access the HTML generated by the JavaScript?

Recommended answer

You need to give the JavaScript some time to execute.

The working example below shows this. The divs with class 'bucket' are not in the page's original source; JavaScript adds them after the page loads.

import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;
import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class GetPageSourceAfterJS {
    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF); /* silences HtmlUnit's noisy warnings; comment this line out to see them */
        WebClient webClient = new WebClient();
        String url = "http://www.futurebazaar.com/categories/Home--Living-Luggage--Travel-Airbags--Duffel-bags/cid-CU00089575.aspx";
        System.out.println("Loading page now: "+url);
        HtmlPage page = webClient.getPage(url);
        webClient.waitForBackgroundJavaScript(30 * 1000); /* wait up to 30 s for background JavaScript to finish */

        String pageAsXml = page.asXml();
        System.out.println("Contains bucket? --> "+pageAsXml.contains("bucket"));

        //get divs which have a 'class' attribute of 'bucket'
        List<?> buckets = page.getByXPath("//div[@class='bucket']");
        System.out.println("Found "+buckets.size()+" 'bucket' divs.");

        //System.out.println("#FULL source after JavaScript execution:\n "+pageAsXml);
    }
}

Output:

Loading page now: http://www.futurebazaar.com/categories/Mobiles-Mobile-Phones/cid-CU00089697.aspx?Rfs=brandZZFly001PYXQcurtrayZZBrand
Contains bucket? --> true
Found 3 'bucket' divs.

HtmlUnit version used:

<dependency>
    <groupId>net.sourceforge.htmlunit</groupId>
    <artifactId>htmlunit</artifactId>
    <version>2.12</version>
</dependency>
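
waitForBackgroundJavaScript waits on background jobs in general. When a test only cares about specific elements, one alternative is to poll until they appear. The sketch below shows that idea in plain Java: a small helper that re-checks a condition at a fixed interval until it holds or a timeout passes. Poller and waitUntil are hypothetical names, not part of HtmlUnit, and the condition in main is a timer standing in for "the bucket divs have been rendered".

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of HtmlUnit): polls a condition until it holds or a timeout passes. */
final class Poller {

    /**
     * Returns true as soon as the condition holds, or false if the timeout
     * elapses (or the thread is interrupted) before that happens.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in condition: becomes true roughly 200 ms after start.
        long start = System.currentTimeMillis();
        boolean rendered = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2_000, 50);
        System.out.println("Rendered in time? " + rendered);

        // With HtmlUnit the condition might look like this (assumes the `page`
        // variable from the example above):
        //   waitUntil(() -> !page.getByXPath("//div[@class='bucket']").isEmpty(), 30_000, 500);
    }
}
```

The helper returns as soon as the condition is met, so a page that renders quickly does not pay for the full timeout, and a false return gives the test a clear signal that the expected content never showed up.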
