How to convert a Jsoup Document to a W3C Document?

Problem description

I have built a Jsoup Document by parsing an in-house HTML page:

public Document newDocument(String path) throws IOException {
    Document doc = Jsoup.connect(path).timeout(0).get();
    return new HtmlDocument<Document>(doc);
}

I want to convert the Jsoup document to an org.w3c.dom.Document. I used an available library, DOMBuilder, for this, but when parsing I get the org.w3c.dom.Document as null. I am unable to understand the problem; I tried searching but couldn't find an answer.

Code to generate the W3C DOM Document:

Document jsoupDoc = factory.newDocument("http://localhost/testcases/test_2.html");
org.w3c.dom.Document docu = DOMBuilder.jsoup2DOM(jsoupDoc);

Can someone help me?

Recommended answer

To retrieve a jsoup document via HTTP, call Jsoup.connect(...).get(). To load a jsoup document from a local file, call Jsoup.parse(new File("..."), "UTF-8").

The call to DOMBuilder is correct.

When you say,

"I used an available library DOMBuilder for this but when parsing I get org.w3c.dom.Document as null."

I think you mean, "I used an available library, DOMBuilder, for this, but when printing the result I get [#document: null]." At least, that is what I saw when I tried printing the w3cDoc object, but that doesn't mean the object is null. I was able to traverse the document by calling getDocumentElement and getChildNodes.
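The point about the printed value can be reproduced without jsoup or DOMBuilder. The sketch below is only an illustration under stated assumptions: the class name DocumentToStringDemo and the helper buildSampleDocument are hypothetical, and the document is built by hand with the JDK's own parser factory instead of being converted from jsoup. With the JDK's bundled DOM implementation, printing the document typically shows its node name and node value, which is where "[#document: null]" comes from; other implementations may print something different.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DocumentToStringDemo {

    // Hypothetical helper: builds a tiny W3C document by hand,
    // standing in for the result of DOMBuilder.jsoup2DOM(...).
    static Document buildSampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        body.setTextContent("hello");
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws ParserConfigurationException {
        Document doc = buildSampleDocument();

        // toString() is implementation-specific; it is not a dump of the content.
        System.out.println(doc);

        // The reference itself is not null, and the tree is populated.
        System.out.println(doc == null);
        System.out.println(doc.getNodeName());   // the document node is named "#document"
        System.out.println(doc.getNodeValue());  // a document node has no value
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The "null" in the printed text is the document node's value, not the object reference, so a null check on the variable is the reliable test.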

public static void main(String[] args) {
    Document jsoupDoc;

    try {
        // Fetch and parse the page into a jsoup Document.
        jsoupDoc = Jsoup.connect("http://stackoverflow.com/questions/17802445").get();
    } catch (IOException e) {
        e.printStackTrace();
        return; // nothing to convert if the fetch failed
    }

    // Convert to a W3C DOM document, then walk it to show it is populated.
    org.w3c.dom.Document w3cDoc = DOMBuilder.jsoup2DOM(jsoupDoc);
    Element root = w3cDoc.getDocumentElement();
    NodeList childNodes = root.getChildNodes();
    Node n = childNodes.item(2);
    System.out.println(n.getNodeName());
}
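To see the whole converted tree rather than a single node name, one option is to serialize the W3C document to a string with the JDK's Transformer API. This is a minimal sketch, not part of the original answer: the class DomSerializer and the methods toXml and sample are hypothetical names, and the sample document is built by hand so the example does not depend on jsoup or DOMBuilder.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomSerializer {

    // Serializes any org.w3c.dom.Document to markup using an identity transform.
    static String toXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Force XML output; a root element named "html" could otherwise select HTML output.
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Hypothetical stand-in for a document produced by DOMBuilder.jsoup2DOM(...).
    static Document sample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        body.setTextContent("hello");
        html.appendChild(body);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // Prints the markup of the tree instead of the node's toString().
        System.out.println(toXml(sample()));
    }
}
```

Passing w3cDoc from the answer's code to toXml in place of sample() would show the converted page, assuming the conversion succeeded.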
