How to convert a Jsoup Document to a W3C Document?
Question
I have built a Jsoup Document by parsing an in-house HTML page:
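public Document newDocument(String path) throws IOException {
    // timeout(0) disables the timeout, so the request waits indefinitely
    Document doc = Jsoup.connect(path).timeout(0).get();
    // HtmlDocument is a custom wrapper class that is not shown in the question
    return new HtmlDocument<Document>(doc);
}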
I want to convert the Jsoup document to an org.w3c.dom.Document.
I used an available library, DOMBuilder, for this, but when parsing I get the org.w3c.dom.Document as null. I can't figure out the problem; I tried searching but couldn't find an answer.
Code to generate the W3C DOM Document:
// factory is the object that declares newDocument(...) above
Document jsoupDoc = factory.newDocument("http://localhost/testcases/test_2.html");
org.w3c.dom.Document docu = DOMBuilder.jsoup2DOM(jsoupDoc);
Can anyone help me?
Recommended answer
To retrieve a jsoup document via HTTP, make a call to Jsoup.connect(...).get(). To load a jsoup document from a local file, make a call to Jsoup.parse(new File("..."), "UTF-8").
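A minimal sketch of both loading paths, assuming jsoup is on the classpath. The class name JsoupLoading, the helper names fromUrl and fromFile, and the file path in main are hypothetical; only Jsoup.connect(...).get() and Jsoup.parse(File, String) come from the answer.

```java
import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupLoading {

    // Fetch a page over HTTP(S) and parse it; connect(...).get() issues a GET request.
    public static Document fromUrl(String url) throws IOException {
        return Jsoup.connect(url).get();
    }

    // Parse an HTML file from disk, decoding its bytes as UTF-8.
    public static Document fromFile(File file) throws IOException {
        return Jsoup.parse(file, "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical local copy of the test page from the question.
        Document doc = fromFile(new File("testcases/test_2.html"));
        System.out.println(doc.title());
    }
}
```

Loading from disk avoids depending on a running local web server while debugging the conversion step.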
Your call to DOMBuilder is correct.
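As an alternative to the third-party DOMBuilder (a swapped-in technique, not what the question used), newer jsoup releases include their own converter, org.jsoup.helper.W3CDom. A minimal sketch, assuming a jsoup version that ships W3CDom; the class name JsoupToW3c and the helper toW3c are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;

public class JsoupToW3c {

    // Convert a parsed jsoup Document into an org.w3c.dom.Document using jsoup's own helper.
    public static org.w3c.dom.Document toW3c(org.jsoup.nodes.Document jsoupDoc) {
        return new W3CDom().fromJsoup(jsoupDoc);
    }

    public static void main(String[] args) {
        org.jsoup.nodes.Document jsoupDoc =
                Jsoup.parse("<html><head><title>t</title></head><body><p>Hello</p></body></html>");
        org.w3c.dom.Document w3cDoc = toW3c(jsoupDoc);
        // Inspect the tree rather than printing the Document node itself.
        System.out.println(w3cDoc.getDocumentElement().getNodeName());
    }
}
```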
When you say, "I used an available library DOMBuilder for this but when parsing I get org.w3c.dom.Document as null,"
I think you mean, "I used an available library, DOMBuilder, for this, but when printing the result, I get [#document: null]." At least, that was the result I saw when I tried printing the w3cDoc object, but that doesn't mean the object is null. The JDK's built-in DOM implementation prints a node as [nodeName: nodeValue], and a Document node's value is always null, so that output is expected for a valid document. I was able to traverse the document by making calls to getDocumentElement and getChildNodes.
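import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
// DOMBuilder is the third-party class from the question; import it from wherever it lives in your project.

public static void main(String[] args) {
    Document jsoupDoc;
    try {
        jsoupDoc = Jsoup.connect("http://stackoverflow.com/questions/17802445").get();
    } catch (IOException e) {
        e.printStackTrace();
        return; // nothing to convert if the page could not be fetched
    }
    org.w3c.dom.Document w3cDoc = DOMBuilder.jsoup2DOM(jsoupDoc);
    Element root = w3cDoc.getDocumentElement(); // the <html> element
    NodeList childNodes = root.getChildNodes();
    Node n = childNodes.item(2); // third child of <html>; may be a text node, or null if absent
    if (n != null) {
        System.out.println(n.getNodeName());
    }
}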
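To see the converted markup instead of the Document node's toString(), the W3C tree can be serialized with the standard javax.xml.transform identity transformer. A minimal sketch: W3cDocToString and sampleDocument are hypothetical names, and the small stand-in document replaces the w3cDoc produced by DOMBuilder.jsoup2DOM so the example runs on its own. The output method is forced to "xml" so the transformer does not switch to HTML output when the root element is named html.

```java
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class W3cDocToString {

    // Serializes a W3C DOM Document to a markup string using an identity transform.
    public static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");               // force XML output
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes"); // no <?xml ...?> header
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize the DOM", e);
        }
    }

    // Hypothetical stand-in for the w3cDoc produced by DOMBuilder.jsoup2DOM.
    public static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            body.setTextContent("Hello");
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleDocument()));
    }
}
```

Passing the answer's w3cDoc to serialize(...) should print the page as XML-style markup, so empty elements such as br come out as <br/>.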