Reading an HTML file into a DOM tree using Java
Problem description
Is there a parser or library that can read an HTML document into a DOM tree using Java? I'd like to use the standard DOM/XPath API that Java provides.
Most libraries seem to have custom APIs for this task. Furthermore, converting HTML to an XML DOM seems to be unsupported by most of the available parsers.
Any ideas or experience with a good HTML DOM parser?
JTidy: either process the stream to XHTML and then re-parse it with your favourite DOM implementation, or use parseDOM if the limited DOM implementation it gives you is enough.
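As a rough illustration of the re-parse step, here is a minimal sketch that uses only the JDK's built-in DOM and XPath APIs. It assumes the XHTML has already been produced by JTidy, so JTidy itself does not appear in the code. The class name `XhtmlDomExample`, the helpers `parseXhtml` and `select`, and the sample markup are hypothetical and not part of the original answer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlDomExample {

    // Parse well-formed XHTML text (e.g. JTidy output) into a standard DOM Document.
    public static Document parseXhtml(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Assumption: we do not want to download an external DTD named in a DOCTYPE,
        // so every external entity is resolved to an empty source.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    // Evaluate an XPath expression against the document and return the matching nodes.
    public static NodeList select(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input standing in for cleaned-up XHTML.
        String xhtml =
                "<html><body><p class=\"intro\">Hello</p><p>World</p></body></html>";
        Document doc = parseXhtml(xhtml);
        NodeList paragraphs = select(doc, "//p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            System.out.println(paragraphs.item(i).getTextContent());
        }
    }
}
```

This only works on input that is already well-formed XML, which is why a clean-up step such as JTidy has to come first for real-world HTML.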
Alternatively, Neko.