Is it possible to parse an HTML document and build a DOM tree (Java)?

Views: 191 Published: 2017/6/25 0:15:12 Tags: java html dom parsing html-content-extraction


Question

Is it possible to parse an HTML document, given as a string or read from a file, and build a DOM tree that a developer can walk through some API? If so, what tools can be used?

For example:

DomRoot root = parse("myhtml.html");

for (Tag tag : root) {
    // visit each tag
}

Note: this is an HTML document, not XHTML.

Recommended answer

You can use TagSoup. It is a SAX-compliant parser that cleans malformed content, such as the HTML found on typical web pages, into well-formed XML. For example:

This is <B>bold, <I>bold italic, </b>italic, </i>normal text

gets correctly rewritten as:

This is <b>bold, <i>bold italic, </i></b><i>italic, </i>normal text.
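Because TagSoup is exposed as a SAX parser, one way to get from its output to a walkable DOM tree is to feed the SAX events into a `DOMResult` through an identity `Transformer`. The sketch below shows that wiring using only JDK classes. Everything named here is an assumption for illustration: the class `SaxToDom` and the helpers `toDom` and `elementNames` are hypothetical, the JDK's own SAX parser stands in for TagSoup, and the cleaned fragment is wrapped in a `<p>` element so that it has a single root.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of TagSoup or the JDK.
class SaxToDom {

    /** Builds a DOM Document from the SAX events the given reader produces. */
    static Document toDom(XMLReader reader, String markup) throws Exception {
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(markup)));
        DOMResult result = new DOMResult();
        // A Transformer created without a stylesheet copies its input unchanged.
        TransformerFactory.newInstance().newTransformer().transform(source, result);
        return (Document) result.getNode();
    }

    /** Walks the tree depth-first and collects element names in document order. */
    static List<String> elementNames(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            elementNames(child, out);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader: the JDK SAX parser, which accepts only well-formed input.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // The already-cleaned fragment from the answer, wrapped in one root element.
        String cleaned =
            "<p>This is <b>bold, <i>bold italic, </i></b><i>italic, </i>normal text</p>";

        Document doc = toDom(reader, cleaned);
        System.out.println(elementNames(doc, new ArrayList<>())); // prints [p, b, i, i]
    }
}
```

With the TagSoup jar on the classpath, the stand-in reader would presumably be replaced by `new org.ccil.cowan.tagsoup.Parser()`, which implements `XMLReader`, so that malformed HTML can be passed to `toDom` directly. In that case the resulting tree is likely to contain extra elements such as `html` and `body` that TagSoup adds around the content.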
