Which HTML Parser is the best?

Published: 2021/11/25 12:32:21 · Tags: java, html, parsing, html-parsing, web-scraping

Question

I code a lot of parsers. Until now, I have been using the HtmlUnit headless browser for both parsing and browser automation.

Now I want to separate the two tasks.

Since 80% of my work is just parsing, I want a lightweight HTML parser: in HtmlUnit it takes a long time to first load a page, then get the source, and only then parse it.

Which HTML parser is the best? Ideally it would behave close to the HtmlUnit parser.

By "best", I mean it should have at least the following features:

- Speed
- Easy location of any HtmlElement by "id", "name", or "tag type"

It's fine if it doesn't clean up dirty HTML; I don't need to sanitize any HTML source. I just need the easiest way to navigate HtmlElements and harvest data from them.

Recommended answer

Self plug: I have just released a new Java HTML parser: jsoup. I mention it here because I think it will do what you are after.
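The question says cleanup of dirty HTML isn't required. jsoup's parser is designed to tolerate malformed markup anyway, filling in implied end tags as it builds the tree. Below is a minimal sketch using a deliberately broken, made-up snippet; the class `DirtyHtml`, the `DIRTY` sample, and the helper `countMatches` are hypothetical names for illustration, and the code assumes a current jsoup release on the classpath rather than the early version announced here.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class DirtyHtml {
    // Deliberately malformed sample (hypothetical): unclosed <p> and <li> tags,
    // an unquoted attribute value, and an <a> that is never closed.
    static final String DIRTY =
        "<html><body><p>First<p>Second<ul><li>One<li>Two</ul>"
        + "<a href=/next>Next</body>";

    // Parses the markup and counts the elements matching a CSS query.
    static int countMatches(String html, String cssQuery) {
        return Jsoup.parse(html).select(cssQuery).size();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(DIRTY);
        System.out.println(countMatches(DIRTY, "p"));             // 2
        System.out.println(countMatches(DIRTY, "ul > li"));       // 2
        System.out.println(doc.select("a").first().attr("href")); // /next
    }
}
```

Each new `<p>` or `<li>` start tag implicitly closes the previous one, so the unclosed tags still come out as separate sibling elements.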

Its party trick is a CSS selector syntax to find elements, e.g.:

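import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String html = "<html><head><title>First parse</title></head>"
  + "<body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);
Elements links = doc.select("a");           // every <a> element (none in this snippet)
Element head = doc.select("head").first();  // the single <head> element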
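The question asks specifically for locating elements by "id", "name", or tag type. A minimal sketch of how those three lookups might map onto jsoup's DOM-style methods, with the equivalent CSS selectors noted in comments. The sample page and the names `LocateElements`, `byId`, `byName`, and `byTag` are hypothetical, and a current jsoup release is assumed.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class LocateElements {
    // Hypothetical sample page, used only for illustration.
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<div id='main'><p>Hello</p><p>World</p></div>"
        + "<form><input name='q' value='jsoup'></form>"
        + "</body></html>";

    // By id: returns null when no element has that id (CSS equivalent: "#main").
    static Element byId(Document doc, String id) {
        return doc.getElementById(id);
    }

    // By name attribute: first element with name="...", or null if none
    // (CSS equivalent: "[name=q]").
    static Element byName(Document doc, String name) {
        return doc.getElementsByAttributeValue("name", name).first();
    }

    // By tag type: every element with that tag name (CSS equivalent: "p").
    static Elements byTag(Document doc, String tag) {
        return doc.getElementsByTag(tag);
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        System.out.println(byId(doc, "main").children().size()); // 2
        System.out.println(byName(doc, "q").val());              // jsoup
        System.out.println(byTag(doc, "p").text());              // Hello World
    }
}
```

Using `getElementsByAttributeValue` for the name lookup avoids building a selector string from user input; `doc.select("[name=q]")` would find the same element.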

See the Selector javadoc for more info.
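The question also asks for an easy way to harvest data once elements are found. Selectors combine with jsoup's extraction methods such as `text()`, `attr()`, and `absUrl()`. A minimal sketch that collects link text and absolute URLs from a hypothetical `div.news` block; the markup, the base URI `https://example.com/`, and the names `HarvestLinks` and `harvest` are made up for illustration, assuming a current jsoup release.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HarvestLinks {
    // Hypothetical sample markup; the base URI lets jsoup resolve relative links.
    static final String HTML =
        "<div class='news'>"
        + "<h2>Headlines</h2>"
        + "<a href='/a1'>First story</a>"
        + "<a href='https://example.com/a2'>Second story</a>"
        + "<a name='anchor-only'>No link</a>"
        + "</div>";
    static final String BASE_URI = "https://example.com/";

    // Returns "text -> absolute URL" for every link inside div.news.
    static List<String> harvest(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> rows = new ArrayList<>();
        // "div.news a[href]": <a> elements that have an href, anywhere under div.news.
        for (Element link : doc.select("div.news a[href]")) {
            rows.add(link.text() + " -> " + link.absUrl("href"));
        }
        return rows;
    }

    public static void main(String[] args) {
        // First story -> https://example.com/a1
        // Second story -> https://example.com/a2
        harvest(HTML, BASE_URI).forEach(System.out::println);
    }
}
```

Passing a base URI to `Jsoup.parse` is what lets `absUrl` turn `/a1` into an absolute URL; without one, `absUrl` returns an empty string for a relative link. The `<a name=...>` element is skipped because the `[href]` attribute selector requires an href.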

This is a new project, so any ideas for improvement are very welcome!
