Java library for HTML analysis


Problem description

(I've seen similar questions, but I think none of them cater to my specific needs, hence...)

I would like to know if there is a Java library for analysis of real-world (read: incomplete, ill-formed) HTML. By analysis, I mean things like:

  • figuring out the most prominent color in an HTML chunk
  • changing that color to some other color (hence, has to support modification of the HTML as well)
  • pruning out unwanted tags
  • fixing up the HTML to result in a well formed HTML snippet

Parts of the last two are already handled by libraries such as Jericho and jTidy. 'Plugins' on top of these would be great.

Thanks in advance!

Solution

You might want to check out TagSoup:

http://home.ccil.org/~cowan/XML/tagsoup/
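
TagSoup is a SAX-compliant parser for messy HTML, so the color-related tasks from the question would still need some code on top of it. Below is a minimal sketch of the first two bullets using only the Java standard library. It rests on loud assumptions: "most prominent" is taken to mean "most frequently occurring 6-digit hex color literal", the text is scanned with a regular expression rather than parsed, and the class and method names (`ColorSketch`, `mostProminentColor`, `replaceColor`) are hypothetical and not part of TagSoup, Jericho, or jTidy. Named colors, 3-digit hex values, and `rgb(...)` notation are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library mentioned in this thread.
class ColorSketch {

    // Matches 6-digit hex colors such as #FF0000. The lookbehind skips
    // numeric character references like &#123456;.
    private static final Pattern HEX_COLOR =
            Pattern.compile("(?<!&)#[0-9a-fA-F]{6}\\b");

    // Returns the most frequent hex color (lower-cased), or null if none is found.
    // On a tie, the color that appears first in the input wins.
    static String mostProminentColor(String html) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = HEX_COLOR.matcher(html);
        while (m.find()) {
            counts.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Replaces every occurrence of one hex color with another, ignoring case.
    static String replaceColor(String html, String from, String to) {
        Pattern p = Pattern.compile(Pattern.quote(from) + "\\b",
                Pattern.CASE_INSENSITIVE);
        return p.matcher(html).replaceAll(Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        String html = "<p style=\"color:#FF0000\">a<font color=\"#ff0000\">b"
                + "<span style=\"color:#00ff00\">c";
        String top = mostProminentColor(html);
        System.out.println(top);
        System.out.println(replaceColor(html, top, "#0000ff"));
    }
}
```

Because this works on raw text, it does not care whether the tags are balanced, which suits ill-formed input. A more faithful notion of prominence (for example, weighting each color by how much text it styles) would need a real parse tree from a library such as the ones named above.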
