Open source Java library for HTML to text conversion
Problem description
Can you recommend an open source Java library (preferably under an ASL, BSD, or LGPL license) that converts HTML to plain text? It should strip all tags, convert entities (&amp;, &nbsp;, etc.), and handle <br> and tables properly.
More Info
I have the HTML as a string, so there's no need to fetch it from the web. What I'm looking for is a method like this:
String convertHtmlToPlainText(String html)
Try Jericho.
The TextExtractor class sounds like it will do what you want. Sorry, I can't post a second link as I'm a new user, but scroll down the homepage a bit and there's a link to it.
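
If adding a dependency is not an option, the method shape asked for in the question can also be roughed out with the HTML parser that ships with the JDK (javax.swing.text.html). To be clear, this is not Jericho's API and not what the answer above recommends; it is only a minimal sketch of the same idea. The class name HtmlToText and the formatting choices (a newline for <br>, paragraph and row ends, a tab after each table cell) are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class; the name is not from any library.
final class HtmlToText {

    // Matches the signature asked for in the question.
    static String convertHtmlToPlainText(String html) {
        final StringBuilder out = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // The parser hands over text with entities already decoded.
                out.append(data);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <br> is an empty element, so it is reported as a simple tag.
                if (tag == HTML.Tag.BR) {
                    out.append('\n');
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                // Assumed layout: one line per paragraph or table row,
                // and a tab after each table cell.
                if (tag == HTML.Tag.P || tag == HTML.Tag.TR) {
                    out.append('\n');
                } else if (tag == HTML.Tag.TD || tag == HTML.Tag.TH) {
                    out.append('\t');
                }
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declared in the markup,
            // which is fine here because the input is already a String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // A StringReader should not fail, but the API declares IOException.
            throw new UncheckedIOException(e);
        }
        // Drop leading and trailing whitespace left over from block-level tags.
        return out.toString().trim();
    }
}
```

Calling HtmlToText.convertHtmlToPlainText("<p>Tom &amp; Jerry</p>") should give back the text with the tags gone and the entity decoded. The built-in parser is old and forgiving in its own ways, so whitespace around inline tags and the table layout may not match what a dedicated library such as Jericho produces.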