Java RTF Parser

Question

Does anyone know of a robust RTF parser I can use in Java? I need to extract plain text, including international text. It would also be nice to extract embedded images and files. It could also be a C++ or other library that I can easily call from Java or, if there is good source code, something I can convert to Java.

The following libraries do not cover enough of the RTF format, or fail to parse some valid RTF files:


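  1. Java Swing's RTFEditorKit, pretty basic and fragile.
    Apache Tika, Nutch, and many other tools use it.

  2. An RTF library from iText (com.lowagie.etc ...), not very comprehensive.

  3. The etranslate RTF library (this is the most complete one in Java).
    Not sure if there is a newer version, but the version I have fails on part of my RTF collection (the RTFs are valid; at least they open fine in MS Word and OpenOffice).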
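
For reference, the Swing baseline from item 1 needs nothing beyond the JDK. The following is a minimal sketch, not a recommendation: the class name `SwingRtfTextExtractor`, the method `extractText`, and the sample RTF string are made up for illustration, and the approach carries the limitations described above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class; the name is an assumption made for this sketch.
public class SwingRtfTextExtractor {

    /** Parses RTF from the stream with Swing's RTFEditorKit and returns the plain text. */
    public static String extractText(InputStream rtf)
            throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        // The kit creates a styled document that receives the parsed content.
        Document doc = kit.createDefaultDocument();
        kit.read(rtf, doc, 0);
        // Drop all styling and return only the characters.
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Tiny hand-written sample document (an assumption, not from the question).
        String sample = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\f0 Hello, RTF!\\par}";
        try (InputStream in = new ByteArrayInputStream(
                sample.getBytes(StandardCharsets.US_ASCII))) {
            System.out.println(extractText(in).trim());
        }
    }
}
```

This is enough for simple documents, but it gives no access to embedded images or files, which is part of what the question asks for.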

There's a C# library that's reasonably complete, but alas ...it's C# and not Java. http://www.codeproject.com/Articles/27431/Writing-Your-Own-RTF-Converter

I also looked into OpenOffice, but it is too slow for what I need, though it's probably very comprehensive.

(I did do web searches and stack overflow searches before posting this question, so if you are referring me to an ancient "already asked" post, it probably doesn't have an answer there. But feel free to point it out, in case I missed it!)

Recommended answer

You may find RTF Parser Kit useful. It provides a stream-based parser which delivers events to you as the document is parsed. There is a simple example text extractor provided which demonstrates how the API can be used.
