Parsing / Converting legacy Word documents? (msword2 / 5)

Question

We have some really old .doc documents. Normally we use Tika (our application extracts the text and then converts the document to PDF/A), but apparently msword2 (and msword5) files are not currently supported. The only alternative I found is the LibreOffice command line. Is there anything else?
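
Since the LibreOffice command line is the alternative mentioned here, the following is a minimal Python sketch of driving it headlessly for the two steps the application performs (text extract, then PDF conversion). It assumes the `soffice` binary is on `PATH` and that the installed LibreOffice can open these legacy files; `soffice_command` and `convert` are hypothetical helper names, not part of any library.

```python
from __future__ import annotations

import subprocess
import tempfile
from pathlib import Path


def soffice_command(src: Path, outdir: Path, target: str) -> list[str]:
    """Build a headless LibreOffice conversion command.

    `target` is a --convert-to argument such as "pdf" or "txt:Text".
    """
    return [
        "soffice",              # assumption: LibreOffice binary is on PATH
        "--headless",           # run without a GUI
        "--convert-to", target,
        "--outdir", str(outdir),
        str(src),
    ]


def convert(src: Path, target: str, timeout: int = 120) -> Path:
    """Convert `src` with LibreOffice and return the path of the output file."""
    outdir = Path(tempfile.mkdtemp())
    subprocess.run(
        soffice_command(src, outdir, target),
        check=True, capture_output=True, timeout=timeout,
    )
    # LibreOffice keeps the input's base name and swaps the extension
    # (the part of `target` before any ':' filter name).
    ext = target.split(":", 1)[0]
    out = outdir / f"{src.stem}.{ext}"
    # soffice can exit 0 without writing anything, so check explicitly.
    if not out.exists():
        raise RuntimeError(f"LibreOffice produced no .{ext} output for {src}")
    return out
```

For example, `convert(Path("letter.doc"), "txt:Text")` would write `letter.txt`, and `convert(Path("letter.doc"), "pdf")` a regular PDF. Plain `pdf` output is not PDF/A; getting PDF/A requires passing PDF export filter options whose syntax depends on the LibreOffice version, so check the documentation for the installed release. Concurrent `soffice` processes that share one user profile also tend to conflict, so batch jobs should either run conversions one at a time or give each process its own `-env:UserInstallation` directory.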

Searching for this is quite hard, since everyone else seems to mean "old" as in 1995 or later, not before 1991.

Answer

We have looked into the issue a bit more, and it seems the only answer is to use some version of the libwps library (the same library LibreOffice uses).

We will weigh the pros and cons of using the LibreOffice command line versus the library itself, and will probably just build a small microservice for our application to call.
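
If the conversion ends up behind a microservice, a minimal sketch using only Python's standard library might look like the following. The `/convert` endpoint, the bytes-in/PDF-bytes-out converter signature, and the helper names `make_handler` and `serve` are all hypothetical; in practice `convert_fn` would wrap the LibreOffice command line or the library. A single-threaded `HTTPServer` is used deliberately so that conversions run one at a time, which avoids several LibreOffice processes competing for the same user profile.

```python
from __future__ import annotations

import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable

# Hypothetical converter signature: raw .doc bytes in, PDF bytes out.
Converter = Callable[[bytes], bytes]


def make_handler(convert_fn: Converter) -> type[BaseHTTPRequestHandler]:
    """Create a request handler class bound to the given converter."""

    class ConvertHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            if self.path != "/convert":  # hypothetical endpoint name
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            try:
                pdf = convert_fn(body)
            except Exception as exc:  # report conversion failures to the caller
                self.send_error(422, str(exc))
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/pdf")
            self.send_header("Content-Length", str(len(pdf)))
            self.end_headers()
            self.wfile.write(pdf)

        def log_message(self, format, *args) -> None:
            pass  # keep the sketch quiet; a real service would log

    return ConvertHandler


def serve(convert_fn: Converter, port: int = 0) -> HTTPServer:
    """Start the service in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), make_handler(convert_fn))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client would POST the raw .doc bytes to `/convert` and receive a PDF back, or an HTTP 422 error when the conversion fails.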
