Parsing / Converting legacy Word documents? (msword2 / 5)
Question
We have some really old .doc documents. Normally we use tika (our application typically extracts the text and then converts to PDF/A), but apparently msword2 (and msword5) are not currently supported. The only alternative I found was the LibreOffice command line. Is there anything else?
Searching for this is quite hard, since everyone else seems to mean "old" as in 1995 or later, not pre-1991.
Recommended answer
We have looked into the issue a bit more, and it seems the only answer is to use some version of the libwps library (the same library LibreOffice uses).
We will weigh the pros and cons of using the LibreOffice command line versus the library itself, and will probably just create a microservice for our application to call.
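For the command-line route, a minimal sketch of a wrapper such a microservice might call is shown below. It assumes LibreOffice is installed and reachable as `soffice` or `libreoffice` on the PATH; the function names `build_convert_command` and `convert` are hypothetical, chosen only for illustration. Whether a given msword2 or msword5 file actually opens depends on the import filters in the installed LibreOffice build, and the plain `pdf` target does not by itself produce PDF/A (that needs additional PDF export filter options, which are not shown here).

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, target="pdf", binary="soffice"):
    """Return the argv list for a headless LibreOffice conversion."""
    return [
        binary,
        "--headless",            # run without a GUI
        "--convert-to", target,  # e.g. "pdf" or "txt:Text"
        "--outdir", str(out_dir),
        str(source),
    ]


def convert(source, out_dir, target="pdf", timeout=120):
    """Convert one document and return the path of the file produced."""
    binary = shutil.which("soffice") or shutil.which("libreoffice")
    if binary is None:
        raise RuntimeError("LibreOffice executable not found on PATH")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    subprocess.run(
        build_convert_command(source, out_dir, target, binary),
        check=True,
        timeout=timeout,
    )

    # The output is named after the input's stem plus the target extension.
    extension = target.split(":", 1)[0]
    result = out_dir / (Path(source).stem + "." + extension)

    # Do not rely on the exit code alone: check that a file was written.
    if not result.exists():
        raise RuntimeError(f"LibreOffice produced no output for {source}")
    return result
```

Calling `convert("old.doc", "out", "txt:Text")` would cover the text-extraction step and `convert("old.doc", "out", "pdf")` the PDF step. Keeping the command construction separate from the subprocess call makes the wrapper easy to test without LibreOffice installed.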