How do I properly convert a PDF file to a Word document using Java

Problem description

I want to convert a PDF file to a Word format (.rtf, .doc) without changing the structure of the document.

Recommended answer

To work with PDF, you can use the Java library iText:
https://en.wikipedia.org/wiki/IText,
iText.

I have no idea why you would want to create RTF or DOC with Java (especially the proprietary DOC; I could only understand it if it were .DOCX). I would suggest converting to HTML or to one of the XML-based document formats. The fact that you don't know exactly which format you want, RTF or DOC, and that you did not mention DOCX, strongly suggests that you don't really need any of them, and that HTML would be your best choice.
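To make the HTML suggestion concrete, here is a minimal sketch using only the JDK. It assumes the text has already been extracted from the PDF (for example with iText) into a list of paragraph strings; that extraction step is not shown, and the class and method names (`HtmlSketch`, `toHtml`) are hypothetical. Only plain paragraphs are carried over, so fonts, tables and page layout would need extra work.

```java
import java.util.List;

public class HtmlSketch {
    // Escape the characters that are special in HTML text content.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build a small HTML document with one <p> element per input paragraph.
    // The paragraphs are assumed to come from a PDF text extractor (not shown).
    static String toHtml(String title, List<String> paragraphs) {
        StringBuilder sb = new StringBuilder();
        sb.append("<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n<title>")
          .append(escape(title))
          .append("</title>\n</head>\n<body>\n");
        for (String p : paragraphs) {
            sb.append("<p>").append(escape(p)).append("</p>\n");
        }
        sb.append("</body>\n</html>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toHtml("Converted", List.of("1 < 2 & 3 > 2", "Second paragraph")));
    }
}
```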

However, if you really want RTF (again, I doubt it), it's not too bad: the description of the format is publicly available, so you can write it directly. Or use a third-party library. One of them is jRTF:
jRTF = a new library for building RTF documents | Java Blog.

Another option is Apache RTFLib: Apache(tm) FOP Development: RTFLib (jfor).
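As an illustration of writing RTF directly from the public format description, rather than through jRTF or RTFLib, the sketch below emits a minimal RTF document with the JDK alone. It again assumes the paragraphs were already extracted from the PDF; `RtfSketch` and `toRtf` are hypothetical names, and the font and size are arbitrary choices. It handles plain paragraphs only, not the original layout.

```java
import java.util.List;

public class RtfSketch {
    // Escape RTF control characters and encode non-ASCII characters as \uN?.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);
            } else if (c < 128) {
                sb.append(c);
            } else {
                // \uN takes a signed 16-bit decimal value; '?' is the
                // fallback character for readers without Unicode support.
                sb.append("\\u").append((int) (short) c).append('?');
            }
        }
        return sb.toString();
    }

    // Build a minimal RTF document: header, one font, one \par per paragraph.
    static String toRtf(List<String> paragraphs) {
        StringBuilder sb = new StringBuilder(
            "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24 ");
        for (String p : paragraphs) {
            sb.append(escape(p)).append("\\par\n");
        }
        sb.append("}");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRtf(List.of("Braces {like these} are escaped.", "caf\u00e9")));
    }
}
```

The output is pure ASCII, so it can be saved to a `.rtf` file with any ASCII-compatible encoding.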

You can do your own search and find something else.

The Microsoft DOCX format is much more complicated. And I don't even want to discuss DOC, which is obsolete and messy; there is no official public standard for it.

With DOCX, you have the ECMA standard, which is publicly available:
Office Open XML - Wikipedia, the free encyclopedia,
Microsoft Office XML formats - Wikipedia, the free encyclopedia,
Standard ECMA-376.

You can use the open-source docx4j library: docx4j.
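To show what a library such as docx4j has to produce, here is a rough sketch that assembles a bare-bones DOCX package by hand with `java.util.zip`. This is not the docx4j API: it is a JDK-only illustration of the Office Open XML package layout (a content-types part, a package relationship, and the main document part). The names `DocxSketch`, `toDocx` and `documentXml` are hypothetical, and the paragraphs are assumed to have been extracted from the PDF already. Real documents need styles, numbering and much more, which is where a library earns its keep.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class DocxSketch {
    static final String XML_DECL = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";

    // Declares the content type of each part in the package.
    static final String CONTENT_TYPES = XML_DECL
        + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
        + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
        + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
        + "<Override PartName=\"/word/document.xml\" ContentType=\""
        + "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>"
        + "</Types>";

    // Points from the package root to the main document part.
    static final String RELS = XML_DECL
        + "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
        + "<Relationship Id=\"rId1\" Type=\""
        + "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\""
        + " Target=\"word/document.xml\"/>"
        + "</Relationships>";

    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // One <w:p> (paragraph) holding one <w:r> (run) per input string.
    static String documentXml(List<String> paragraphs) {
        StringBuilder sb = new StringBuilder(XML_DECL);
        sb.append("<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"><w:body>");
        for (String p : paragraphs) {
            sb.append("<w:p><w:r><w:t xml:space=\"preserve\">")
              .append(escapeXml(p))
              .append("</w:t></w:r></w:p>");
        }
        sb.append("</w:body></w:document>");
        return sb.toString();
    }

    // Zip the three parts into the bytes of a .docx file.
    static byte[] toDocx(List<String> paragraphs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            put(zip, "[Content_Types].xml", CONTENT_TYPES);
            put(zip, "_rels/.rels", RELS);
            put(zip, "word/document.xml", documentXml(paragraphs));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static void put(ZipOutputStream zip, String name, String xml) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(xml.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }
}
```

The returned bytes can be written to a file with `java.nio.file.Files.write(Path, byte[])` and given a `.docx` extension.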

That's all. But you would do better to take the good advice and create HTML.

-SA

