Parsing and generating Microsoft Office 2007 files (.docx, .xlsx, .pptx)

Question

I have a web project where I must import text and images from a user-supplied document, and one of the possible formats is Microsoft Office 2007. There's also a need to generate documents in this format.

The server runs CentOS 5.2 and has PHP/Perl/Python installed. I can execute local binaries and shell scripts if I must. We use Apache 2.2 but will be switching over to Nginx once it goes live.

What are my options? Anyone had experience with this?

Solution

The Office 2007 file formats are open and well documented. Roughly speaking, each of the new formats whose extension ends in "x" is a ZIP archive containing XML documents. For example:

To open a Word 2007 XML file:

1. Create a temporary folder in which to store the file and its parts.
2. Save a Word 2007 document, containing text, pictures, and other elements, as a .docx file.
3. Add a .zip extension to the end of the file name.
4. Double-click the file. It will open in the ZIP application. You can see the parts that comprise the file.
5. Extract the parts to the folder that you created previously.
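
Since Python is available on the server, the same inspection can be scripted without renaming anything. The following is a minimal sketch using only the standard library, not a complete parser. It assumes the main document part lives at word/document.xml and that images sit under word/media/, which is where Word normally puts them; strictly speaking the package's relationship files decide those locations. The function names are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_parts(docx):
    """Return the names of all parts stored in the package.

    `docx` may be a file path or a seekable binary file object.
    """
    with zipfile.ZipFile(docx) as package:
        return package.namelist()


def extract_paragraphs(docx):
    """Return the plain text of each paragraph in the main document part.

    Assumption: the main part is at word/document.xml (Word's usual layout).
    """
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter("{%s}p" % W_NS):
        # A paragraph's text is split across one or more <w:t> elements.
        runs = [t.text or "" for t in para.iter("{%s}t" % W_NS)]
        paragraphs.append("".join(runs))
    return paragraphs


def extract_images(docx):
    """Return a dict mapping each embedded media part name to its raw bytes.

    Assumption: images are stored under word/media/ (Word's usual layout).
    """
    with zipfile.ZipFile(docx) as package:
        return {
            name: package.read(name)
            for name in package.namelist()
            if name.startswith("word/media/")
        }
```

Calling `extract_paragraphs("upload.docx")` on a user-supplied file (the file name here is hypothetical) would give a list of strings, one per paragraph. Formatting, tables, and the position of each image within the text are ignored by this sketch.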

The other file formats are roughly similar. I don't know of any open source libraries for interacting with them as yet - but depending on your exact requirements, it doesn't look too difficult to read and write simple documents. Certainly it should be a lot easier than with the older formats.
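
To give an idea of what writing a simple document involves, here is a rough sketch that builds a text-only .docx with the standard library. It assumes that three parts are enough for a valid package: [Content_Types].xml, _rels/.rels, and word/document.xml. The function name `write_docx` is hypothetical, and the output has not been checked against every Office version, so treat it as a starting point rather than a finished generator.

```python
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.'
    'wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Points the package at its main document part.
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
    'relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def write_docx(target, paragraphs):
    """Write a text-only .docx with one paragraph per input string.

    `target` may be a file path or a writable binary file object.
    """
    body = "".join(
        # escape() protects &, < and > so the XML stays well-formed.
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(text)
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W_NS, body)
    )
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", RELS)
        package.writestr("word/document.xml", document)
```

For example, `write_docx("report.docx", ["First paragraph", "Second paragraph"])` would produce a two-paragraph file. Adding images or styles means adding further parts and relationships, which this sketch does not attempt.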

If you need to read the older formats, OpenOffice has an API and can read and write Office 2003 and older documents with more or less success.
