PHP OOXML Libraries?

Question

A customer is asking me to build a module for his running webapp that can load docx files and extract data based on the headings found in the document. I know a docx is just a zip file and that most of what I need can be found in word/document.xml, though I'm not looking forward to parsing lists, styles, images, tables, and whatever else needs to be translated from OOXML to HTML.

Are there any PHP libraries for this format? I do need some flexibility, though: a plain OOXML-to-HTML converter is not going to cut it, because I need to break the document up into parts.

Recommended answer

If it's purely docx, you can try phpdocx, though I don't know whether it reads files or only writes them. PHPWord doesn't read yet, it only writes (though I'm working on it).

If you only need the properties information, then you'll find it all within the /docProps/core.xml file within the zip (and possibly in /docProps/app.xml, depending on exactly which properties you need), so you can bypass most of the files that hold text, styles, images, etc. To verify the file names, check [Content_Types].xml: it lists the core and app properties parts under the content types application/vnd.openxmlformats-package.core-properties+xml and application/vnd.openxmlformats-officedocument.extended-properties+xml respectively.
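As a rough illustration of the properties route, here is a minimal sketch that uses only PHP's bundled ZipArchive and DOM extensions. The function names (readZipEntry, findPartsByContentType, readCoreProperties) are hypothetical helpers made up for this example, not part of phpdocx or PHPWord, and the fallback to docProps/core.xml assumes the conventional part name.

```php
<?php
// Sketch only: helper names are hypothetical; requires ext-zip and ext-dom.

// Return the raw contents of one entry in the zip, or null if it is absent.
function readZipEntry(string $path, string $entry): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }
    $data = $zip->getFromName($entry);
    $zip->close();
    return $data === false ? null : $data;
}

// List the part names that [Content_Types].xml registers for a content type.
function findPartsByContentType(string $path, string $contentType): array
{
    $xml = readZipEntry($path, '[Content_Types].xml');
    if ($xml === null) {
        return [];
    }
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xp = new DOMXPath($doc);
    $xp->registerNamespace('ct', 'http://schemas.openxmlformats.org/package/2006/content-types');
    $parts = [];
    foreach ($xp->query('//ct:Override') as $override) {
        if ($override->getAttribute('ContentType') === $contentType) {
            // Part names start with "/", zip entry names do not.
            $parts[] = ltrim($override->getAttribute('PartName'), '/');
        }
    }
    return $parts;
}

// Return the core properties as [localName => text], e.g. ['title' => '...'].
function readCoreProperties(string $path): array
{
    $parts = findPartsByContentType(
        $path,
        'application/vnd.openxmlformats-package.core-properties+xml'
    );
    // Assumption: fall back to the conventional location if no override is listed.
    $entry = $parts[0] ?? 'docProps/core.xml';
    $xml = readZipEntry($path, $entry);
    if ($xml === null) {
        return [];
    }
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $props = [];
    foreach ($doc->documentElement->childNodes as $node) {
        if ($node instanceof DOMElement) {
            $props[$node->localName] = trim($node->textContent);
        }
    }
    return $props;
}
```

Calling readCoreProperties('report.docx') (a hypothetical file name) would return an array keyed by element name, such as title, creator, or created, depending on what the document actually stores.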

If you need headings, then you will need to parse the document itself, not just the properties. That means identifying the heading styles, then extracting the text of the paragraphs that use those styles.
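A minimal sketch of that approach, again with plain ZipArchive and DOM rather than any particular library. It assumes that heading styles are the paragraph styles in word/styles.xml whose name reads "heading N" (which is how Word normally names its built-in headings; custom heading styles would need their own rule), and it only looks at top-level paragraphs, so tables and text before the first heading are skipped. The function names are hypothetical.

```php
<?php
// Sketch only: helper names are hypothetical; requires ext-zip and ext-dom.
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Load one XML part from the archive and return an XPath object bound to "w:".
function loadPart(ZipArchive $zip, string $entry): ?DOMXPath
{
    $xml = $zip->getFromName($entry);
    if ($xml === false) {
        return null;
    }
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xp = new DOMXPath($doc);
    $xp->registerNamespace('w', W_NS);
    return $xp;
}

// Map styleId => heading level for styles named "heading 1" .. "heading 9".
function headingStyles(ZipArchive $zip): array
{
    $levels = [];
    $xp = loadPart($zip, 'word/styles.xml');
    if ($xp === null) {
        return $levels;
    }
    foreach ($xp->query('//w:style[@w:type="paragraph"]') as $style) {
        $name = $xp->evaluate('string(w:name/@w:val)', $style);
        if (preg_match('/^heading (\d)$/i', $name, $m)) {
            $levels[$style->getAttributeNS(W_NS, 'styleId')] = (int) $m[1];
        }
    }
    return $levels;
}

// Split the body into sections: each heading plus the paragraphs that follow it.
function splitByHeadings(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }
    $levels = headingStyles($zip);
    $xp = loadPart($zip, 'word/document.xml');
    $zip->close();
    if ($xp === null) {
        throw new RuntimeException('word/document.xml not found');
    }

    $sections = [];
    foreach ($xp->query('/w:document/w:body/w:p') as $p) {
        $styleId = $xp->evaluate('string(w:pPr/w:pStyle/@w:val)', $p);
        // A paragraph's text is spread over several runs; join all w:t nodes.
        $text = '';
        foreach ($xp->query('.//w:t', $p) as $t) {
            $text .= $t->textContent;
        }
        if (isset($levels[$styleId])) {
            $sections[] = ['level' => $levels[$styleId], 'heading' => $text, 'paragraphs' => []];
        } elseif ($sections !== [] && $text !== '') {
            $sections[count($sections) - 1]['paragraphs'][] = $text;
        }
    }
    return $sections;
}
```

Each returned section carries its heading level, so nesting the sections into a tree, or converting only selected sections to HTML, can be layered on top without re-parsing the file.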
