How to filter a Microsoft Word document using Java

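Question

Hello,

I am developing a Java application that needs the following functionality: it takes a Microsoft Word document as input and filters out symbols and other extra characters such as " ] . , [ ' { \ / | ( and so on, and also removes any empty lines, so that the result is an array containing only the words of the document.

Any help, please!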

Solution

There are some well-known third-party libraries for working with Microsoft documents:
http://poi.apache.org/
http://sourceforge.net/projects/openxml4j/

You can probably find more by searching the web.
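Once one of these libraries has given you the document text as a plain String, the filtering step asked about in the question needs only the standard library. The following is a minimal sketch, not a definitive implementation: the class name WordFilter and the method words are made up for illustration, and it assumes that a "word" is any run of Unicode letters or digits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any library.
final class WordFilter {

    /** Splits text into words, dropping symbols, punctuation, and empty lines. */
    static String[] words(String text) {
        List<String> result = new ArrayList<>();
        // Split on every run of characters that are neither letters nor digits.
        // This covers punctuation, brackets, slashes, whitespace, and line breaks,
        // so empty lines disappear as a side effect.
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            // split() yields a leading empty string when the text starts with a separator.
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String sample = "Hello, [world]!\n\n{foo}/bar | (baz)...";
        System.out.println(String.join(" ", words(sample))); // prints "Hello world foo bar baz"
    }
}
```

Note that this rule also splits on apostrophes and hyphens, so "don't" becomes "don" and "t". If that is not what you want, add those characters to the character class in the pattern.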



It would be much better if you could work only with the newer OpenXML formats, which are open standards: ECMA-376 and ISO/IEC 29500:2008. Please see:
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats
http://en.wikipedia.org/wiki/Office_Open_XML
http://www.ecma-international.org/publications/standards/Ecma-376.htm
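Because a .docx file in this format is a ZIP archive of XML parts, the text can even be read with the JDK alone. The sketch below rests on several assumptions: the main part is stored at the conventional path word/document.xml, the file uses the transitional WordprocessingML namespace, and the class and method names (DocxText, fromFile, fromXml) are invented for illustration. A library such as Apache POI handles far more cases.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of any library.
final class DocxText {
    // Namespace of transitional WordprocessingML (strict documents use a different one).
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Opens a .docx as a ZIP archive and extracts the text of its main part. */
    static String fromFile(String docxPath) {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml"); // assumed conventional location
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return fromXml(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Collects the text runs (w:t) of each paragraph (w:p), one paragraph per line. */
    static String fromXml(InputStream documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DTDs so that an untrusted document cannot trigger entity expansion.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document dom = factory.newDocumentBuilder().parse(documentXml);

            StringBuilder text = new StringBuilder();
            NodeList paragraphs = dom.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element paragraph = (Element) paragraphs.item(i);
                NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
                for (int j = 0; j < runs.getLength(); j++) {
                    text.append(runs.item(j).getTextContent());
                }
                text.append('\n');
            }
            return text.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }
}
```

Calling DocxText.fromFile("input.docx") (a placeholder path) returns the paragraphs as lines of plain text, which can then be split into words. This sketch ignores tabs, line breaks inside a paragraph, headers, footers, and footnotes, and it would repeat the text of paragraphs nested inside other paragraphs, such as text boxes.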



-SA

