How to know whether a file is .docx or .doc format from Apache POI


Question

I know this can be done by file extension or by MIME type. Is there any other way to determine whether a file is .docx or .doc?

Recommended answer

If it is just a matter of deciding, for a collection of files known to be either .doc or .docx but not marked with the matching extension, which format each file is, you can use the fact that a .docx file is a zipped collection of files. Something along the following lines might help:

boolean isZip = new ZipInputStream( fileStream ).getNextEntry() != null;

where fileStream is the file or other input stream you wish to evaluate. You could further evaluate a zipped file by looking for key .docx entries. A good starting reference is Word Document (DOCX). Likewise, if you know the file is binary, you can test for Word's File Information Block (see Word (.doc) Binary File Format).
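The two checks described above could be combined roughly as follows. This is only a sketch using the JDK, not Apache POI's own detection code. The names WordFormatSniffer, Kind and sniff are hypothetical. It assumes the main document part sits at the conventional path word/document.xml, and for the binary case it only tests the 8-byte OLE2 compound-file signature rather than parsing the File Information Block, so other OLE2 files (for example .xls or .ppt) would also be reported as DOC.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: guesses the Word format from the content of a stream.
class WordFormatSniffer {
    enum Kind { DOC, DOCX, UNKNOWN }

    // Signature at the start of every OLE2 compound file (the container used by .doc).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    static Kind sniff(InputStream fileStream) throws IOException {
        BufferedInputStream in = new BufferedInputStream(fileStream);

        // Peek at the first 8 bytes, then rewind so the zip check sees the whole stream.
        in.mark(OLE2_MAGIC.length);
        byte[] head = new byte[OLE2_MAGIC.length];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        in.reset();

        if (n == head.length && Arrays.equals(head, OLE2_MAGIC)) {
            // Assumption: the input is known to be .doc or .docx, so OLE2 means .doc.
            return Kind.DOC;
        }

        // Otherwise treat the stream as a zip and look for a key .docx entry.
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if ("word/document.xml".equals(entry.getName())) {
                return Kind.DOCX;
            }
        }
        return Kind.UNKNOWN;
    }
}
```

A call such as WordFormatSniffer.sniff(new FileInputStream(file)) would then return DOC, DOCX or UNKNOWN. The caller remains responsible for closing the stream it passes in.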
