How to tell whether a file is .docx or .doc with Apache POI
Question
I know this can be done by file extension or by MIME type. Is there any other way to find out whether a file is .docx or .doc?
Recommended answer
If it is just a matter of deciding, for a collection of files known to be either .doc or .docx but not marked with the corresponding extension, which format each file is, you can use the fact that a .docx file is a zipped collection of files. Something along these lines might help:
boolean isZip = new ZipInputStream( fileStream ).getNextEntry() != null;
where fileStream is whatever file or other input stream you wish to evaluate. Note that this check reads from the stream, so reopen or buffer it before handing the file to a parser. You could further evaluate a zipped file by looking for key .docx entries. A good starting reference is Word Document (DOCX). Likewise, if you know it is just a binary file, you can test for Word's File Information Block (see Word (.doc) Binary File Format).
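The two checks described above can be combined into one small helper. The following is only a minimal sketch using the JDK alone, not an Apache POI API: the class name WordFormatSniffer and the WordFormat enum are hypothetical names introduced for illustration. It assumes that a .docx written by Word stores its main part under the conventional entry name word/document.xml, and it treats any OLE2 compound file as .doc, which is only safe given the premise above that the files are known to be Word documents (older .xls and .ppt files share the same container signature). It does not parse the File Information Block.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper for illustration; not part of Apache POI. */
final class WordFormatSniffer {

    enum WordFormat { DOC, DOCX, UNKNOWN }

    // Signature that opens an OLE2 compound file, the container used by binary .doc.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Local file header signature ("PK\3\4") that opens a non-empty ZIP archive.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Assumption: the conventional name of the main document part in a .docx.
    private static final String DOCX_MAIN_PART = "word/document.xml";

    /** Reads the whole file into memory, which is fine for a sketch. */
    static WordFormat detect(Path path) throws IOException {
        return detect(Files.readAllBytes(path));
    }

    static WordFormat detect(byte[] data) throws IOException {
        if (startsWith(data, OLE2_MAGIC)) {
            // OLE2 container: assumed to be .doc because the input is known to be Word.
            return WordFormat.DOC;
        }
        if (startsWith(data, ZIP_MAGIC)) {
            // ZIP container: look for the key .docx entry among the archive entries.
            try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
                for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                    if (DOCX_MAIN_PART.equals(e.getName())) {
                        return WordFormat.DOCX;
                    }
                }
            }
        }
        return WordFormat.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Prints the detected format for each path given on the command line.
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + " -> " + detect(Paths.get(arg)));
        }
    }
}
```

Working on a byte array rather than on the caller's stream means the original input is never consumed, so the same file can still be opened by a parser afterwards.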