Reading MS Word 2007 using Java


Question

I am trying to read a Microsoft Word file through Java. I have added all the .jar files from Apache poi-3.8-beta1 to my classpath. However, when I run the program, I get the following exception:

org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
        at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131)
        at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
        at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
        at readingmsword07.Main.main(Main.java:27)

Here is my code:

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.*;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.xwpf.usermodel.XWPFDocument;


public class Main {


    public static void main(String[] args) {
        try {
            FileInputStream fis = new FileInputStream("C:\\TrialDoc.docx");
            POIFSFileSystem fileSystem = new POIFSFileSystem(fis);            
            org.apache.poi.xwpf.extractor.XWPFWordExtractor oleTextExtractor =
            new XWPFWordExtractor(new XWPFDocument(fis));
            System.out.print(oleTextExtractor.getText());            
        } catch (Exception e) {
                e.printStackTrace();
        }
    }

}

I am using XWPFWordExtractor because I am trying to read a Word 2007 document, but I cannot figure out which part of POI handles this format.

Any help is much appreciated. Thanks in advance!

~Woods

Recommended answer

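POIFSFileSystem reads only the older OLE2 binary format (.doc), whereas a .docx file is an Office 2007+ XML package, so its constructor throws OfficeXmlFileException before XWPFDocument is ever reached. XWPFDocument can read the FileInputStream directly, so remove this line: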

POIFSFileSystem fileSystem = new POIFSFileSystem(fis);
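
To illustrate why the exception is raised, here is a minimal sketch of telling the two container formats apart by their leading bytes. It uses only the JDK and is not POI's own implementation; the class name WordFormatSniffer, the Format enum, and the sniff method are hypothetical names made up for this example. It assumes the well-known signatures: OLE2 files start with D0 CF 11 E0 A1 B1 1A E1, and ZIP-based files such as .docx start with "PK" followed by 03 04.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of Apache POI.
public class WordFormatSniffer {

    /** Container types that can be recognised from the first bytes of a file. */
    public enum Format { OLE2, ZIP_BASED, UNKNOWN }

    // OLE2 compound document signature (used by .doc, .xls, .ppt).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header signature (used by .docx, .xlsx, .pptx and any other ZIP).
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Reads up to 8 bytes from the stream and classifies the container format. */
    public static Format sniff(InputStream in) throws IOException {
        byte[] header = new byte[8];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) {
                break; // end of stream before 8 bytes were available
            }
            read += n;
        }
        if (startsWith(header, read, OLE2_MAGIC)) {
            return Format.OLE2;
        }
        if (startsWith(header, read, ZIP_MAGIC)) {
            return Format.ZIP_BASED;
        }
        return Format.UNKNOWN;
    }

    // Compares unsigned byte values against the expected signature.
    private static boolean startsWith(byte[] data, int length, int[] magic) {
        if (length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // First bytes of a typical ZIP archive, standing in for a .docx file.
        byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
        System.out.println(sniff(new ByteArrayInputStream(zipLike))); // prints ZIP_BASED
    }
}
```

Note that sniff consumes the bytes it inspects, so a caller would need to open a fresh stream (or use mark/reset on a stream that supports it) before handing the file to a parser. A ZIP signature also does not by itself prove the file is a Word document.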
