I need Apache POI pictures converted from a Word document to an HTML file

Views: 573 | Published: 2016/5/22 13:32:36 | Tags: java html image ms-word apache-poi


Problem description


I have some code that uses the Java Apache POI library to open a Microsoft Word document and convert it to HTML, and it also gets the byte array data of the images in the document. But I need to convert this image data to HTML and write it out to an HTML file. Any hints or suggestions would be appreciated. Keep in mind that I am a desktop developer, not a web programmer, so please bear that in mind when you make suggestions. The code below gets the images.

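 import java.io.File;
 import java.io.FileInputStream;
 import java.io.IOException;
 import java.util.ArrayList;
 import org.apache.poi.hwpf.HWPFDocument;
 import org.apache.poi.hwpf.model.PicturesTable;
 import org.apache.poi.hwpf.usermodel.Picture;

 private void parseWordText(File file) throws IOException {
     // try-with-resources closes the stream even if parsing throws
     try (FileInputStream fs = new FileInputStream(file)) {
         doc = new HWPFDocument(fs);                       // doc and picList are fields
         PicturesTable picTable = doc.getPicturesTable();
         if (picTable != null) {
             picList = new ArrayList<Picture>(picTable.getAllPictures());
             for (Picture pic : picList) {
                 byte[] byteArray = pic.getContent();           // raw image bytes
                 String extension = pic.suggestFileExtension(); // e.g. "jpg" or "png"
                 String fileName = pic.suggestFullFileName();   // suggested file name including the extension
                 pic.suggestPictureType();
                 pic.getStartOffset();
                 // The values above are currently discarded; store or use them here
             }
         }
     }
 }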
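One common way to get these bytes into an HTML page is to write each picture to its own file next to the HTML file and reference it from an img tag with a relative path. The following is a minimal sketch under that assumption; PictureFiles and savePicture are hypothetical helper names, not part of POI, and the name passed in would come from pic.suggestFullFileName() with the bytes from pic.getContent().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of POI): writes one picture's bytes next to the HTML file.
public class PictureFiles {

    /**
     * Writes content to outputDir/fileName and returns the value to use in an
     * img src attribute, relative to an HTML file that lives in outputDir.
     */
    public static String savePicture(Path outputDir, String fileName, byte[] content) throws IOException {
        Files.createDirectories(outputDir);
        // Keep only the last path segment so the name cannot point outside outputDir
        String safeName = Paths.get(fileName).getFileName().toString();
        Files.write(outputDir.resolve(safeName), content);
        return safeName;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pictures");
        byte[] fakeImage = {(byte) 0x89, 'P', 'N', 'G'}; // stand-in for pic.getContent()
        String src = savePicture(dir, "1a2b.png", fakeImage);
        System.out.println("<img src=\"" + src + "\">"); // prints <img src="1a2b.png">
    }
}
```

Inside the loop in parseWordText you could then call PictureFiles.savePicture(outputDir, pic.suggestFullFileName(), pic.getContent()) and keep the returned name for the matching img tag.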

The code below then converts the document to HTML. Is there a way to add the byteArray to the ByteArrayOutputStream in this code?

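import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocumentCore;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.converter.WordToHtmlUtils;

private void convertWordDoctoHTML(File file) throws ParserConfigurationException, TransformerConfigurationException, TransformerException, IOException {
    // Let IOException propagate (the method already declares it) instead of
    // catching it and then continuing with a null document
    HWPFDocumentCore wordDocument;
    try (FileInputStream in = new FileInputStream(file)) {
        wordDocument = WordToHtmlUtils.loadDoc(in);
    }

    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
            DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
    wordToHtmlConverter.processDocument(wordDocument);
    org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();

    // Serialize the HTML DOM into an in-memory byte stream
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DOMSource domSource = new DOMSource(htmlDocument);
    StreamResult streamResult = new StreamResult(out);

    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(domSource, streamResult);
    out.close();

    // Decode with the same charset the serializer wrote, not the platform default
    String result = new String(out.toByteArray(), StandardCharsets.UTF_8);
    acDocTextArea.setText(newDocText);

    htmlText = result;
}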
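Appending the raw image bytes to the ByteArrayOutputStream would not help, because HTML is text: a page refers to images through an img element's src attribute rather than containing their binary data. What you can do instead is add an img element to the DOM returned by getDocument() before serializing it, and serialize straight to a file. A minimal sketch, assuming the document has a body element (which the converter's output is expected to have) and that the picture has already been saved under the given relative name; HtmlFileWriter, appendImage and writeHtml are hypothetical names. Note that this appends the image at the end of the body, not at the position it had in the Word document.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: adds an image reference to an HTML DOM and writes it to a file.
public class HtmlFileWriter {

    /** Adds <img src="..."> to the end of <body>; src is a path relative to the HTML file. */
    public static void appendImage(Document htmlDocument, String src) {
        Element body = (Element) htmlDocument.getElementsByTagName("body").item(0);
        Element img = htmlDocument.createElement("img");
        img.setAttribute("src", src);
        body.appendChild(img);
    }

    /** Serializes the DOM straight to a file instead of a ByteArrayOutputStream. */
    public static void writeHtml(Document htmlDocument, File target) throws Exception {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        // try-with-resources makes sure the file is closed after writing
        try (OutputStream os = new FileOutputStream(target)) {
            serializer.transform(new DOMSource(htmlDocument), new StreamResult(os));
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for wordToHtmlConverter.getDocument()
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        doc.appendChild(html);
        html.appendChild(doc.createElement("body"));

        appendImage(doc, "1a2b.png"); // e.g. the name under which pic.getContent() was saved
        File out = File.createTempFile("converted", ".html");
        writeHtml(doc, out);
        System.out.println("Wrote " + out.getAbsolutePath());
    }
}
```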

Solution

Look at the source code of org.apache.poi.hwpf.converter.WordToHtmlConverter:

http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/src/org/apache/poi/hwpf/converter/WordToHtmlConverter.java?view=markup&pathrev=1180740

Its JavaDoc states:

This implementation doesn't create images or links to them. This can be changed by overriding {@link #processImage(Element, boolean, Picture)} method

If you look at that processImage(...) method in AbstractWordConverter.java at line 790, you can see that it in turn calls another method named processImageWithoutPicturesManager(...).

http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/src/org/apache/poi/hwpf/converter/AbstractWordConverter.java?view=markup&pathrev=1180740

That method is overridden in WordToHtmlConverter, and its implementation (line 317) looks exactly like the place where you want to add your code:

@Override
protected void processImageWithoutPicturesManager(Element currentBlock,
        boolean inlined, Picture picture)
{
    // no default implementation -- skip
    currentBlock.appendChild(htmlDocumentFacade.document
            .createComment("Image link to '"
                    + picture.suggestFullFileName() + "' can be here"));
}

This is the point where you can start inserting the images into the document flow.

Create a subclass of the converter, e.g.

    public class InlineImageWordToHtmlConverter extends WordToHtmlConverter

and then override that method and put your own image-handling code in it.
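As an illustration of what the override could contain, here is a minimal sketch that embeds the picture directly in the HTML as a base64 data: URI, so no separate image files are needed. InlineImages, appendInlineImage and mimeTypeFor are hypothetical helpers, and the extension-to-MIME-type table is an assumed subset. The commented-out override shows how it might be wired in, using only picture.getContent() and picture.suggestFileExtension() from the question. Be aware that WMF and EMF pictures, which Word documents often contain, generally will not display in a browser.

```java
import java.util.Base64;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper. Inside the hypothetical InlineImageWordToHtmlConverter you might write:
//   @Override
//   protected void processImageWithoutPicturesManager(Element currentBlock,
//           boolean inlined, Picture picture) {
//       InlineImages.appendInlineImage(currentBlock, picture.getContent(),
//               picture.suggestFileExtension());
//   }
public class InlineImages {

    /** Maps a file extension to a MIME type (assumed subset; unknown types get a generic fallback). */
    public static String mimeTypeFor(String extension) {
        switch (extension.toLowerCase(Locale.ROOT)) {
            case "jpg":
            case "jpeg": return "image/jpeg";
            case "png":  return "image/png";
            case "gif":  return "image/gif";
            case "bmp":  return "image/bmp";
            case "wmf":  return "image/x-wmf";
            case "emf":  return "image/x-emf";
            default:     return "application/octet-stream";
        }
    }

    /** Appends <img src="data:..."> to the block the converter is currently filling. */
    public static Element appendInlineImage(Element currentBlock, byte[] content, String extension) {
        // Create the element on the same DOM document that the block belongs to
        Element img = currentBlock.getOwnerDocument().createElement("img");
        img.setAttribute("src", "data:" + mimeTypeFor(extension) + ";base64,"
                + Base64.getEncoder().encodeToString(content));
        currentBlock.appendChild(img);
        return img;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element p = doc.createElement("p");
        doc.appendChild(p);
        Element img = appendInlineImage(p, new byte[] {1, 2, 3}, "PNG");
        System.out.println(img.getAttribute("src")); // prints data:image/png;base64,AQID
    }
}
```

Base64 makes each image about a third larger than its raw bytes, and the whole page then lives in one file; for documents with many large pictures, writing separate image files and linking to them may be the better trade-off.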

I haven't tested this, but from reading the source it should be the right approach.
