Java Convert DOC to PDF or HTML


Problem description

Hello everyone,

I have created a program that converts .doc to .pdf/.html, using code I found here: http://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/. I use the XDocReport library, and the samples work fine. I could not get the first library to run; the second needs some configuration.

When I run the samples, which convert the doc file included in the downloaded zip file, conversion to PDF or HTML works. But when I try to convert a doc file created on my computer, I get this error:

Exception in thread "AWT-EventQueue-0" org.apache.poi.POIXMLException: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
        at org.apache.poi.util.PackageHelper.open(PackageHelper.java:41)
        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:120)
        at docconverter.Convert.ConvertToPDF(Convert.java:32)



Convert Code:

public static void ConvertToPDF(String docPath, String pdfPath) {
    try {
        InputStream doc = new FileInputStream(new File(docPath));
        XWPFDocument document = new XWPFDocument(doc);
        PdfOptions options = PdfOptions.create();
        OutputStream out = new FileOutputStream(new File(pdfPath));
        PdfConverter.getInstance().convert(document, out, options);
    } catch (FileNotFoundException ex) {
        Logger.getLogger(Convert.class.getName()).log(Level.SEVERE, null, ex);
    } catch (IOException ex) {
        Logger.getLogger(Convert.class.getName()).log(Level.SEVERE, null, ex);
    }
}

public static void ConvertToHTML(String docPath, String htmlPath) {
    try {
        InputStream doc = new FileInputStream(new File(docPath));
        XWPFDocument document = new XWPFDocument(doc);
        XHTMLOptions options = XHTMLOptions.create();
        OutputStream out = new FileOutputStream(new File(htmlPath));
        XHTMLConverter.getInstance().convert(document, out, options);
    } catch (FileNotFoundException ex) {
        Logger.getLogger(Convert.class.getName()).log(Level.SEVERE, null, ex);
    } catch (IOException ex) {
        Logger.getLogger(Convert.class.getName()).log(Level.SEVERE, null, ex);
    }
}



The error points to this line:

XWPFDocument document = new XWPFDocument(doc);



I don't know if this is the cause of the error.
What I'm trying to convert is a .doc file. If that is the cause, can someone give me code, an idea, or a URL for anything that can convert .doc/.docx to .pdf/.html?

Solution

The XDocReport docx->pdf converter works with docx files, not with doc files.

Note that a doc file is a binary format, whereas a docx file is a zip archive composed of XML entries.
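
As a rough illustration of that difference, the two formats can be told apart by their first bytes. This is a minimal sketch using only the JDK; the class and method names (WordFormatSniffer, sniff, sniffFile) are made up for this example and are not part of XDocReport or Apache POI. It reports "zip" rather than "docx" because any zip archive starts with the same bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: guesses the container format from the leading bytes.
final class WordFormatSniffer {
    // OLE2 compound file signature, used by legacy binary .doc files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Local file header signature that starts a zip archive such as .docx.
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};

    // Returns "doc", "zip" or "unknown" for the given leading bytes.
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2)) return "doc";
        if (startsWith(head, ZIP)) return "zip";
        return "unknown";
    }

    // Reads up to 8 bytes from the file and classifies them.
    static String sniffFile(Path file) throws IOException {
        byte[] head = new byte[8];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) break; // file shorter than 8 bytes
                read += n;
            }
        }
        byte[] seen = new byte[read];
        System.arraycopy(head, 0, seen, 0, read);
        return sniff(seen);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Calling WordFormatSniffer.sniffFile on the path before constructing XWPFDocument would let the program reject a binary .doc up front with a clearer message.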

So the error "Package should contain a content type part [M1.13]" means that your input is not a docx file.
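
The "content type part" in that message is the [Content_Types].xml entry that a docx package carries in its zip. The sketch below, which uses only java.util.zip, checks for that entry; it illustrates the idea and is not how POI itself performs the check. The class name OoxmlPackageCheck and both methods are hypothetical, and zipOf exists only to build small in-memory archives for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: looks for the content type part of an OOXML package.
final class OoxmlPackageCheck {
    private static final String CONTENT_TYPES = "[Content_Types].xml";

    // True if the bytes form a zip that contains a [Content_Types].xml entry.
    // Non-zip input (for example a binary .doc) yields no entries, so false.
    static boolean hasContentTypesPart(byte[] data) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (CONTENT_TYPES.equalsIgnoreCase(entry.getName())) {
                    return true;
                }
            }
        } catch (IOException e) {
            return false; // unreadable or corrupt archive
        }
        return false;
    }

    // Demo-only helper: builds an in-memory zip with the given entry names.
    static byte[] zipOf(String... entryNames) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<x/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

An archive built with zipOf("[Content_Types].xml", "word/document.xml") passes the check, while zipOf("word/document.xml") or arbitrary non-zip bytes do not.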


You could add something like this to your code:

package tcg.doc.web.managedBeans;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.core.FileURIResolver;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;

@Component
@Scope("session")
@Qualifier("ConvertWord")
public class ConvertWord {
    private static final String docName = "TestDocx.docx";
    private static final String outputlFolderPath = "d:/";

    String htmlNamePath = "docHtml.html";
    String zipName = "_tmp.zip";
    File docFile = new File(outputlFolderPath + docName);
    File zipFile = new File(zipName);

    public void ConvertWordToHtml() {
        // try-with-resources closes the input stream even if conversion fails
        try (InputStream doc = new FileInputStream(docFile)) {

            // 1) Load DOCX into XWPFDocument (a binary .doc file fails here)
            XWPFDocument document = new XWPFDocument(doc);

            // 2) Prepare XHTML options
            XHTMLOptions options = XHTMLOptions.create();

            // Extract images into target/images/<document name>
            String root = "target";
            File imageFolder = new File(root + "/images/" + docName);
            options.setExtractor(new FileImageExtractor(imageFolder));
            // URI resolver so the generated HTML points at the extracted images
            options.URIResolver(new FileURIResolver(imageFolder));

            // 3) Convert and write the HTML output
            try (OutputStream out = new FileOutputStream(new File(htmlPath()))) {
                XHTMLConverter.getInstance().convert(document, out, options);
            }
            System.out.println("HTML written to " + htmlPath());
        } catch (FileNotFoundException ex) {
            // Report the failure instead of silently swallowing it
            ex.printStackTrace();
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }

    public static void main(String[] args) {
        ConvertWord cwoWord = new ConvertWord();
        cwoWord.ConvertWordToHtml();
    }

    public String htmlPath() {
        // d:/docHtml.html
        return outputlFolderPath + htmlNamePath;
    }

    public String zipPath() {
        // d:/_tmp.zip
        return outputlFolderPath + zipName;
    }

}



For Maven, add this dependency to pom.xml:

<dependency>
   <groupId>fr.opensagres.xdocreport</groupId>
   <artifactId>org.apache.poi.xwpf.converter.xhtml</artifactId>
   <version>1.0.4</version>
</dependency>





Or download it from here: http://code.google.com/p/xdocreport/wiki/XWPFConverterXHTML

