Converting .doc to .html in Java using Apache POI
Problem description
I want to convert a .doc document that contains some images. How can I convert it to .html so that the images stay in the same position? How can I store those images in a separate folder named image and use that folder as the source for the images?
My code:
    import java.io.BufferedWriter;
    import java.io.DataOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.StringWriter;

    import javax.swing.JEditorPane;
    import javax.swing.JFrame;
    import javax.swing.JScrollPane;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;

    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.apache.poi.hwpf.extractor.WordExtractor;
    import org.apache.poi.xwpf.converter.core.FileImageExtractor;
    import org.apache.poi.xwpf.converter.core.FileURIResolver;
    import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
    import org.w3c.dom.Document;

    public class TestWordToHtmlConverter {

        private File docFile;
        private File file;

        public TestWordToHtmlConverter(File docFile) {
            this.docFile = docFile;
        }

        public void convert(File file) {
            this.file = file;

            try {
                FileInputStream finStream = new FileInputStream(docFile.getAbsolutePath());
                HWPFDocument doc = new HWPFDocument(finStream);
                WordExtractor wordExtract = new WordExtractor(doc);
                Document newDocument = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder().newDocument();
                WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(newDocument);
                wordToHtmlConverter.processDocument(doc);

                StringWriter stringWriter = new StringWriter();
                Transformer transformer = TransformerFactory.newInstance().newTransformer();

                transformer.setOutputProperty(OutputKeys.INDENT, "yes");
                transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
                transformer.setOutputProperty(OutputKeys.METHOD, "html");
                transformer.transform(new DOMSource(wordToHtmlConverter.getDocument()),
                        new StreamResult(stringWriter));

                String html = stringWriter.toString();

                FileOutputStream fos = new FileOutputStream(new File("html/sample.html"));
                DataOutputStream dos;

                try {
                    BufferedWriter out = new BufferedWriter(new OutputStreamWriter(fos, "UTF-8"));
                    out.write(html);
                    out.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }

                /*JEditorPane editorPane = new JEditorPane();
                editorPane.setContentType("text/html");
                editorPane.setEditable(false);

                editorPane.setPage(file.toURI().toURL());

                JScrollPane scrollPane = new JScrollPane(editorPane);
                JFrame f = new JFrame("Display Html File");
                f.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                f.getContentPane().add(scrollPane);
                f.setSize(512, 342);
                f.setVisible(true);*/

            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static void main(String args[]) {
            TestWordToHtmlConverter TTC = new TestWordToHtmlConverter(new File("docx/sample.doc"));
            TTC.convert(TTC.docFile);
        }
    }
This implementation doesn't create images or links to them. This can be changed by overriding the AbstractWordConverter.processImage(Element, boolean, Picture) method.
Solution
As stated in the API docs:
"WordToHtmlConverter doesn't create images or links to them. This can be changed by overriding the AbstractWordConverter.processImage(Element, boolean, Picture) method."
An example of how to override the method can be found here:
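Separately from the linked example, here is a minimal sketch of the file-handling half of the problem: writing each picture's bytes into an image folder next to the HTML file and returning the relative path to use as the img src. ImageFolderStore is a hypothetical helper name, not part of Apache POI. The wiring shown in the trailing comment assumes a POI version whose WordToHtmlConverter offers setPicturesManager(PicturesManager), which is an alternative to overriding processImage yourself; treat that part as an assumption to check against the version you use.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not part of Apache POI): stores picture bytes in a
 * sub-folder of the HTML output directory and returns the relative URL.
 */
public class ImageFolderStore {

    private final Path htmlDir;      // directory that will contain the .html file
    private final String folderName; // name of the image sub-folder, e.g. "image"

    public ImageFolderStore(Path htmlDir, String folderName) {
        this.htmlDir = htmlDir;
        this.folderName = folderName;
    }

    /** Writes the bytes to htmlDir/folderName/name and returns "folderName/name". */
    public String save(byte[] content, String suggestedName) throws IOException {
        Path imageDir = htmlDir.resolve(folderName);
        Files.createDirectories(imageDir);

        // Keep only the file-name part so the suggested name cannot point outside the folder.
        String fileName = Paths.get(suggestedName).getFileName().toString();
        Files.write(imageDir.resolve(fileName), content);

        // Relative URL, so the HTML and the image folder can be moved together.
        return folderName + "/" + fileName;
    }

    // Possible wiring into the question's code (assumption: POI with PicturesManager
    // on the classpath); it must run before processDocument(doc):
    //
    //   ImageFolderStore store = new ImageFolderStore(Paths.get("html"), "image");
    //   wordToHtmlConverter.setPicturesManager(
    //       (content, pictureType, suggestedName, widthInches, heightInches) -> {
    //           try {
    //               return store.save(content, suggestedName);
    //           } catch (IOException e) {
    //               throw new java.io.UncheckedIOException(e);
    //           }
    //       });
    //   wordToHtmlConverter.processDocument(doc);
}
```

With this wiring, the string returned by save is what the converter would use as the image link, so html/sample.html would reference files under html/image. Note that this only covers where the files go; how closely the image positions match the original document still depends on the converter's HTML output.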
You can also try the DOCX-to-XHTML converter based on Apache POI XWPF (note that XWPF reads .docx files, not .doc):
Another option is Apache Tika, which is built on top of Apache POI. An example that is included in Alfresco can be found here:
There are also many other converters.