Convert .docx to HTML using Java
Question
I tried converting .doc to HTML using WordToHtmlConverter, and it worked perfectly.
But when I tried to convert .docx to HTML, I got stuck.
What I have tried:
I used the code below to convert .docx to HTML. I took it from the question "How to use Tika's XWPFWordExtractorDecorator class?":
InputStream input = TikaInputStream.get(new File("C:\\Users\\Downloads\\filename.docx"));
Parser parser = new AutoDetectParser();
StringWriter sw = new StringWriter();
SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
TransformerHandler handler = factory.newTransformerHandler();
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
handler.setResult(new StreamResult(sw));
try {
    Metadata metadata = new Metadata();
    parser.parse(input, handler, metadata, new ParseContext());
    String xml = sw.toString();
    System.out.print("tika : " + xml);
} finally {
    input.close();
}
The output I got is:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title/>
</head>
<body/>
</html>
- Please explain where I went wrong.
- Is there a better way to convert .docx to an HTML string?
Thanks for your help.
Recommended answer
This code worked for me to convert .docx to HTML:
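// Convert .docx to an HTML string.
// XWPFDocument is Apache POI; XHTMLConverter, XHTMLOptions and FileURIResolver
// come from the xdocreport POI XWPF converter (package names differ by version).
try (InputStream in = new FileInputStream(new File(path))) {
    XWPFDocument document = new XWPFDocument(in);
    // Image URIs in the generated HTML are resolved against word/media.
    XHTMLOptions options = XHTMLOptions.create().URIResolver(new FileURIResolver(new File("word/media")));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    XHTMLConverter.getInstance().convert(document, out, options);
    String html = out.toString();
    System.out.println(html);
}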
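For readers who want to see what such a conversion involves at its simplest, here is a minimal sketch that uses only the JDK. It is not the method from the answer above and is no substitute for a real converter. It relies on two facts about the format: a .docx file is a ZIP archive whose main part is conventionally word/document.xml, and body text lives in w:t elements inside w:p paragraphs. The class name DocxTextToHtml and the helper sampleDocx are hypothetical names made up for this illustration. The in-memory sample archive holds only the one part the sketch reads, so it is not a complete .docx that Word would open. Styles, images, tables and lists are ignored, and text boxes nested inside a paragraph would be emitted twice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration class: plain paragraph text only, no formatting.
class DocxTextToHtml {
    // WordprocessingML main namespace used by w:p and w:t.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Finds word/document.xml in the archive and emits one <p> per paragraph. */
    static String toHtml(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    return paragraphsToHtml(zip);
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a .docx file?");
    }

    private static String paragraphsToHtml(InputStream xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for the namespace lookups below
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = dbf.newDocumentBuilder().parse(xml);

        StringBuilder html = new StringBuilder("<html><body>\n");
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            // A paragraph's text is split across runs; concatenate every w:t.
            NodeList texts =
                    ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            StringBuilder text = new StringBuilder();
            for (int j = 0; j < texts.getLength(); j++) {
                text.append(texts.item(j).getTextContent());
            }
            html.append("<p>").append(escape(text.toString())).append("</p>\n");
        }
        return html.append("</body></html>").toString();
    }

    /** Escapes the characters that are significant in HTML text content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a tiny archive containing only word/document.xml (demo data). */
    static byte[] sampleDocx() throws IOException {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
                + "<w:p><w:r><w:t>1 &lt; 2</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml(new ByteArrayInputStream(sampleDocx())));
    }
}
```

To try it on a real document, pass a FileInputStream for the .docx file to toHtml instead of the sample bytes. For anything beyond plain paragraph text, a library-based converter such as the one in the answer is the more practical route.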