docx4j conversion html->docx->html

Problem description

I'm working on my first project using docx4j... My goal is to export xhtml from a webapp (ckeditor created html) into a docx, edit it in Word, then import it back into the ckeditor wysiwyg.

(*Cross-posted from http://www.docx4java.org/forums/xhtml-import-f28/html-docx-html-inserts-a-lot-of-space-t1966.html#p6791?sid=78b64a02482926c4dbdbafbf50d0a914; will update when answered.)

I have created an html test document with the following contents:

<html><ul><li>TEST LINE 1</li><li>TEST LINE 2</li></ul></html>

My code then creates a docx from this html like so:

    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
            .createPackage();

    NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
    wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
    ndp.unmarshalDefaultNumbering();

    XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
    xHTMLImporter.setHyperlinkStyle("Hyperlink");

    wordMLPackage.getMainDocumentPart().getContent()
            .addAll(xHTMLImporter.convert(new File("test.html"), null));

    System.out.println(XmlUtils.marshaltoString(wordMLPackage
            .getMainDocumentPart().getJaxbElement(), true, true));

    wordMLPackage.save(new java.io.File("test.docx"));

My code then attempts to convert the docx back to html like so:

    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
            .createPackage();

    NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
    wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
    ndp.unmarshalDefaultNumbering();

    XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
    xHTMLImporter.setHyperlinkStyle("Hyperlink");

    WordprocessingMLPackage docx = WordprocessingMLPackage.load(new File("test.docx"));
    AbstractHtmlExporter exporter = new HtmlExporterNG2();
    OutputStream os = new java.io.FileOutputStream("test.html");
    HTMLSettings htmlSettings = new HTMLSettings();
    javax.xml.transform.stream.StreamResult result = new javax.xml.transform.stream.StreamResult(
            os);
    exporter.html(docx, result, htmlSettings);

The returned html is:

<?xml version="1.0" encoding="UTF-8"?><html xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<head>
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<style>
<!--/*paged media */ div.header {display: none }div.footer {display: none } /*@media print { */@page { size: A4; margin: 10%; @top-center {content: element(header) } @bottom-center {content: element(footer) } }/*element styles*/ .del  {text-decoration:line-through;color:red;} .ins {text-decoration:none;background:#c0ffc0;padding:1px;}
 /* TABLE STYLES */ 

 /* PARAGRAPH STYLES */ 
.DocDefaults {display:block;margin-bottom: 4mm;line-height: 115%;font-size: 11.0pt;}
.Normal {display:block;}

 /* CHARACTER STYLES */ span.DefaultParagraphFont {display:inline;}
-->
</style>
<script type="text/javascript">
<!--function toggleDiv(divid){if(document.getElementById(divid).style.display == 'none'){document.getElementById(divid).style.display = 'block';}else{document.getElementById(divid).style.display = 'none';}}
--></script>
</head>
<body>

  <!-- userBodyTop goes here -->




<div class="document">


<p class="Normal DocDefaults " style="text-align: left;position: relative; margin-left: 17mm;text-indent: -0.25in;margin-bottom: 0in;">&bull;  <span class="DefaultParagraphFont " style="font-weight: normal;color: #000000;font-style: normal;font-size: 11.0pt;">TEST LINE 1</span>
</p>


<p class="Normal DocDefaults " style="text-align: left;position: relative; margin-left: 17mm;text-indent: -0.25in;margin-bottom: 0in;">&bull;  <span class="DefaultParagraphFont " style="font-weight: normal;color: #000000;font-style: normal;font-size: 11.0pt;">TEST LINE 2</span>
</p>
</div>







  <!-- userBodyTail goes here -->


</body>
</html>

There is now a lot of extra space after each line. I'm not sure why this is happening; the conversion appears to add a lot of extra white space/carriage returns.

Recommended answer

It's not clear from your question whether you are worried about whitespace in the (X)HTML source document, or in your page as rendered (presumably in CKEditor). If the latter, then the browser and CK version may be relevant.

Whitespace may or may not be significant; try Googling 'xhtml significant whitespace' for more.
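To see what whitespace in the source document looks like to a parser, here is a minimal sketch using only the JDK's DOM API (not docx4j). The class name WhitespaceNodes and its helper methods are made up for illustration. It counts the whitespace-only text nodes that sit between elements, then strips them, which is only appropriate if you have decided they are not significant for your content:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of docx4j.
class WhitespaceNodes {

    /** Parses an XML string into a DOM tree, with no DTD or schema involved. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Counts the text nodes below 'node' that contain only whitespace. */
    static int countWhitespaceOnly(Node node) {
        int count = 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                if (child.getNodeValue().trim().isEmpty()) {
                    count++;
                }
            } else {
                count += countWhitespaceOnly(child);
            }
        }
        return count;
    }

    /** Removes whitespace-only text nodes below 'node'. */
    static void stripWhitespaceOnly(Node node) {
        NodeList children = node.getChildNodes();
        // Iterate backwards because the NodeList is live and shrinks on removal.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                if (child.getNodeValue().trim().isEmpty()) {
                    node.removeChild(child);
                }
            } else {
                stripWhitespaceOnly(child);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<div>\n\n<p>A</p>\n\n\n<p>B</p>\n</div>");
        System.out.println("before: " + countWhitespaceOnly(doc));
        stripWhitespaceOnly(doc);
        System.out.println("after: " + countWhitespaceOnly(doc));
    }
}
```

If the extra space only shows up in the rendered page, stripping these nodes will probably not change anything, since browsers normally collapse whitespace between block elements anyway.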

By way of background, depending on the docx4j property docx4j.Convert.Out.HTML.OutputMethodXML, docx4j will use one of the following two xsl:output declarations:

<xsl:output method="html" encoding="utf-8" omit-xml-declaration="no" indent="no"
            doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
            doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>

<xsl:output method="xml" encoding="utf-8" omit-xml-declaration="no" indent="no"
            doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
            doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>

Note the difference in the value of @method. If you want something different, you can modify docx2html.xsl or docx2xhtml.xsl respectively.
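To get a feel for what the output method changes, here is a small sketch that does not use docx4j or its stylesheets at all. It runs a plain identity transform with the JDK's built-in transformer and sets the same method and indent values through OutputKeys. The class name OutputMethodDemo and the sample markup are made up for illustration, and the exact serialization may differ if another XSLT processor is on your classpath:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of docx4j.
class OutputMethodDemo {

    /** Re-serializes 'xml' with an identity transform and the given output method ("xml" or "html"). */
    static String serialize(String xml, String method) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, method);
        transformer.setOutputProperty(OutputKeys.INDENT, "no");
        // Omitted here only to keep the printed output short.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<html><body><p>TEST LINE 1<br/>TEST LINE 2</p></body></html>";
        System.out.println("xml : " + serialize(xml, "xml"));
        System.out.println("html: " + serialize(xml, "html"));
    }
}
```

Comparing the two printed lines shows how empty elements such as br are written under each method, which is the kind of difference to expect between the two stylesheets.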
