How to handle special characters when converting from HTML to DocX

Problem description

I have an application that converts HTML files to DocX using DocX4J. I'm having problems with special characters like ç, á, é, í, ã, etc. The text font in my HTML files is Arial, but when I convert them to DocX the special characters mentioned above are set to Calibri. So, within the same word (e.g. Cláudio), I have "Cl" in Arial, "á" in Calibri, and "udio" in Arial.

I saw that I may have to set the font property on each w:r, but I'm having difficulty seeing how to do that for all the runs of the text being converted. I also can't see how to do it in my conversion code, which is listed below (with a sample HTML document).

Any tips or suggestions on how to do this conversion and handle those special characters would be great.

Cheers.

public WordprocessingMLPackage export(String xhtml) {

    WordprocessingMLPackage wordMLPackage = null;
    try {
        wordMLPackage = WordprocessingMLPackage.createPackage();
        XHTMLImporter importer = new XHTMLImporterImpl(wordMLPackage);
        List<Object> content = importer.convert(xhtml, null);
        wordMLPackage.getMainDocumentPart().getContent().addAll(content);
    } catch (Docx4JException e) {
        // ...
    }
    return wordMLPackage;
}


<html>
<head>
<meta charset="ISO-8859-1" />
<style type="text/css">
h1 {
    page-break-before: always;
}

p, h1 {
    font-family: Arial;
    font-size: 12pt;
}

p {
    line-height: 150%;
}

h1 {
    font-weight: bold;
    line-height: 130%
}
</style>
</head>
<body>
<h1>RESUMO<br /></h1>
<p>
    <span>Um resumo para o relatório.</span><br />
</p>
</body>
</html>

Solution

Following the tip given by JasonPlutext, I found an example of how to map a font to the XHTMLImporter on the DocX4J forum (http://www.docx4java.org/forums/docx-java-f6/docx-to-html-and-back-to-docx-t1913.html).

Now my code is working! See the final version below.


public WordprocessingMLPackage export(String xhtml) {

    WordprocessingMLPackage wordMLPackage = null;
    try {
        // Map the CSS font family "Arial" to run fonts that set both the
        // ascii and hAnsi attributes, so accented characters also get Arial.
        RFonts arialRFonts = Context.getWmlObjectFactory().createRFonts();
        arialRFonts.setAscii("Arial");
        arialRFonts.setHAnsi("Arial");
        XHTMLImporterImpl.addFontMapping("Arial", arialRFonts);

        wordMLPackage = WordprocessingMLPackage.createPackage();
        XHTMLImporter importer = new XHTMLImporterImpl(wordMLPackage);
        List<Object> content = importer.convert(xhtml, null);
        wordMLPackage.getMainDocumentPart().getContent().addAll(content);
    } catch (Docx4JException e) {
        // ...
    }
    return wordMLPackage;
}
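
A note on why setting both attributes probably matters: in WordprocessingML, w:rFonts carries separate font names for different character ranges. Basic ASCII characters (U+0000 to U+007F) take the ascii font, while Latin-1 characters such as á normally take the hAnsi font. If a run only specifies the ascii font, the accented characters can fall back to the document default, which would explain the Arial/Calibri split inside "Cláudio". The sketch below is not DocX4J code; it is a small illustration of that rule, heavily simplified (it ignores the eastAsia and cs attributes and the w:hint attribute). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of which w:rFonts attribute applies to a character.
// Simplified assumption: no eastAsia/cs handling and no w:hint.
class FontSlotSketch {

    /** Returns the rFonts attribute assumed to apply to a code point. */
    static String slotFor(int codePoint) {
        if (codePoint <= 0x7F) {
            return "ascii";   // basic ASCII range
        }
        if (codePoint <= 0xFF) {
            return "hAnsi";   // Latin-1 supplement, e.g. á, ç, é
        }
        return "other";       // depends on rules not modelled here
    }

    /** Splits text into consecutive segments that share the same slot. */
    static List<String> segments(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentSlot = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String slot = slotFor(cp);
            if (currentSlot != null && !slot.equals(currentSlot)) {
                out.add(currentSlot + ":" + current);
                current.setLength(0);
            }
            currentSlot = slot;
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (currentSlot != null) {
            out.add(currentSlot + ":" + current);
        }
        return out;
    }

    public static void main(String[] args) {
        // \u00E1 is the precomposed character á.
        System.out.println(segments("Cl\u00E1udio"));
    }
}
```

Under these assumptions the word splits into three segments (ascii, hAnsi, ascii), matching the three font runs described in the question. Setting both setAscii and setHAnsi to "Arial" gives all three segments the same font.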
