How to write HTML text with Marathi text to PDF document using docx4j?

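Problem description

I am using docx4j to create a PDF document from HTML text. The HTML text contains some English and some Marathi text. The English text appears correctly in the PDF, but the Marathi text is not displayed in the generated PDF.

Square boxes are shown in place of the text.

Below is the code I am using.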

import java.io.FileOutputStream;

import org.docx4j.Docx4J;
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class ConvertInXHTMLFragment {

    static String DEST_PDF = "/home/Downloads/Sample.pdf";

    public static void main(String[] args) throws Exception {

        // String content = "<html>Hello</html>";
        String content = "<html>पासवर्ड</html>";

        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();

        XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);

        wordMLPackage.getMainDocumentPart().getContent().addAll(XHTMLImporter.convert(content, null));

        Docx4J.toPDF(wordMLPackage, new FileOutputStream(DEST_PDF));
    }

}

EDIT 1:-

This is based on one of the docx4j samples for PDF output via XSL FO:

import java.io.OutputStream;

import org.docx4j.Docx4J;
import org.docx4j.convert.out.FOSettings;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.fonts.PhysicalFont;
import org.docx4j.fonts.PhysicalFonts;
import org.docx4j.model.fields.FieldUpdater;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.samples.AbstractSample;

public class ConvertOutPDFviaXSLFO extends AbstractSample {

    static {
        inputfilepath = "/home/Downloads/100.docx";
        saveFO = true;
    }

    static boolean saveFO;

    public static void main(String[] args) 
            throws Exception {

        try {
            getInputFilePath(args);
        } catch (IllegalArgumentException e) {
            // No input path argument given; keep the default set in the static block
        }

        String regex = null;
        PhysicalFonts.setRegex(regex);

        WordprocessingMLPackage wordMLPackage;
        System.out.println("Loading file from " + inputfilepath);
        wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));

        FieldUpdater updater = null;

        Mapper fontMapper = new IdentityPlusMapper();
        wordMLPackage.setFontMapper(fontMapper);

        PhysicalFont font = PhysicalFonts.get("Arial Unicode MS");
        fontMapper.put("Mangal", font);

        FOSettings foSettings = Docx4J.createFOSettings();
        if (saveFO) {
            foSettings.setFoDumpFile(new java.io.File(inputfilepath + ".fo"));
        }
        foSettings.setWmlPackage(wordMLPackage);

        String outputfilepath;
        if (inputfilepath==null) {
            outputfilepath = System.getProperty("user.dir") + "/OUT_FontContent.pdf";           
        } else {
            outputfilepath = inputfilepath + ".pdf";
        }
        OutputStream os = new java.io.FileOutputStream(outputfilepath);

        Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);

        System.out.println("Saved: " + outputfilepath);

        if (wordMLPackage.getMainDocumentPart().getFontTablePart()!=null) {
            wordMLPackage.getMainDocumentPart().getFontTablePart().deleteEmbeddedFontTempFiles();
        }

        // This would also do it, via finalize() methods
        updater = null;
        foSettings = null;
        wordMLPackage = null;
    }
}

Now I get #### in place of the Marathi text in the output PDF.

Solution

docx4j v3.3 supports PDF output in two completely different ways.

The default is to use Plutext's PDF Converter. This works if the mangal font you linked to is installed in the Converter and specified in the docx:

  <w:r>
    <w:rPr>
      <w:rFonts w:ascii="mangal" w:eastAsia="mangal" w:hAnsi="mangal" w:cs="mangal"/>
    </w:rPr>
    <w:t>पासवर्ड</w:t>
  </w:r>

The same would apply to Arial Unicode MS.
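The run above sets all four w:rFonts attributes to the same font. Devanagari is generally treated as a complex script in WordprocessingML, so w:cs is probably the attribute that matters for Marathi, though that is worth confirming against a docx that renders correctly. As a rough JDK-only sketch (the class and method names are invented for illustration and are not part of docx4j), you can check whether a piece of text contains Devanagari code points at all before deciding which runs need such a font:

```java
class ScriptCheck {

    /** Returns true if any code point in the text belongs to the Devanagari script. */
    static boolean containsDevanagari(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.DEVANAGARI);
    }

    public static void main(String[] args) {
        // The Marathi word for "password" from the question, written with
        // Unicode escapes so the source file stays pure ASCII.
        String marathi = "\u092A\u093E\u0938\u0935\u0930\u094D\u0921";

        System.out.println(containsDevanagari(marathi)); // true
        System.out.println(containsDevanagari("Hello")); // false
    }
}
```

This only tells you which script is present. It does not tell you whether a particular font has the glyphs, which still depends on what is installed.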

The other way is PDF via XSL FO; see https://github.com/plutext/docx4j-export-FO

If you have the relevant font installed it should just work. If you don't, then you need to tell it which font to use.

For example, suppose the docx specifies the mangal font, which I do not have. But I have Arial Unicode MS. So I tell the XSL FO process to use that instead:

fontMapper.put("mangal", PhysicalFonts.get("Arial Unicode MS"));
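The decision behind that mapping can be sketched without docx4j: given the font names you would accept, in order of preference, and the names actually available, pick the first one that is present. Everything in this sketch (the class FontFallback, the method choose, the hard-coded font lists) is hypothetical. It only mirrors the choice you make before calling fontMapper.put and is not docx4j's own lookup logic.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

class FontFallback {

    /** Returns the first candidate that appears in 'available', ignoring case. */
    static Optional<String> choose(List<String> candidates, Set<String> available) {
        Set<String> normalized = new HashSet<>();
        for (String name : available) {
            normalized.add(name.toLowerCase(Locale.ROOT));
        }
        for (String candidate : candidates) {
            if (normalized.contains(candidate.toLowerCase(Locale.ROOT))) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed for illustration: Mangal is missing, Arial Unicode MS is installed.
        Set<String> installed = Set.of("Arial", "Arial Unicode MS");
        List<String> preferred = List.of("Mangal", "Arial Unicode MS");

        // prints: Arial Unicode MS
        System.out.println(choose(preferred, installed).orElse("no suitable font found"));
    }
}
```

In the real conversion, the chosen name would be the one passed to PhysicalFonts.get, and it is worth checking that the returned font is not null before putting it into the mapper.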

Note that you need to know which font your docx specifies, and how to make it specify the font you want. For XHTML import, here is what I wrote in my answer to your earlier question:

Fonts are handled by https://github.com/plutext/docx4j-ImportXHTML/blob/master/src/main/java/org/docx4j/convert/in/xhtml/FontHandler.java#L58

Marathi might be relying on one of the other attributes in the RFonts object. You'll need to look at a working docx to see. You can use https://github.com/plutext/docx4j-ImportXHTML/blob/master/src/main/java/org/docx4j/convert/in/xhtml/FontHandler.java#L54 to inject a suitable font mapping.
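For a font mapping to have something to act on, the XHTML has to name a font in the first place. The question's input ("<html>पासवर्ड</html>") carries no font information. A minimal sketch of building input that does, using only the JDK, might look like the following. The class and method names are made up, and it assumes, rather than guarantees, that the importer maps the CSS font-family value to the docx font.

```java
class MarathiXhtml {

    /** Escapes the characters that would break well-formed XML text content. */
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    /** Wraps the text in a paragraph that names the desired font via CSS. */
    static String wrap(String text, String fontFamily) {
        return "<html><body><p style=\"font-family: " + fontFamily + ";\">"
                + escape(text)
                + "</p></body></html>";
    }

    public static void main(String[] args) {
        // "password" in Marathi, written with Unicode escapes to keep the source ASCII.
        String marathi = "\u092A\u093E\u0938\u0935\u0930\u094D\u0921";
        System.out.println(wrap(marathi, "Mangal"));
    }
}
```

The resulting string could then be passed to XHTMLImporter.convert(content, null) as in the question's code. Inspecting the generated docx shows which w:rFonts attributes actually get set, and so which font name the PDF step needs to resolve.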
