HTML to PDF conversion: Cyrillic characters not displayed properly

Views: 798 | Published: 2017/8/16 20:04:31 | Tags: pdf, encoding, fonts, itext


Problem description

I have a problem with PDF fonts. I used a method for generating a PDF from HTML that worked fine on my local machine, which runs Windows, but on Linux the Cyrillic text is displayed as question marks. I checked the fonts there, and it turned out that the required fonts were present. I have now switched to another method, shown below.

    Document document = new Document(PageSize.A4);
    String myFontsDir = "C:\\";
    String filePath = AppProperties.downloadLocation + "Order_" + orderID + ".pdf";
    try {
        OutputStream file = new FileOutputStream(new File(filePath));
        PdfWriter writer = PdfWriter.getInstance(document, file);
        int iResult = FontFactory.registerDirectory(myFontsDir);
        if (iResult == 0) {
            System.out.println("TestPDF(): Could not register font directory " + myFontsDir);
        } else {
            System.out.println("TestPDF(): Registered font directory " + myFontsDir);
        }

        document.open();
        String htmlContent = "<html><head>"
                + "<meta http-equiv=\"content-type\" content=\"application/xhtml+xml; charset=UTF-8\"/>"
                + "</head>"
                + "<body>"
                + "<h4 style=\"font-family: arialuni, arial; font-size:16px; font-weight: normal; \" >"
                + "Здраво Kristijan!"
                + "</h4></body></html>";
        InputStream inf = new ByteArrayInputStream(htmlContent.getBytes("UTF-8"));

        XMLWorkerFontProvider fontImp = new XMLWorkerFontProvider(myFontsDir);
        FontFactory.setFontImp(fontImp);
        XMLWorkerHelper.getInstance().parseXHtml(writer, document, inf, null, null, fontImp);

        document.close();
        System.out.println("Done.");
    } catch (Exception e) {
        e.printStackTrace();
    }

With this piece of code I am able to generate a proper PDF from Latin text, but Cyrillic text is displayed as weird characters. This happens on Windows; I haven't tested it on Linux yet. Any advice on encoding or fonts?

Thanks in advance.

Recommended answer

First: it is very hard to believe that your font directory is C:\\. You are assuming that you have a file with path C:\\arialuni.ttf, whereas I assume that the path to MS Arial Unicode is C:\\windows\fonts\arialuni.ttf.
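One way to check the first point without involving iText at all is to list the font files that actually sit in the directory being registered. The following is only a minimal sketch using the JDK; the class name `FontDirProbe`, its helper methods, and the default path are assumptions made for illustration and are not part of the question or of iText.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class FontDirProbe {

    /** True if the file name has a common font file extension. */
    static boolean isFontFile(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".ttf") || lower.endsWith(".otf") || lower.endsWith(".ttc");
    }

    /** Lists font files directly inside dir; returns an empty list if dir is not a directory. */
    static List<Path> listFontFiles(Path dir) {
        List<Path> fonts = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return fonts;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry) && isFontFile(entry.getFileName().toString())) {
                    fonts.add(entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fonts;
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the directory you actually intend to register.
        Path dir = Paths.get(args.length > 0 ? args[0] : "C:/Windows/Fonts");
        List<Path> fonts = listFontFiles(dir);
        System.out.println(dir + " contains " + fonts.size() + " font file(s)");
        for (Path font : fonts) {
            System.out.println("  " + font.getFileName());
        }
    }
}
```

If the list is empty, or arialuni.ttf is not in it, the directory passed to the font provider is probably not the one that holds the font. The same check can be run on the Linux machine with its own font directory as the argument.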

Secondly: I don't think arialuni is the correct font name. I'm pretty sure it's arial unicode ms. You can check this by running the following code:

XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register("c:/windows/fonts/arialuni.ttf");
for (String s : fontProvider.getRegisteredFamilies()) {
    System.out.println(s);
}

The output should be:

courier
arial unicode ms
zapfdingbats
symbol
helvetica
times
times-roman

These are the values you can use as font family names; arialuni isn't one of them.
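To make the consequence concrete, here is a small, simplified model of matching a CSS font-family list against the registered family names. It is a sketch under stated assumptions: `FontFamilyLookup` and `firstRegistered` are hypothetical names, and the matching rule (first listed family that is registered, compared case-insensitively) is an illustration, not iText's actual resolution code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

class FontFamilyLookup {

    // Family names copied from the output listed above.
    static final Set<String> REGISTERED = new HashSet<>(Arrays.asList(
            "courier", "arial unicode ms", "zapfdingbats", "symbol",
            "helvetica", "times", "times-roman"));

    /** Returns the first family in a CSS font-family value that is registered, if any. */
    static Optional<String> firstRegistered(String cssFontFamily, Set<String> registered) {
        for (String candidate : cssFontFamily.split(",")) {
            // Normalize: trim whitespace, drop surrounding quotes, compare in lower case.
            String name = candidate.trim()
                    .replaceAll("^['\"]|['\"]$", "")
                    .toLowerCase(Locale.ROOT);
            if (registered.contains(name)) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The value used in the question: neither name is a registered family.
        System.out.println(firstRegistered("arialuni, arial", REGISTERED));
        // prints "Optional.empty"

        // A value that names the family as it is actually registered.
        System.out.println(firstRegistered("Arial Unicode MS, FreeSans", REGISTERED));
        // prints "Optional[arial unicode ms]"
    }
}
```

In this model, the style from the question matches nothing, so whatever default font is used instead decides whether Cyrillic glyphs are available.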

Also: aren't you defining the character set in the wrong place? Your parseXHtml call passes null where the Charset is expected.

I have slightly adapted your source code: I stored the HTML in a file named cyrillic.html:

<html>
<head>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8"/>
</head>
<body>
<h4 style="font-family: Arial Unicode MS, FreeSans; font-size:16px; font-weight: normal; " >Здраво Kristijan!</h4>
</body>
</html>

Note that I replaced arialuni with Arial Unicode MS and that I added FreeSans as an alternative font. In my code, I used FreeSans.ttf instead of arialuni.ttf.

See ParseHtml11:

public static final String DEST = "results/xmlworker/cyrillic.pdf";
public static final String HTML = "resources/xml/cyrillic.html";
public static final String FONT = "resources/fonts/FreeSans.ttf";

public void createPdf(String file) throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    XMLWorkerFontProvider fontImp = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
    fontImp.register(FONT);
    FontFactory.setFontImp(fontImp);
    XMLWorkerHelper.getInstance().parseXHtml(writer, document,
            new FileInputStream(HTML), null, Charset.forName("UTF-8"), fontImp);
    // step 5
    document.close();
}

As you can see, I pass the Charset when parsing the HTML. The resulting PDF shows the Cyrillic text correctly.
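Why the charset matters can be shown with the JDK alone. The sketch below is an illustration, not the answer's code: `CharsetRoundTrip` and `roundTrip` are hypothetical names, and ISO-8859-1 is only an assumed example of a wrong charset. It encodes the greeting from the HTML and decodes it again, once with matching charsets and twice with a mismatch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {

    // The greeting from the question's HTML, written with Unicode escapes so the
    // example does not depend on the encoding of the source file itself.
    static final String TEXT = "\u0417\u0434\u0440\u0430\u0432\u043e Kristijan!";

    /** Encodes the text with one charset and decodes the bytes with another. */
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Same charset on both sides: the text survives unchanged.
        String ok = roundTrip(TEXT, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(ok.equals(TEXT)); // prints "true"

        // UTF-8 bytes read as ISO-8859-1: each Cyrillic letter turns into two
        // unrelated characters, so the string grows from 17 to 23 characters.
        String garbled = roundTrip(TEXT, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // prints "23"

        // Encoding into a charset that has no Cyrillic letters: each unmappable
        // letter is replaced by '?'.
        String questionMarks = roundTrip(TEXT, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println(questionMarks); // prints "?????? Kristijan!"
    }
}
```

The second case resembles the "weird characters" symptom and the third resembles the question marks seen on Linux, although the sketch cannot confirm that either is the exact cause in the question. Naming the charset explicitly on both the encoding and the parsing side removes the dependency on platform defaults.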

If you insist on using Arial Unicode, just replace this line:

public static final String FONT = "resources/fonts/FreeSans.ttf";

with this one:

public static final String FONT = "c:/windows/fonts/arialuni.ttf";

I have tested this on a Windows machine and it works too.
