Encoding issue while generating a PDF file from HTML using ITextRenderer


Problem description

I am trying to generate a PDF document with ITextRenderer from content that contains non-Latin characters, in my case Bulgarian.

Before calling ITextRenderer, I have a String content that, after some processing (such as parsing with Tidy), looks like this (I can see the value in the debugger):

String content

<td class="description">Вид на потока</td>
<td class="description">Статус на потока</td>

The above is just a part of my String. The full content is syntactically valid HTML. I include this small part only to show that, up to this point, the encoding is correct, since the Bulgarian characters are readable.

After that, the following code runs: it creates a document, passes it to ITextRenderer, and generates the PDF file. This code is already tested and works for content in Latin characters, since I was able to generate a PDF file for English successfully.

The problem appears when I switch to another language (Bulgarian) with non-Latin characters. The generated PDF omits all the Bulgarian characters, and the final result is a PDF with a lot of empty lines. This is the part of the code that generates the PDF:

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

        dbf.setValidating(false);
        dbf.setNamespaceAware(false);
        dbf.setFeature("http://xml.org/sax/features/namespaces", false);
        dbf.setFeature("http://xml.org/sax/features/validation", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

        DocumentBuilder builder = dbf.newDocumentBuilder();

        Document doc = builder.parse(new ByteArrayInputStream(content.getBytes("UTF-8")));

        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        InputStream is = null;

        ITextRenderer renderer = new ITextRenderer();

        renderer.getFontResolver().addFont("fonts/TIMES.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        renderer.getFontResolver().addFont("fonts/TIMESBD.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        renderer.getFontResolver().addFont("fonts/TIMESBI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        renderer.getFontResolver().addFont("fonts/TIMESI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);


        renderer.setDocument(doc, null);
        renderer.layout();
        renderer.createPDF(outputStream);
        outputStream.close();


        byte[] outputBytes = outputStream.toByteArray();
        is = new ByteArrayInputStream(outputBytes);
        response.setContentType("application");
        response.addHeader("Content-Disposition", "attachment; filename=\"" + "exported.pdf" + "\"");
        response.setContentLength(outputBytes.length);
        response.getOutputStream().write(inputStreamToBytes(is));

I have tried several things (mainly related to encoding), but unfortunately I haven't found a solution yet. I am probably missing something obvious here :)

I am not sure whether this is relevant, but I am using Spring and this code runs inside a Controller.

Any help would be greatly appreciated. Thanks.

Recommended answer

Does your HTML specify the UTF-8 encoding? Are your font files actually being found at that path?
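Both questions can be checked before the renderer is ever called. The following is only a minimal sketch using the standard library: the class name `PdfInputChecks` and its methods are hypothetical helpers, not part of Flying Saucer or iText, and the regular expressions cover only the common forms of a UTF-8 declaration. Note that a relative path such as `fonts/TIMES.TTF` is resolved against the JVM's working directory, which in a web application is often not the application folder.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

/** Hypothetical pre-flight checks; names are illustrative only. */
class PdfInputChecks {

    // Matches <meta charset="utf-8"> and the older http-equiv form
    // whose content attribute contains charset=UTF-8.
    private static final Pattern UTF8_META = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?utf-8", Pattern.CASE_INSENSITIVE);

    // Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>.
    private static final Pattern UTF8_XML_DECL = Pattern.compile(
            "<\\?xml[^>]*encoding\\s*=\\s*[\"']utf-8[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns true if the markup declares UTF-8 in a meta tag or XML declaration. */
    static boolean declaresUtf8(String html) {
        return UTF8_META.matcher(html).find() || UTF8_XML_DECL.matcher(html).find();
    }

    /** Returns true if the given (possibly relative) path points to an existing file. */
    static boolean fontFileExists(String fontPath) {
        // Relative paths are resolved against the JVM working directory.
        Path resolved = Paths.get(fontPath).toAbsolutePath();
        return Files.isRegularFile(resolved);
    }

    public static void main(String[] args) {
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=UTF-8\"/></head><body></body></html>";
        System.out.println("declares UTF-8: " + declaresUtf8(html));

        String fontPath = "fonts/TIMES.TTF"; // path taken from the question
        System.out.println(Paths.get(fontPath).toAbsolutePath()
                + " exists: " + fontFileExists(fontPath));
    }
}
```

If the second check prints `false`, the relative path does not resolve to the font file from the JVM's working directory, so the path passed to `addFont` is worth verifying first.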

Take a look at this gist, which reports that it works for Chinese characters on Linux when given a path to the system's default font location.
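The gist itself is not reproduced here, so the following is only a sketch of the general idea under stated assumptions: it walks a font directory and collects the TrueType files found there. The class name `SystemFontScan` is hypothetical, and `/usr/share/fonts` is an assumed default location that varies between Linux distributions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper that lists .ttf files under a font directory. */
class SystemFontScan {

    /** Returns absolute paths of all .ttf files under fontDir, or an empty list. */
    static List<String> findTrueTypeFonts(String fontDir) {
        Path root = Paths.get(fontDir);
        if (!Files.isDirectory(root)) {
            return Collections.emptyList();
        }
        // Walk the tree recursively; the stream is closed by try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".ttf"))
                    .map(p -> p.toAbsolutePath().toString())
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed default; pass another directory as the first argument.
        String dir = args.length > 0 ? args[0] : "/usr/share/fonts";
        for (String font : findTrueTypeFonts(dir)) {
            System.out.println(font);
        }
    }
}
```

Each returned path could then be passed to `renderer.getFontResolver().addFont(path, BaseFont.IDENTITY_H, BaseFont.EMBEDDED)`, the same call the question already uses, so the renderer receives absolute paths rather than paths relative to the working directory.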
