PDFBox scrambling the text

Problem description

I have been trying to edit a PDF document to pre-fill form entries. I've got it working (sort of). The text I'm adding goes in fine. However, other text that was already there seems to have been replaced with "&%£!£!" symbols. I've worked out that it has something to do with the "contentStream" section in the code below, specifically the "setFont" line. If I remove it, the page remains OK... except that the "Hello Richard" text is no longer displayed!

Please help!

package pdfboxtest;

import java.awt.Color;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class PDFFormFiller {

    private static final String R40_NEW_FORM_PATH = "c:\\temp\\hmrc-r40.pdf";
    private static final String R40_COMPLETED_FORM_PATH = "c:\\temp\\hmrc-r40-complete.pdf";

    public static void main(String[] args) throws Exception {
        PDDocument doc = PDDocument.load(R40_NEW_FORM_PATH);

        addTextToPage(doc);

        doc.save(R40_COMPLETED_FORM_PATH);
        doc.close();
    }

    private static void addTextToPage(PDDocument doc) throws Exception {
        List pages = doc.getDocumentCatalog().getAllPages();
        PDPage firstPage = (PDPage) pages.get(0);
        PDPageContentStream contentStream = new PDPageContentStream(doc, firstPage, true, true);

        contentStream.setFont(PDType1Font.HELVETICA_BOLD, 24);
        contentStream.beginText();
        contentStream.setNonStrokingColor(Color.BLACK);
        contentStream.moveTextPositionByAmount(100, 200);
        contentStream.drawString("HELLO RICHARD!!");
        contentStream.endText();
        contentStream.close();

    }
}

Recommended answer

As already assumed in a comment, this is due to a PDFBox issue for which I described a workaround in this answer. The issue is still present in PDFBox 1.8.2 but has since been fixed for versions 1.8.3 and 2.0.0, cf. PDFBOX-1753.

In your case, the workaround changes the addTextToPage method like this:

private static void addTextToPage(PDDocument doc) throws IOException {
    List pages = doc.getDocumentCatalog().getAllPages();
    PDPage firstPage = (PDPage) pages.get(0);
    PDPageContentStream contentStream = new PDPageContentStream(doc, firstPage, true, true);

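    // Workaround (the added line): force initialization of the page's font
    // resources, which new PDPageContentStream skips but setFont relies on.
    firstPage.getResources().getFonts();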

    contentStream.setFont(PDType1Font.HELVETICA_BOLD, 24);
    contentStream.beginText();
    contentStream.setNonStrokingColor(Color.BLACK);
    contentStream.moveTextPositionByAmount(100, 200);
    contentStream.drawString("HELLO RICHARD!!");
    contentStream.endText();
    contentStream.close();
}

The added line enforces an initialization that new PDPageContentStream forgets but that setFont relies on. You can find details in the answer referenced above. You might want to inform the PDFBox developers.
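
To make the idea of a "forgotten initialization" more concrete, here is a minimal sketch in plain Java. It is not PDFBox source code: the classes LazyFontTableDemo and PageResources, the font name "FormFont", and the naming scheme are all hypothetical, and the failure mode shown (a new font taking over a resource name the page already uses) is only an assumption about how such a bug can scramble existing text. The referenced answer has the actual details.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of a lazily initialized font table.
// None of this is PDFBox code; names and behavior are assumptions.
final class LazyFontTableDemo {

    static class PageResources {
        // Font names as stored in the existing file, e.g. "F0" -> the form's own font.
        final Map<String, String> stored = new LinkedHashMap<>();
        // In-memory view of the fonts; stays null until something loads it.
        private Map<String, String> cache;

        PageResources(Map<String, String> existing) {
            stored.putAll(existing);
        }

        // Loads the in-memory view from the stored fonts on first access.
        Map<String, String> getFonts() {
            if (cache == null) {
                cache = new LinkedHashMap<>(stored);
            }
            return cache;
        }

        // Assumed buggy behavior: if nothing has loaded the view yet, start from
        // an empty map, so the next "free" name may already be taken in the file.
        String addFont(String font) {
            if (cache == null) {
                cache = new LinkedHashMap<>();
            }
            String name = "F" + cache.size();
            cache.put(name, font);
            stored.putAll(cache); // write back; overwrites on a name collision
            return name;
        }
    }

    // Registers a new font on a page that already uses "F0", with or without
    // calling getFonts() first, and returns the resulting stored font table.
    static Map<String, String> addHelvetica(boolean loadFontsFirst) {
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("F0", "FormFont");
        PageResources resources = new PageResources(existing);
        if (loadFontsFirst) {
            resources.getFonts(); // plays the role of the workaround line
        }
        resources.addFont("Helvetica-Bold");
        return resources.stored;
    }

    public static void main(String[] args) {
        System.out.println("without preload: " + addHelvetica(false));
        System.out.println("with preload:    " + addHelvetica(true));
    }
}
```

In this toy model, skipping the preload leaves the table as {F0=Helvetica-Bold}, so any existing text that refers to F0 would now be drawn with the wrong font. Calling getFonts() first yields {F0=FormFont, F1=Helvetica-Bold}, and the original entry survives.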
