When stamping document - Danish characters disappear and PDF becomes invalid


Problem description

I have a PDF generated in Oracle BI Publisher. It contains a graph and some text. When I try to stamp the document with an image, the image gets added, but the Danish characters are destroyed.

I run the iText stamper like this:

static void stampPdf() throws IOException, DocumentException {
    PdfReader reader = new PdfReader(PDF_SOURCE_FILE);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(
            PDF_STAMPED_FILE));
    Image img = Image.getInstance(WATERMARK);
    img.setAbsolutePosition(10, 100);
    PdfContentByte under = stamper.getUnderContent(1);
    under.addImage(img);
    stamper.close();
}

As a result, I get the following message: Document invalid. The document still displays, including the added image, but the Danish characters have been substituted.

All fonts have been removed from the document properties.

Has anyone seen something like this before? I have done this several times before without problems.

Recommended answer

I have taken a look at the PDF and it's not an iText problem. It's a "Garbage In, Garbage Out" problem. Please open the PDF in Acrobat and analyze it for syntax errors. You'll get the following message:

The content stream of the PDF is broken in such a way that even Acrobat can't analyze it and tell you what is wrong.

So I looked inside the file, and it looks as if iText can't see the page resources for the page. The page resources refer to the fonts. If iText can't see the page resources, it can't see the fonts, and they get lost in the process.

If Acrobat allowed me to "Analyze and fix" the file, I could create a fixed PDF and compare what was fixed. But as Acrobat can't fix the file, it would be a lot of work to go through the complete file manually to find out exactly what is wrong with it. Out of curiosity, I opened the document in a text editor, and I found this:

4 0 obj
<<
/ProcSet [ /PDF /Text ]
/Font << 
/F1 7 0 R
/F2 8 0 R
/F3 11 0 R
>>
/Shading << 
/grad0 10 0 R
/grad0#2 15 0 R
/grad1#2 17 0 R
/grad2#2 19 0 R
/grad3#2 21 0 R
/grad4#2 23 0 R
/grad5#2 25 0 R
>>
>>
endobj

The problem is caused by the names /grad0#2, /grad1#2, etc. Those aren't valid names. Let me quote from ISO-32000-1:


When writing a name in a PDF file, a SOLIDUS (2Fh) (/) shall be used to introduce a name. The SOLIDUS is not part of the name but is a prefix indicating that what follows is a sequence of characters representing the name in the PDF file and shall follow these rules:

a) A NUMBER SIGN (23h) (#) in a name shall be written by using its 2-digit hexadecimal code (23), preceded by the NUMBER SIGN.

b) Any character in a name that is a regular character (other than NUMBER SIGN) shall be written as itself or by using its 2-digit hexadecimal code, preceded by the NUMBER SIGN.

c) Any character that is not a regular character shall be written using its 2-digit hexadecimal code, preceded by the NUMBER SIGN only.
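Rules a) to c) can be sketched as a small encoder. This is only an illustration using the Java standard library, not code from iText or from the PDF producer. The class name PdfNameWriter and the method encode are hypothetical, the name is assumed to be UTF-8 text, and the sketch also escapes every byte outside 21h to 7Eh, which the specification recommends rather than requires.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: writes a name the way rules a) to c) describe.
final class PdfNameWriter {
    // PDF delimiter characters; together with white space they are the non-regular characters.
    private static final String DELIMITERS = "()<>[]{}/%";

    /** Returns the name as written in a PDF file, including the leading SOLIDUS. */
    static String encode(String name) {
        StringBuilder out = new StringBuilder("/");
        // Assumption: the name is treated as UTF-8 text.
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (c == 0) {
                throw new IllegalArgumentException("a PDF name cannot contain a null byte");
            }
            // Printable ASCII that is not a delimiter may be written as itself (rule b).
            boolean writeAsIs = c >= 0x21 && c <= 0x7E && DELIMITERS.indexOf(c) < 0;
            if (c == '#' || !writeAsIs) {
                // Rule a) for '#', rule c) for everything else: '#' plus exactly two hex digits.
                out.append('#').append(String.format("%02X", c));
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Lime Green")); // the space is written as #20
        System.out.println(encode("A#B"));        // the number sign is written as #23
    }
}
```

Whatever the input, every NUMBER SIGN that this encoder emits is followed by exactly two hexadecimal digits, which is the property a valid name must have.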

In your case, you have a NUMBER SIGN (#) followed by a 1-digit number. That doesn't make any sense. The PDF is invalid.
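The check described here can be sketched in plain Java. This is a rough illustration, not part of iText: the class PdfNameCheck and its methods are hypothetical, and the regular expression only approximates PDF tokenization, so it is meant for uncompressed dictionary text like the object shown above, not for binary streams or string contents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags names whose '#' is not followed by two hex digits.
final class PdfNameCheck {
    // Approximation: a name is '/' followed by characters that are neither white space nor delimiters.
    private static final Pattern NAME = Pattern.compile("/[^\\s()<>\\[\\]{}/%]*");

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    /** True if every NUMBER SIGN in the token is followed by two hexadecimal digits. */
    static boolean hasValidEscapes(String token) {
        for (int i = 0; i < token.length(); i++) {
            if (token.charAt(i) != '#') {
                continue;
            }
            if (i + 2 >= token.length()
                    || !isHexDigit(token.charAt(i + 1))
                    || !isHexDigit(token.charAt(i + 2))) {
                return false;
            }
            i += 2; // skip the two hex digits that belong to this escape
        }
        return true;
    }

    /** Returns every name in the text that contains a malformed '#' escape. */
    static List<String> findInvalidNames(String text) {
        List<String> invalid = new ArrayList<>();
        Matcher m = NAME.matcher(text);
        while (m.find()) {
            if (!hasValidEscapes(m.group())) {
                invalid.add(m.group());
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        String resources = "/Font << /F1 7 0 R >> "
                + "/Shading << /grad0 10 0 R /grad0#2 15 0 R /grad1#2 17 0 R >>";
        System.out.println(findInvalidNames(resources));
    }
}
```

Run against the resource dictionary quoted above, a check like this reports /grad0#2, /grad1#2 and the other names of that form, while /F1 and /grad0 pass.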

Long story short: contact the producer of the PDF and ask them to fix the problem, or never use their tools again.
