Why are PDF files different even if the content is the same?

Question

"There’s usually more than one way to create PDF documents that look like identical twins when opened in a PDF viewer. And even if you create two identical PDF documents using the exact same code, there will be small differences between the two resulting files. That’s inherent to the PDF format."

I read this paragraph in "iText in Action, Second Edition" (p. 17). Can anyone please explain what kind of differences the author is talking about, and why the PDF format has this defect, if I may call it that?

Recommended answer

Files created at different moments have different values for the CreationDate, and they also have different file identifiers (two files created at different moments should have different IDs, as defined in the PDF specification).

The file identifier is usually a hash based on the date, a path name, the size of the file, and part of the content of the PDF file (e.g. the entries in the information dictionary). I quote ISO-32000-1:

The calculation of the file identifier need not be reproducible; all that matters is that the identifier is likely to be unique. For example, two implementations of the preceding algorithm might use different formats for the current time, causing them to produce different file identifiers for the same file created at the same time, but the uniqueness of the identifier is not affected.
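
To make the idea concrete, here is a minimal Python sketch of an identifier built from the kinds of inputs listed above. The function name `make_file_id`, the sample path, and the choice of MD5 are assumptions for illustration only; a real PDF writer is free to compute its identifier differently.

```python
import hashlib


def make_file_id(timestamp: str, path: str, size: int, info: dict) -> str:
    """Hypothetical helper: hash time, path, size and info entries into an ID."""
    md5 = hashlib.md5()
    md5.update(timestamp.encode("utf-8"))
    md5.update(path.encode("utf-8"))
    md5.update(str(size).encode("utf-8"))
    # Mix in the values of the information dictionary entries.
    for value in info.values():
        md5.update(str(value).encode("utf-8"))
    return md5.hexdigest()


# Same code, same content, same path; only the creation time differs by one second.
info = {"Title": "Hello", "Producer": "example-writer"}
id_first = make_file_id("D:20240101120000", "/tmp/hello.pdf", 1024, info)
id_second = make_file_id("D:20240101120001", "/tmp/hello.pdf", 1024, info)

print(id_first)
print(id_second)
print("identifiers differ:", id_first != id_second)
```

Because the current time is one of the inputs, running the same generation code twice yields two different identifiers, which is exactly the behaviour the specification allows.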

File identifiers are mandatory when encrypting a document because they are used in the encryption process. As a result, encrypted PDF files with different file identifiers will have completely different streams. This is not a flaw; it is by design. I'm a member of the ISO committee that is working on the PDF 2.0 specification, and I can assure you that there are no plans to change this. Files created at different points in time will be different, even when using the same code. (I'm also the author of the book you refer to.)
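
The effect of the identifier on encrypted streams can be illustrated with a deliberately simplified Python sketch. The key derivation below (`derive_key`) is an assumption that only hashes a password together with the file identifier; the real standard security handler mixes in further inputs such as the owner-password entry and the permission flags. RC4 is used here only because it is short enough to write out in full.

```python
import hashlib


def derive_key(password: bytes, file_id: bytes, key_len: int = 16) -> bytes:
    """Simplified, hypothetical key derivation: the file ID is one of the inputs."""
    return hashlib.md5(password + file_id).digest()[:key_len]


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4 stream cipher; applying it twice with the same key decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


stream = b"BT /F1 12 Tf 72 720 Td (Hello World) Tj ET"
key_a = derive_key(b"secret", bytes.fromhex("00112233445566778899aabbccddeeff"))
key_b = derive_key(b"secret", bytes.fromhex("ffeeddccbbaa99887766554433221100"))

cipher_a = rc4(key_a, stream)
cipher_b = rc4(key_b, stream)
print("same plaintext, different ciphertext:", cipher_a != cipher_b)
print("round trip works:", rc4(key_a, cipher_a) == stream)
```

The plaintext stream and the password are identical in both cases; only the identifier changes, and that alone is enough to change every encrypted byte.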

The ISO specification also allows other differences. For instance: the syntax that is used to display graphics and text on a page can be reorganized for whatever reason. See section 8.2 of ISO-32000-1 where it says:

The important point is that there is no semantic significance to the exact arrangement of graphics state operators. A conforming reader or writer of a PDF content stream may change an arrangement of graphics state operators to any other arrangement that achieves the same values of the relevant graphics state parameters for each graphics object.

When processing a PDF content stream a PDF processor may change an arrangement of graphics state operators to any other arrangement that achieves the same values of the relevant graphics state parameters for each graphics object. This can be done to optimize the page, to make it render more quickly, to make it easier to debug, to improve the compression, or for any other reason.
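
As a rough illustration of that rule, the following Python sketch walks two tiny content streams that set the line width (`w`) and the stroke colour (`RG`) in a different order before stroking the same path (`S`). The mini-interpreter `states_at_paint` is a hypothetical helper that understands only a handful of operators and represents the default colour as an RGB triple for simplicity; it is not a real content stream parser.

```python
# Operators that change the graphics state, mapped to the parameter they set.
STATE_OPERATORS = {"w": "line_width", "J": "line_cap", "RG": "stroke_color"}
PATH_OPERATORS = {"m", "l"}   # path construction: moveto, lineto
PAINT_OPERATORS = {"S"}       # path painting: stroke


def states_at_paint(content: str) -> list:
    """Return the graphics state in effect each time a path is painted."""
    state = {"line_width": ("1",), "line_cap": ("0",), "stroke_color": ("0", "0", "0")}
    operands = []
    snapshots = []
    for token in content.split():
        if token in STATE_OPERATORS:
            state[STATE_OPERATORS[token]] = tuple(operands)
            operands = []
        elif token in PAINT_OPERATORS:
            snapshots.append(dict(state))
            operands = []
        elif token in PATH_OPERATORS:
            operands = []
        else:
            operands.append(token)
    return snapshots


stream_a = "2 w 1 0 0 RG 10 10 m 100 100 l S"
stream_b = "1 0 0 RG 2 w 10 10 m 100 100 l S"

print("bytes differ:", stream_a != stream_b)
print("same state when painting:", states_at_paint(stream_a) == states_at_paint(stream_b))
```

The two streams are different byte sequences, yet the graphics object is drawn with the same line width and colour in both, so a viewer renders them identically.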

Another reason why two seemingly identical PDFs may differ internally concerns PDF dictionaries. The order of keys in a dictionary has no significance in PDF. Software that implements the specification to the letter will, for instance, use a HashMap to store key/value pairs. Depending on the JVM, the same code can produce two PDFs whose dictionaries are semantically identical, but whose entries are written in a different order. This is not an error. This is completely compliant with ISO-32000-1.
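
A small Python sketch shows the same effect. Python dictionaries keep insertion order, so the key order is passed in explicitly here to stand in for a hash map whose iteration order varies; `write_dictionary` and `read_dictionary` are hypothetical helpers that handle only single-token values, not a real PDF serializer or parser.

```python
def write_dictionary(entries: dict, key_order: list) -> bytes:
    """Serialize entries as a PDF dictionary, emitting keys in the given order."""
    body = " ".join(f"/{key} {entries[key]}" for key in key_order)
    return f"<< {body} >>".encode("ascii")


def read_dictionary(raw: bytes) -> dict:
    """Parse the simplified dictionary back into key/value pairs."""
    tokens = raw.decode("ascii").split()[1:-1]  # drop the "<<" and ">>" delimiters
    return {key.lstrip("/"): value for key, value in zip(tokens[0::2], tokens[1::2])}


entries = {"Type": "/Page", "Rotate": "90", "UserUnit": "1"}

# Two writers (or two runs) that happen to iterate the keys differently.
file_a = write_dictionary(entries, ["Type", "Rotate", "UserUnit"])
file_b = write_dictionary(entries, ["UserUnit", "Type", "Rotate"])

print(file_a)
print(file_b)
print("bytes differ:", file_a != file_b)
print("same dictionary:", read_dictionary(file_a) == read_dictionary(file_b))
```

A byte-level comparison reports a difference, while any reader that treats the dictionary as an unordered set of key/value pairs sees the same object.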

Important: the internal differences between two PDF files created using the same code, but at different moments, should not result in any visual difference when opening the document in a PDF viewer or when printing the document on paper.
