What is the ID field in a PDF file?

Problem description

I am working on improving the PDF scrubber in the ApprovalTests framework. Looking at a simple PDF generated with PdfSharp, I see that its contents are as follows.

Does anyone know what the ID field toward the bottom is?

%PDF-1.4
%ÓôÌá
1 0 obj
<<
/CreationDate(D:20131119194420-06'00')
/Creator(PDFsharp 1.32.3057-g \(www.pdfsharp.net\))
/Producer(PDFsharp 1.32.3057-g \(www.pdfsharp.net\))
>>
endobj
2 0 obj
<<
/Type/Catalog
/Pages 3 0 R
>>
endobj
3 0 obj
<<
/Type/Pages
/Count 1
/Kids[4 0 R]
>>
endobj
4 0 obj
<<
/Type/Page
/MediaBox[0 0 612 792]
/Parent 3 0 R
/Contents 5 0 R
/Resources
<<
/ProcSet [/PDF/Text/ImageB/ImageC/ImageI]
/ExtGState
<<
/GS0 6 0 R
>>
/Font
<<
/F0 8 0 R
>>
>>
/Group
<<
/CS/DeviceRGB
/S/Transparency
/I false
/K false
>>
>>
endobj
5 0 obj
<<
/Length 99
/Filter/FlateDecode
>>
stream
(99 bytes of FlateDecode-compressed binary content, not reproduced here)
endstream
endobj
6 0 obj
<<
/Type/ExtGState
/ca 1
>>
endobj
7 0 obj
<<
/Type/FontDescriptor
/Ascent 1005
/CapHeight 727
/Descent -210
/Flags 32
/FontBBox[-550 -303 1707 1072]
/ItalicAngle 0
/StemV 0
/XHeight 548
/FontName/Verdana,Bold
>>
endobj
8 0 obj
<<
/Type/Font
/Subtype/TrueType
/BaseFont/Verdana,Bold
/Encoding/WinAnsiEncoding
/FontDescriptor 7 0 R
/FirstChar 0
/LastChar 255
/Widths[1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 341 402 587 867 710 1271 862 332 543 543 710 867 361 479 361 689 710 710 710 710 710 710 710 710 710 710 402 402 867 867 867 616 963 776 761 723 830 683 650 811 837 545 555 770 637 947 846 850 732 850 782 710 681 812 763 1128 763 736 691 543 689 543 867 710 710 667 699 588 699 664 422 699 712 341 402 670 341 1058 712 686 699 699 497 593 455 712 649 979 668 650 596 710 543 710 867 1000 710 1000 332 710 587 1048 710 710 710 1777 710 543 1135 1000 691 1000 1000 332 332 587 587 710 710 1000 710 963 593 543 1067 1000 596 736 341 402 710 710 710 710 543 710 710 963 597 849 867 479 963 710 587 867 597 597 710 721 710 361 710 597 597 849 1181 1181 1181 616 776 776 776 776 776 776 1093 723 683 683 683 683 545 545 545 545 830 846 850 850 850 850 850 867 850 812 812 812 812 736 734 712 667 667 667 667 667 667 1018 588 664 664 664 664 341 341 341 341 679 712 686 686 686 686 686 867 686 712 712 712 712 650 699 650]
>>
endobj
xref
0 9
0000000000 65535 f 
0000000015 00000 n 
0000000180 00000 n 
0000000228 00000 n 
0000000283 00000 n 
0000000538 00000 n 
0000000707 00000 n 
0000000750 00000 n 
0000000935 00000 n
trailer
<<
/ID[<48189AA5E6D2394D8EF6E7842493B4A9><48189AA5E6D2394D8EF6E7842493B4A9>]
/Info 1 0 R
/Root 2 0 R
/Size 9
>>
startxref
2167
%%EOF
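
For reference, here is a minimal Python sketch of pulling the two ID strings out of the raw file bytes. It assumes the trailer is stored as plain text, as in the listing above, and that both IDs are written as hex strings (literal strings in parentheses are not handled). The function name extract_file_id is my own, not part of PdfSharp or ApprovalTests.

```python
from __future__ import annotations

import re

# Trailer /ID array written as two hex strings, e.g. /ID[<4818...><4818...>].
# Literal strings such as (abc) are deliberately not handled in this sketch.
_ID_PATTERN = re.compile(
    rb"/ID\s*\[\s*<([0-9A-Fa-f]*)>\s*<([0-9A-Fa-f]*)>\s*\]"
)


def _hex_to_bytes(raw: bytes) -> bytes:
    text = raw.decode("ascii")
    if len(text) % 2:  # PDF treats a missing final hex digit as 0
        text += "0"
    return bytes.fromhex(text)


def extract_file_id(pdf_bytes: bytes) -> tuple[bytes, bytes] | None:
    """Return the two /ID byte strings, or None if the file has no /ID."""
    matches = list(_ID_PATTERN.finditer(pdf_bytes))
    if not matches:
        return None  # the entry is optional unless the file is encrypted
    last = matches[-1]  # incremental updates append newer trailers at the end
    return _hex_to_bytes(last.group(1)), _hex_to_bytes(last.group(2))


if __name__ == "__main__":
    trailer = (b"trailer\n<<\n/ID[<48189AA5E6D2394D8EF6E7842493B4A9>"
               b"<48189AA5E6D2394D8EF6E7842493B4A9>]\n/Info 1 0 R\n>>")
    first, second = extract_file_id(trailer)
    print(first.hex().upper())  # 48189AA5E6D2394D8EF6E7842493B4A9
    print(first == second)      # True
```

Taking the last match mirrors the fact that incremental updates append newer trailers; a trailer without /ID simply yields None.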

Recommended answer

Some remarks to add to the picture from @Millie's answer:

When in doubt about some aspects of PDF, the first place to look should be the specification ISO 32000-1.

It specifies the ID entry as:

ID array (Required if an Encrypt entry is present; optional otherwise; PDF 1.1)

An array of two byte-strings constituting a file identifier (see 14.4, "File Identifiers") for the file. If there is an Encrypt entry this array and the two byte-strings shall be direct objects and shall be unencrypted.

NOTE 1 Because the ID entries are not encrypted it is possible to check the ID key to assure that the correct file is being accessed without decrypting the file. The restrictions that the string be a direct object and not be encrypted assure that this is possible.

NOTE 2 Although this entry is optional, its absence might prevent the file from functioning in some workflows that depend on files being uniquely identified.

NOTE 3 The values of the ID strings are used as input to the encryption algorithm. If these strings were indirect, or if the ID array were indirect, these strings would be encrypted when written. This would result in a circular condition for a reader: the ID strings must be decrypted in order to use them to decrypt strings, including the ID strings themselves. The preceding restriction prevents this circular condition.

(Table 15 – Entries in the file trailer dictionary)
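
To make NOTE 3 concrete: with the standard security handler, the first ID string is hashed together with the padded password, the /O entry and the /P value when the file encryption key is derived (Algorithm 2 in 7.6.3.3 of ISO 32000-1). The following is a sketch of my reading of that algorithm for revisions 2 to 4 only, not the AES-256 variants of later revisions; the function and parameter names are hypothetical, and it should be checked against the specification before being relied on.

```python
import hashlib
import struct

# 32-byte padding string of the standard security handler (ISO 32000-1, 7.6.3.3).
PAD = bytes([
    0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41,
    0x64, 0x00, 0x4E, 0x56, 0xFF, 0xFA, 0x01, 0x08,
    0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80,
    0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A,
])


def compute_encryption_key(
    password: bytes,
    o_entry: bytes,
    p_value: int,
    first_id: bytes,
    revision: int = 3,
    key_length_bytes: int = 16,
    encrypt_metadata: bool = True,
) -> bytes:
    """Sketch of Algorithm 2: derive the file key; ID[0] is one of the inputs."""
    padded = (password + PAD)[:32]  # pad or truncate the password to 32 bytes
    md5 = hashlib.md5()
    md5.update(padded)
    md5.update(o_entry)
    # /P is fed as an unsigned 4-byte integer, low-order byte first;
    # masking accepts both the usual negative form and the unsigned form.
    md5.update(struct.pack("<I", p_value & 0xFFFFFFFF))
    md5.update(first_id)  # first element of the trailer /ID array
    if revision >= 4 and not encrypt_metadata:
        md5.update(b"\xff\xff\xff\xff")
    digest = md5.digest()
    n = 5 if revision == 2 else key_length_bytes
    if revision >= 3:
        for _ in range(50):  # re-hash the first n bytes 50 times
            digest = hashlib.md5(digest[:n]).digest()
    return digest[:n]
```

Changing only first_id changes the derived key, which is why a reader must be able to read the ID strings before it can decrypt anything.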

NOTE 2 above in essence is a recommendation to add this optional value even though it is not formulated using the SHALL/SHOULD/MAY specification language conventions applied elsewhere in this document.

The recommendation is more explicit in the referenced section 14.4:

The ID entry is optional but should be used.
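
The requirement levels quoted so far (required when /Encrypt is present, otherwise optional but should be used) could be checked roughly like this. The trailer is assumed to be already parsed into a plain Python dict whose keys lack the leading slash; that representation and the function name are hypothetical, not any PDF library's API.

```python
from __future__ import annotations


def check_trailer_id(trailer: dict) -> list[str]:
    """Return a list of problems with the ID entry of a parsed trailer dict."""
    problems: list[str] = []
    file_id = trailer.get("ID")
    if file_id is None:
        if "Encrypt" in trailer:
            problems.append("error: /ID is required because /Encrypt is present")
        else:
            problems.append("warning: /ID is optional but should be used")
        return problems
    # Table 15: an array of two byte strings.
    if not (isinstance(file_id, list) and len(file_id) == 2
            and all(isinstance(s, bytes) for s in file_id)):
        problems.append("error: /ID must be an array of two byte strings")
    return problems
```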

As "should" in these specifications denotes a recommendation, and a recommendation is defined as something one has to do unless there are good reasons not to, a PDF writer has to create this entry unless it can argue against the requirement (I can hardly think of arguments to use against that). This should answer the question asked in response to Millie's answer:

Any idea why both PdfSharp and phantomjs create it?

In particular, it is not just considered good practice, as assumed in another comment above.

Concerning the contents of the ID array, the specification continues in section 14.4:

The value of this entry shall be an array of two byte strings. The first byte string shall be a permanent identifier based on the contents of the file at the time it was originally created and shall not change when the file is incrementally updated. The second byte string shall be a changing identifier based on the file’s contents at the time it was last updated. When a file is first written, both identifiers shall be set to the same value. If both identifiers match when a file reference is resolved, it is very likely that the correct and unchanged file has been found. If only the first identifier matches, a different version of the correct file has been found.
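
The matching rule in the last two sentences can be written down directly. In this sketch (names are hypothetical), expected is the ID pair recorded in a file reference and found is the pair read from the candidate file; the quoted text does not cover the case where only the second entry matches, so the sketch treats it as no match.

```python
from __future__ import annotations

from enum import Enum


class Match(Enum):
    SAME_UNCHANGED = "correct and unchanged file"
    OTHER_VERSION = "different version of the correct file"
    NO_MATCH = "not the referenced file"


def resolve_reference(expected: tuple[bytes, bytes],
                      found: tuple[bytes, bytes]) -> Match:
    """Compare the ID pair stored in a file reference with a candidate's IDs."""
    if expected[0] != found[0]:
        return Match.NO_MATCH        # permanent identifier differs
    if expected[1] == found[1]:
        return Match.SAME_UNCHANGED  # both identifiers match
    return Match.OTHER_VERSION       # only the permanent identifier matches
```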

To help ensure the uniqueness of file identifiers, they should be computed by means of a message digest algorithm ...

The calculation of the file identifier need not be reproducible; all that matters is that the identifier is likely to be unique.
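
A sketch of a writer following these rules: both entries start out identical, and an incremental update keeps the first entry and recomputes only the second. The quote above elides the specification's list of suggested digest inputs, so the inputs used here (a nanosecond timestamp, an output path, the file size and the Info dictionary values) are illustrative assumptions, with MD5 as the message digest; the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import time


def new_file_id(path: str, size: int, info: dict[str, str]) -> list[bytes]:
    """ID array for a freshly written file: both entries are identical."""
    md5 = hashlib.md5()
    md5.update(str(time.time_ns()).encode())  # assumed input: current time
    md5.update(path.encode())                 # assumed input: output location
    md5.update(str(size).encode())            # assumed input: file size
    for key in sorted(info):                  # assumed input: Info values
        md5.update(key.encode() + b"=" + info[key].encode())
    digest = md5.digest()
    return [digest, digest]


def update_file_id(old_id: list[bytes], new_contents: bytes) -> list[bytes]:
    """Incremental update: keep the permanent first entry, replace the second."""
    changing = hashlib.md5(str(time.time_ns()).encode() + new_contents).digest()
    return [old_id[0], changing]
```

Since the identifiers need not be reproducible, mixing in the current time is acceptable here; it only has to make collisions unlikely.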

Thus, the first article Millie quoted from is not entirely correct when it claims

the file identifier (the /ID entry from the trailer dictionary). This is an arbitrary string of bytes

The value of the ID entry is not a string but an array of two strings. Furthermore, the string values are not arbitrary; they are meant to be unique values, recommended to be obtained by hashing. In particular, they must not be reused for different documents, which would be acceptable if they were merely arbitrary.
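
This is also why the scrubber from the question is needed: since the ID values are meant to be unique per document and need not be reproducible, two runs that produce otherwise identical output can legitimately end up with different IDs. A minimal sketch of such a scrub (hypothetical function name, assuming hex-string IDs in a plain-text trailer) overwrites every hex digit with 0 rather than deleting the entry, so the file length and all byte offsets stay unchanged.

```python
from __future__ import annotations

import re

# Groups: prefix, first hex string, separator, second hex string, suffix.
_ID_HEX = re.compile(
    rb"(/ID\s*\[\s*<)([0-9A-Fa-f]*)(>\s*<)([0-9A-Fa-f]*)(>\s*\])"
)


def scrub_file_id(pdf_bytes: bytes) -> bytes:
    """Replace every hex digit of each /ID array with '0', keeping lengths."""
    def zero(m: re.Match) -> bytes:
        return (m.group(1) + b"0" * len(m.group(2)) + m.group(3)
                + b"0" * len(m.group(4)) + m.group(5))
    return _ID_HEX.sub(zero, pdf_bytes)
```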

The other article quoted is not entirely correct either when it says

a program that makes PDF files is only required to create the file identifier if the file is to be encrypted.

Even when not encrypting, such a program needs good reasons not to create a file identifier, because creating one is a recommendation in the specification. Lacking such reasons, a program therefore is required to create the file identifier.

All that being said, any PDF consumer must always be prepared to encounter a PDF without a file identifier; there might be a reason for not creating it after all.
