How to find the trailer dictionary?

Views: 100 | Published: 2020/5/25 1:07:57 | Tags: parsing, pdf


Question

Going through the PDF spec, it says that the trailer precedes the startxref. To me, that says the xref can appear anywhere in the document, but the trailer still appears before the startxref. This makes sense until you have to parse it: because you have to parse in reverse, you can't take comments or strings into account. Let's get a little more wacky, then.

trailer<< %\
  /Size 4 %\
  /Root 1 0 R %\
  /Info 4 0 R %\
  /Key (\
trailer<< %\
  /Size 4 %\
  /Root 2 0 R %\
  /Info 3 0 R %\
>>%)
>>%)
% test test )
startxref
 15
%%EOF

Which is a perfectly valid trailer. The first one is the real trailer, but the second one is inside a string. In this case, reverse parsing is going to fail to catch the comments, and searching for the keyword trailer is going to fail if it is part of a comment or string. What is the best way of finding out where the trailer starts?

Update: this trailer seems to open in Acrobat Reader.

%PDF-1.3
%âãÏÓ
xref
0 4
00000000 65535 f
00000110 00000 n
00000250 00000 n
00000315 00000 n
00000576 00000 n

1 0 obj <<
  /Type /Catalog
  /Pages 2 0 R
  /OpenAction [ 3 0 R /XYZ null null null ]
  /PageLabels << /Nums [0 << /S /D >> ] >>
>>
endobj
2 0 obj <<
  /Type /Pages
  /Kids [ 3 0 R ]
  /Count 1
>>
endobj
3 0 obj <<
  /Type /Page
  /Parent 2 0 R
  /Resources << >>
  /MediaBox [ 0 0 612 792 ]
>>
endobj
4 0 obj <<
  /Producer (Me)
  /CreationDate (D:20110626000000Z)
>>
endobj

trailer<< %\
  /Size 4 %\
  /Root 1 0 R %\
  /Info 4 0 R %\
  /Key (\
trailer<< %\
  /Size 4 %\
  /Root 2 0 R %\
  /Info 3 0 R %\
>>%)
>>%)
% test test )
startxref
 15
%%EOF

As far as syntax goes, this conforms to the spec. Somehow Acrobat Reader seems able to tell whether it is in a comment or in a string. Parsing left to right, the second trailer is in a string with a % tacked on, followed by a comment after the trailer. But parsing right to left, you have no idea whether the first ) is part of a comment or the end of a string definition.

Another example:

%PDF-1.3
%âãÏÓ
xref
0 8
0000000000 65535 f
0000000210 00000 n
0000000357 00000 n
0000000428 00000 n
0000000533 00000 n
0000000612 00000 n
0000000759 00000 n
0000000830 00000 n
0000000935 00000 n

1 0 obj <<
  /Type /Catalog
  /Pages 2 0 R
  /OpenAction [ 3 0 R /XYZ null null null ]
  /PageLabels << /Nums [0 << /S /D >> ] >>
>>
endobj
2 0 obj <<
  /Type /Pages
  /Kids [ 3 0 R ]
  /Count 1
>>
endobj
3 0 obj <<
  /Type /Page
  /Parent 2 0 R
  /Resources << >>
  /MediaBox [ 0 0 612 792 ]
>>
endobj
4 0 obj <<
  /Producer (Me)
  /CreationDate (D:20110626000000Z)
>>
endobj
5 0 obj <<
  /Type /Catalog
  /Pages 6 0 R
  /OpenAction [ 7 0 R /XYZ null null null ]
  /PageLabels << /Nums [0 << /S /D >> ] >>
>>
endobj
6 0 obj <<
  /Type /Pages
  /Kids [ 7 0 R ]
  /Count 1
>>
endobj
7 0 obj <<
  /Type /Page
  /Parent 6 0 R
  /Resources << >>
  /MediaBox [ 0 0 100 100 ]
>>
endobj
8 0 obj <<
  /Producer (Me)
  /CreationDate (D:20110626000000Z)
>>
endobj

trailer<< %\
  /Size 8 %\
  /Root 1 0 R %\
  /Info 4 0 R %\
  /Key (\
trailer<< %\
  /Size 8 %\
  /Root 5 0 R %\
  /Info 8 0 R %\
>>%)
>>%)
% test test )
startxref
 17
%%EOF

This example is displayed correctly in Adobe. For my last case, you claimed it would fail because the "root" node is invalid, but in this new sample the root is valid and it's just never actually used. So shouldn't it display a 100x100 window instead of the 8.5"x11" one?

Regarding Resources:

  (Required; inheritable) A dictionary containing any resources required by the page 
(see Section 3.7.2, "Resource Dictionaries"). If the page requires no resources, the 
value of this entry should be an empty dictionary. Omitting the entry entirely
indicates that the resources are to be inherited from an ancestor node in the page 
tree.

Recommended answer

~~The startxref statement usually is at the end of the file, with the trailer preceding it.~~

Update: The introductory sentence above was not formulated clearly enough, as Jeremy Walton correctly observed (though later comments in my answer hinted at the exceptions). It should have read: "The startxref statement usually appears at the end of the file as a single instance, with the trailer preceding it (unless your file has undergone incremental updates, in which case you may have different instances of cross-references with assorted trailers)."

If there are comments sprinkled into the PDF, they count the same as "real" PDF page description code when it comes to byte counting for the xref table byte-offset calculations. Therefore, it is not a problem to parse it correctly.
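To illustrate the byte-counting point, here is a minimal Python sketch. The fragment in `body` is a made-up two-line snippet, not a complete PDF, and the variable names are only for illustration. It shows that inserting a comment shifts the offset of everything after it by exactly the length of the comment.

```python
# Hypothetical fragment for illustration only; not a complete PDF file.
body = b"%PDF-1.3\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

# Byte offset an xref entry for object 1 would have to record.
offset_before = body.find(b"1 0 obj")

# Insert a comment line ahead of the object: every later byte moves.
comment = b"% just a comment\n"
patched = body.replace(b"1 0 obj", comment + b"1 0 obj", 1)
offset_after = patched.find(b"1 0 obj")

# The shift equals the number of comment bytes that were inserted.
shift = offset_after - offset_before
print(offset_before, offset_after, shift)
```

Any xref entry pointing at or beyond the insertion point would need to grow by `shift` for the file to stay consistent.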

To quote straight "from the horse's mouth" (PDF specification ISO 32000-1, Section 7.5.5):

"The trailer of a PDF file enables a conforming reader to quickly find the cross-reference table and certain special objects. Conforming readers should read a PDF file from its end. The last line of the file shall contain only the end-of-file marker, %%EOF. The two preceding lines shall contain, one per line and in order, the keyword startxref and the byte offset in the decoded stream from the beginning of the file to the beginning of the xref keyword in the last cross-reference section. The startxref line shall be preceded by the trailer dictionary, consisting of the keyword trailer followed by a series of key-value pairs enclosed in double angle brackets [...]"

The key expression to take into account here is "LAST cross-reference section".
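As a rough sketch of the step described in the quote, the following Python reads the tail of a file and extracts the number written after the last startxref keyword. The function name `find_startxref` and the 1024-byte search window are assumptions made for this example; the specification does not prescribe either.

```python
import re


def find_startxref(data: bytes, tail: int = 1024) -> int:
    """Return the byte offset written after the last `startxref` keyword.

    Only the final `tail` bytes are inspected (an arbitrary window chosen
    for this sketch).
    """
    chunk = data[-tail:]
    pos = chunk.rfind(b"startxref")
    if pos < 0:
        raise ValueError("no startxref keyword near the end of the file")
    # Expect: startxref <whitespace> <digits> <whitespace> %%EOF
    match = re.match(rb"startxref\s+(\d+)\s+%%EOF", chunk[pos:])
    if match is None:
        raise ValueError("malformed startxref / %%EOF block")
    return int(match.group(1))


# Tail of the example from the question.
tail_bytes = b">>%)\n% test test )\nstartxref\n 15\n%%EOF\n"
xref_offset = find_startxref(tail_bytes)
print(xref_offset)
```

Since %%EOF has to be the last line and the two lines before it are fixed, this backward step does not need to decide whether a `)` belongs to a comment or to a string. Everything else can then be tokenized left to right from a known position.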

If you have updated trailers in mind, then have a look at Section 7.5.6.

Yes, you have to parse in reverse. The first cross-reference section to read is the last one appearing in the file, and the last trailer belongs to it. The second one to read is the last-but-one appearing in the file, together with the last-but-one trailer, and so on. If you have to read more than one trailer/xref section, each one you read has to contain a reference to the next one to read.
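The chain of sections can be sketched as follows. This assumes the trailers have already been parsed into plain Python dicts keyed by the byte offset of their cross-reference section, and that the reference to the next section is the /Prev entry listed in Table 15. The function name `walk_xref_chain` and the offsets in the example are made up.

```python
def walk_xref_chain(start: int, trailers: dict[int, dict]) -> list[int]:
    """Return xref offsets in the order a reader would visit them.

    `trailers` maps an xref offset to its already-parsed trailer dict
    (a hypothetical structure for this sketch).
    """
    order = []
    seen = set()
    offset = start
    while offset is not None:
        if offset in seen:
            # A damaged file could point back at itself; stop instead of looping.
            raise ValueError("cyclic /Prev chain")
        seen.add(offset)
        order.append(offset)
        # The newest trailer points at the previous section, if there is one.
        offset = trailers[offset].get("Prev")
    return order


# Made-up file with two incremental updates on top of the original section.
parsed = {
    900: {"Size": 12, "Prev": 400},
    400: {"Size": 8, "Prev": 15},
    15: {"Size": 4},
}
visit_order = walk_xref_chain(900, parsed)
print(visit_order)
```

The starting offset would come from the startxref value at the end of the file; each later step comes from the trailer that was just read.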

Should you think of "comments" as something you can freely insert into a PDF without corrupting its structure, think again. Once you have inserted comments, you have to update at least the xref table (and maybe the /Length keys of objects).

Update 2: The trailer<<...>> dictionary Jeremy constructed is probably not even a valid dictionary at all; therefore it's not a valid trailer dictionary either...

Anyway, according to the spec, the trailer dictionary must consist of "a series of key-value pairs". The "legal" keys in the trailer dictionary are limited to quite a narrow set, some of which are even optional (see Table 15 in Section 7.5.5).

Jeremy seems to have constructed his example in such a way that this snippet could be (mis-)understood as a potentially valid trailer dictionary:

trailer<<%) >> % test test )

Which of course isn't a dictionary at all, since we don't see any key-value pair here.

His full example isn't valid either, because the "key" called /Key isn't among the valid key names for the trailer (which are, according to Table 15: /Size, /Prev, /Root, /Encrypt, /Info, /ID, /XRefStm).
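A key check like this can be sketched in Python with a small left-to-right scanner that skips comments and literal strings. This is a simplified illustration, not a full PDF lexer: hex strings and streams are not handled specially, and the names `tokens`, `trailer_keys` and `TABLE_15_KEYS` are invented for the example. The position of the trailer keyword is assumed to be known already.

```python
WHITESPACE = b"\x00\t\n\x0c\r "
STOP = WHITESPACE + b"()<>[]{}/%"
TABLE_15_KEYS = {"Size", "Prev", "Root", "Encrypt", "Info", "ID", "XRefStm"}


def tokens(data: bytes, pos: int = 0):
    """Yield (kind, bytes) tokens left to right, skipping comments."""
    n = len(data)
    while pos < n:
        c = data[pos:pos + 1]
        if c in WHITESPACE:
            pos += 1
        elif c == b"%":                    # comment: skip to end of line
            while pos < n and data[pos:pos + 1] not in b"\r\n":
                pos += 1
        elif c == b"(":                    # literal string, balanced parens
            depth, start = 1, pos
            pos += 1
            while pos < n and depth:
                ch = data[pos:pos + 1]
                if ch == b"\\":
                    pos += 1               # skip the escaped byte
                elif ch == b"(":
                    depth += 1
                elif ch == b")":
                    depth -= 1
                pos += 1
            yield "string", data[start:pos]
        elif data[pos:pos + 2] in (b"<<", b">>"):
            yield "dict", data[pos:pos + 2]
            pos += 2
        elif c in b"[]":
            yield "array", c
            pos += 1
        else:                              # name, number, keyword, stray byte
            start = pos
            pos += 1
            while pos < n and data[pos:pos + 1] not in STOP:
                pos += 1
            yield ("name" if c == b"/" else "other"), data[start:pos]


def trailer_keys(data: bytes, trailer_pos: int) -> list[str]:
    """Collect the top-level key names of the dictionary after `trailer`."""
    keys, depth, value_seen = [], 0, True
    for kind, tok in tokens(data, trailer_pos + len(b"trailer")):
        if kind in ("dict", "array"):
            if tok in (b"<<", b"["):
                if depth == 1:
                    value_seen = True      # a nested container is a value
                depth += 1
            else:
                depth -= 1
                if depth == 0:
                    break                  # end of the trailer dictionary
        elif depth == 1 and kind == "name" and value_seen:
            keys.append(tok[1:].decode("latin-1"))
            value_seen = False
        elif depth == 1:
            value_seen = True
    return keys


sample = rb"""trailer<< %\
  /Size 4 %\
  /Root 1 0 R %\
  /Info 4 0 R %\
  /Key (\
trailer<< %\
  /Size 4 %\
  /Root 2 0 R %\
  /Info 3 0 R %\
>>%)
>>%)
% test test )
startxref
 15
%%EOF
"""
keys = trailer_keys(sample, sample.find(b"trailer"))
unknown = [k for k in keys if k not in TABLE_15_KEYS]
print(keys, unknown)
```

Read left to right, the second trailer<< ends up inside the string value of /Key, so only the outer dictionary's keys are collected. The sketch only reports keys outside the Table 15 list; what to do with them is left to the caller.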

So Jeremy should do in his PDF parsing code the same as all sane and even most insane PDF processing libraries do: give up on obviously invalid constructs instead of searching for sense in them, and tell the user that "your damn PDF is corrupt because we cannot identify valid keys in the supposed trailer section of the file".
