Why would PDFs generated by the same automated process be different on different machines?

Question

I have an automated process that generates PDFs, which we then compare to a known version via approval tests to verify that nothing in that pipeline is broken. I normalize mismatching fields like created/modified date and timezone, and locally everything always matches up 100%. However, for some reason, PDFs generated on our build server are very different from those I generate locally, with the ones I generate locally sometimes being as much as 20% larger.

The first difference when comparing the files in WinMerge is the /FontName field, which looks like this:

Generated locally

/FontName/QOAAAA+TimesNewRomanRegular

Generated on the build server

/FontName/QYAAAA+TimesNewRomanRegular

After that we have differences in /FontBBox, length, and binary data. I see several blocks of this.

My suspicion is that slightly different fonts are available and being selected on the two machines and embedded into the PDF, but I have no idea what the Q*AAAA code above means, nor how to verify that hypothesis.

pdffonts reports identical fonts in both, but couldn't those just be different versions of the same embedded font?

W:\xpdfbin-win-3.03\bin64> .\pdffonts.exe w:\...\PhantomRasterizer\Can_rasterize_html_to_pdf.slide_with_table_and_svg.approved.pdf
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
TimesNewRomanRegular                 CID TrueType      yes no  yes      7  0
ArialBold                            CID TrueType      yes no  yes     12  0
ArialRegular                         CID TrueType      yes no  yes     17  0
W:\xpdfbin-win-3.03\bin64> .\pdffonts.exe W:\...\PhantomRasterizer\Can_rasterize_html_to_pdf.slide_with_table_and_svg.received.pdf
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
TimesNewRomanRegular                 CID TrueType      yes no  yes      7  0
ArialBold                            CID TrueType      yes no  yes     12  0
ArialRegular                         CID TrueType      yes no  yes     17  0

Recommended answer

Please read my answer to this question: Why are PDF files different even if the content is the same?

Your question is the equivalent of "Why is the order of entries in a HashMap different on different JVMs?" The answer is simple: because HashMaps are designed that way. A HashMap is not a TreeMap.

You are now focusing on fonts, more specifically font subsets. Regarding the random characters in the name of the font subset, ISO-32000-1 states "the choice of letters is arbitrary", so you're contesting the ISO standard in your question. However, this is the least of your troubles. The IDs of a PDF should be different too, and the order of entries in dictionaries is like the order of entries in a HashMap. Read section 7.3.7 of ISO-32000-1:

The entries in a dictionary represent an associative table and as such shall be unordered even though an arbitrary order may be imposed upon them when written in a file. That ordering shall be ignored.
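
To make the quoted rule concrete, here is a minimal Python sketch that compares two PDF dictionaries as associative tables rather than as text. It is only an illustration: parse_flat_dict is a hypothetical helper, and it assumes a flat dictionary whose values are plain names or numbers (no strings, arrays, nested dictionaries, or indirect references). A real comparison would rely on a proper PDF parser.

```python
import re

# Matches a PDF name (/Something) or a plain number. Deliberately minimal:
# strings, arrays, nested dictionaries and references are NOT handled.
TOKEN = re.compile(r"/[^\s/<>\[\]()]+|[-+]?\d*\.?\d+")


def parse_flat_dict(text: str) -> dict:
    """Parse a flat '<< /Key value ... >>' dictionary into a Python dict."""
    inner = text.strip()
    if not (inner.startswith("<<") and inner.endswith(">>")):
        raise ValueError("not a PDF dictionary")
    tokens = TOKEN.findall(inner[2:-2])
    if len(tokens) % 2 != 0:
        raise ValueError("expected key/value pairs")
    keys, values = tokens[0::2], tokens[1::2]
    if not all(key.startswith("/") for key in keys):
        raise ValueError("dictionary keys must be names")
    return dict(zip(keys, values))


# The same entries written in two different orders and with different spacing.
first = "<< /Type /FontDescriptor /Flags 4 /ItalicAngle 0 >>"
second = "<</ItalicAngle 0/Flags 4/Type/FontDescriptor>>"

same_text = first == second
same_entries = parse_flat_dict(first) == parse_flat_dict(second)
print(same_text, same_entries)
```

A byte-for-byte or line-based diff reports these two dictionaries as different, while a comparison of the parsed entries treats them as equal, which is what section 7.3.7 asks for.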

The same goes for object numbers. I've seen tests that check if the object with object number 1 is this or that dictionary, and the object with object number 2 is this or that array. However, object numbers don't matter. You can create a PDF document on one system where the first object is a dictionary and the second one an array, and the same PDF document using the same code in which it's the other way around. We recently noticed that one of our tests was bad when testing our software with Java 8 instead of Java 7. You can have the same problem with your tests as soon as you change the JVM.

Your validation is wrong. When we test PDFs, we use a completely different approach.
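
The answer does not spell out that approach, so the following is not it. It is only a small Python sketch of why the Q*AAAA prefix from the question carries no meaning: ISO 32000-1 describes a subset font name as a tag of six uppercase letters, a plus sign, and the font's PostScript name, with the letters chosen arbitrarily. The helper name strip_subset_tags is hypothetical, and the regular expression is applied naively to raw bytes, so it could in principle also match inside binary stream data.

```python
import re

# Subset tag: exactly six uppercase letters followed by '+', right after the
# '/' that starts a PDF name. Applied to raw bytes, so this is a naive sketch.
SUBSET_TAG = re.compile(rb"/[A-Z]{6}\+")


def strip_subset_tags(pdf_bytes: bytes) -> bytes:
    """Remove the arbitrary six-letter subset tag from font names."""
    return SUBSET_TAG.sub(b"/", pdf_bytes)


# The two /FontName entries quoted in the question.
local = b"/FontName/QOAAAA+TimesNewRomanRegular"
server = b"/FontName/QYAAAA+TimesNewRomanRegular"

normalized_local = strip_subset_tags(local)
normalized_server = strip_subset_tags(server)
print(normalized_local == normalized_server)
```

Stripping the tag only removes this one arbitrary difference. It does nothing about the differing /FontBBox values, lengths, and binary data that the question also reports, so it does not make a byte-level comparison reliable.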
