Comparing a signed PDF to an unsigned PDF using document hash


Question

After extensive Google searches, I'm starting to wonder whether I'm somehow missing the point of digital signatures.

This is what I believe I should be able to do in principle, and I'm hoping iTextSharp will let me do it:

I'm writing in C# on .NET and using iTextSharp to parse PDF files. I have an unsigned PDF file, and also a signed version of the same file.

I'm aware that a digital signature fundamentally hashes the PDF data and encrypts that hash with a private key, and that part of the verification process is to decrypt it using the public key and check that the result matches the hash of the PDF data when it is computed again.
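The hash-then-sign flow described above can be sketched with the JDK's own `java.security` classes. This is only an illustration of the general principle, not of how a PDF signature is packaged: the class name `SignVerifySketch`, the throwaway RSA key pair, and the choice of `SHA256withRSA` are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

// Hypothetical helper class, not part of iText or iTextSharp.
final class SignVerifySketch {

    // Generates a throwaway RSA key pair; a real signer's key would come from a certificate.
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hashes the data with SHA-256 and signs the hash with the private key.
    static byte[] sign(byte[] data, KeyPair keys) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(keys.getPrivate());
            signer.update(data);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Re-hashes the data and checks it against the signature using the public key.
    static boolean verify(byte[] data, byte[] signature, KeyPair keys) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(keys.getPublic());
            verifier.update(data);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair keys = newKeyPair();
        byte[] original = "document bytes".getBytes(StandardCharsets.UTF_8);
        byte[] tampered = "document bytez".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(original, keys);
        System.out.println("original verifies: " + verify(original, signature, keys));
        System.out.println("tampered verifies: " + verify(tampered, signature, keys));
    }
}
```

Note that verification only answers "were these exact bytes signed by this key"; it does not by itself hand back a document hash that could be compared against a different file.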

In addition to this, I want to get this decrypted document hash and compare it to a document hash generated from my unsigned PDF. This is because I not only want to verify that the signed PDF is authentic, but also that it's the same unsigned PDF I have on record. I suppose I could also do this by comparing the PDF data (without the signature) with my PDF data on record.

I currently haven't worked out how to do any of this! That is:


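  1. How do I extract the PDF data from the signed PDF, excluding the signature?

  2. Or, how do I generate a hash from the unsigned PDF?

  3. Together with 2., how do I extract the decrypted hash from the PDF signature?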

I hope this is clear, and that I haven't missed the point somewhere!

Recommended answer

Regarding this:

"This is because I not only want to verify that the signed PDF is authentic, but also that it's the same unsigned PDF I have on record"

Assuming you only want to know that the document you get back on your server is authentic:

When creating a signed document, you can choose to sign only part of the file or the entire document. If you use a "whole document" signature and the document you get back on your server is "authentic" (meaning that verification of the signature succeeded), then it is certainly the same document you have on record.

It's worth mentioning that there are two types of PDF signatures: approval signatures and certification signatures. From Adobe's document Digital Signatures in PDF:


(...) approval signatures, where someone signs a document to show consent, approval, or acceptance. A certified document is one that has a certification signature applied by the originator when the document is ready for use. The originator specifies what changes are allowed; choosing one of three levels of modification permitted:


  • No changes

  • Form fill-in only

  • Form fill-in and comments

Assuming you want to match a signed document on your server with its unsigned equivalent in your database:

For document identification, I would suggest handling it separately. Once a document can be opened, a hash (MD5, for example) can be computed from the concatenation of the decompressed content of all its pages and then compared with the same kind of hash computed from the original document (which can be generated once and stored in a database).

The reason I would do it this way is that it is independent of the type of signature that was used on the document. Even when form fields are edited in a PDF file, annotations are added, or new signatures are created, the page content is never modified; it always remains the same.

If you are using iText, you can get a byte array of the page content with the method PdfReader.getPageContent and use the result to compute an MD5 hash.

The code in Java might look like this:

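// iText 5 package name (an assumption; older releases use com.lowagie.text.pdf)
import com.itextpdf.text.pdf.PdfReader;
import java.security.MessageDigest;

// Hash the concatenated content of every page (page numbers are 1-based)
PdfReader reader = new PdfReader("myfile.pdf");
MessageDigest messageDigest = MessageDigest.getInstance("MD5");
int pageCount = reader.getNumberOfPages();
for (int i = 1; i <= pageCount; i++) {
    byte[] buf = reader.getPageContent(i);
    messageDigest.update(buf, 0, buf.length);
}
reader.close();
byte[] hash = messageDigest.digest();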
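Once a page-content hash has been computed, comparing it with the value kept on record might look like the sketch below. It uses only the JDK; the class `HashCompareSketch`, its method names, and the idea of storing the hash as a lowercase hex string are assumptions for illustration, not something iText prescribes.

```java
import java.security.MessageDigest;

// Hypothetical helper class for comparing a freshly computed hash with a stored one.
final class HashCompareSketch {

    // Encodes a hash as lowercase hex so it can be stored in a text column.
    static String toHex(byte[] hash) {
        StringBuilder sb = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decodes a hex string back into bytes.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Returns true when the computed hash equals the hash stored on record.
    static boolean matchesStored(byte[] computedHash, String storedHex) {
        return MessageDigest.isEqual(computedHash, fromHex(storedHex));
    }

    public static void main(String[] args) {
        byte[] computed = {0x00, (byte) 0xff, 0x10};
        String stored = toHex(computed);
        System.out.println(stored + " matches: " + matchesStored(computed, stored));
    }
}
```

The hash produced for the returned document would be passed as `computedHash`, and the value generated once from the original document would be passed as `storedHex`.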

Additionally, if the server receives a file that went out unsigned and came back signed, the signature may refer to just one part of the file and not all of it. In this scenario, the signature digests might not be enough to identify the file.

From the PDF specification:


Signatures are created by computing a digest of the data (or part of the data) in a document, and storing the digest in the document. (...) There are two defined techniques for computing a reproducible digest of the contents of all or part of a PDF file:

- A byte range digest is computed over a range of bytes in the file, indicated by the ByteRange entry in the signature dictionary. This range is typically the entire file, including the signature dictionary but excluding the signature value itself (the Contents entry).

- An object digest (PDF 1.5) is computed by selectively walking a subtree of objects in memory, beginning with the referenced object, which is typically the root object. The resulting digest, along with information about how it was computed, is placed in a signature reference dictionary (...).
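The byte range digest from the quoted text can be sketched in plain Java as follows. ByteRange is an array of (offset, length) pairs; everything else here is an assumption for illustration: the class name `ByteRangeDigestSketch`, the choice of SHA-256 (the actual algorithm depends on the signature), and the toy byte array standing in for a PDF file.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; reading ByteRange out of a real PDF is left to the PDF library.
final class ByteRangeDigestSketch {

    // Digests only the bytes covered by the (offset, length) pairs in byteRange.
    static byte[] digest(byte[] file, int[] byteRange) {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("byteRange must hold (offset, length) pairs");
        }
        try {
            // SHA-256 is an assumption; a real signature names its own digest algorithm.
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (int i = 0; i < byteRange.length; i += 2) {
                md.update(file, byteRange[i], byteRange[i + 1]);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Toy "file": the five bytes "<sig>" stand in for the excluded Contents value.
        byte[] file = "AAAA<sig>BBBB".getBytes(StandardCharsets.US_ASCII);
        int[] byteRange = {0, 4, 9, 4};
        System.out.println("digest length: " + digest(file, byteRange).length);
    }
}
```

Because the excluded gap is skipped, changing the signature value leaves the digest unchanged, while changing any covered byte alters it. It also shows why this digest cannot simply be compared with a hash of the unsigned file: the covered bytes include the signature dictionary, which the unsigned file does not contain.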
