Detect if a PDF file is correct (PDF header)


Question

I have a .NET Windows application that manages many PDF files. Some of the files are corrupt.

I have two questions (sorry for my English):

1)

How can I detect whether a given PDF file is correct?

I want to read the PDF header and check that it is correct.

var okPDF = PDFCorrect(@"C:\temp\pdfile1.pdf");

2)

How can I tell whether a byte[] (byte array) holding a file's contents is a PDF or not?

For example, for ZIP files, you can examine the first four bytes and see whether they match the local file header signature, which in hex is

50 4B 03 04

if (buffer[0] == 0x50 && buffer[1] == 0x4b && buffer[2] == 0x03 && buffer[3] == 0x04)

If you load the four bytes into a 32-bit integer (little-endian), the value is 0x04034b50 (by David Pierson).

I want the same for PDF files.

byte[] dataPDF = ...

var okPDF = PDFCorrect(dataPDF);

Is there any sample source code for this in .NET?

Recommended answer

1) Unfortunately, there is no easy way to determine whether a PDF file is corrupt. Problem files usually have a correct header, so the real cause of the corruption lies elsewhere. A PDF file is effectively a dump of PDF objects. The file contains a cross-reference table giving the exact byte offset of each object from the start of the file. So a corrupted file most probably has broken offsets, or some object is missing.
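As a rough illustration of the offset idea, here is a minimal C# sketch that checks whether the end of the file contains a `startxref` entry followed by `%%EOF`, and whether the stated offset lands on something that looks like a cross-reference section. The names `PdfTrailerCheck` and `HasPlausibleXref` and the 1024-byte tail window are assumptions made for this example. It is only a heuristic: it does not validate individual object offsets, and it can reject files that a tolerant viewer would still open.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class PdfTrailerCheck
{
    // Assumption for this sketch: look for the trailer in the last 1024 bytes.
    private const int TailWindow = 1024;

    // Returns true if the tail holds "startxref <offset> %%EOF" and the offset
    // points at "xref" (classic table) or at an object header such as "12 0 obj"
    // (cross-reference stream).
    public static bool HasPlausibleXref(byte[] data)
    {
        if (data == null || data.Length == 0) return false;

        // ASCII decoding yields exactly one char per byte (bytes >= 0x80 become '?'),
        // so string indexes line up with byte offsets.
        int tailStart = Math.Max(0, data.Length - TailWindow);
        string tail = Encoding.ASCII.GetString(data, tailStart, data.Length - tailStart);

        int keyword = tail.LastIndexOf("startxref", StringComparison.Ordinal);
        if (keyword < 0) return false;

        int eof = tail.IndexOf("%%EOF", keyword, StringComparison.Ordinal);
        if (eof < 0) return false;

        // The decimal byte offset sits between the keyword and the %%EOF marker.
        int numberStart = keyword + "startxref".Length;
        string number = tail.Substring(numberStart, eof - numberStart).Trim();

        int offset;
        if (!int.TryParse(number, NumberStyles.None, CultureInfo.InvariantCulture, out offset))
            return false;
        if (offset >= data.Length) return false;

        // Inspect a few bytes at the stated offset.
        int probeLength = Math.Min(32, data.Length - offset);
        string target = Encoding.ASCII.GetString(data, offset, probeLength);

        bool classicTable = target.StartsWith("xref", StringComparison.Ordinal);
        bool objectHeader = target.Length > 0
            && char.IsDigit(target[0])
            && target.IndexOf(" obj", StringComparison.Ordinal) > 0;
        return classicTable || objectHeader;
    }
}
```

A call such as `PdfTrailerCheck.HasPlausibleXref(File.ReadAllBytes(path))` can serve as a cheap first filter for truncated files before handing the file to a real PDF library.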

The best way to determine whether a file is corrupted is to use a specialized PDF library. There are many such libraries for .NET, both free and commercial. You can simply try to load the PDF file with one of them. iTextSharp would be a good choice.

2) According to the PDF reference, the header of a PDF file usually has the form %PDF-1.X (where X is a digit, currently from 0 to 7), and 99% of PDF files have such a header. However, Acrobat Viewer also accepts some other kinds of header, and even a missing header is not a real problem for PDF viewers. So you shouldn't treat a file as corrupted just because it has no header. For example, the header may appear anywhere within the first 1024 bytes of the file, or it may have the form %!PS-Adobe-N.n PDF-M.m
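For the byte[] case in the question, a minimal C# sketch of this header check might look like the following. The names `PdfHeaderCheck`, `HasPdfHeader` and `FileHasPdfHeader` are hypothetical. The sketch only looks for the usual `%PDF-` marker within the first 1024 bytes; it does not handle the %!PS-Adobe form and does not validate the version digits.

```csharp
using System;
using System.IO;

public static class PdfHeaderCheck
{
    // ASCII bytes of "%PDF-", the start of the usual header.
    private static readonly byte[] Marker = { 0x25, 0x50, 0x44, 0x46, 0x2D };

    // How far into the data to look for the marker.
    private const int SearchWindow = 1024;

    // Returns true if "%PDF-" occurs entirely within the first 1024 bytes of data.
    public static bool HasPdfHeader(byte[] data)
    {
        if (data == null) return false;

        int limit = Math.Min(data.Length, SearchWindow);
        for (int i = 0; i + Marker.Length <= limit; i++)
        {
            int j = 0;
            while (j < Marker.Length && data[i + j] == Marker[j]) j++;
            if (j == Marker.Length) return true;
        }
        return false;
    }

    // Reads at most the first 1024 bytes of the file and applies the same check.
    public static bool FileHasPdfHeader(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] head = new byte[SearchWindow];
            int total = 0;
            int read;
            do
            {
                // Stream.Read may return fewer bytes than requested, so loop.
                read = stream.Read(head, total, head.Length - total);
                total += read;
            } while (read > 0 && total < head.Length);

            Array.Resize(ref head, total);
            return HasPdfHeader(head);
        }
    }
}
```

Usage would be along the lines of `bool okPDF = PdfHeaderCheck.HasPdfHeader(dataPDF);`. A `true` result only says that the data looks like a PDF. As noted in point 1, a file with a correct header can still be corrupt, and a `false` result does not prove that a viewer would refuse the file.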

Just for your information, I am a developer of the Docotic PDF library.
