How do you debug PDF files?


Question

I often create PDFs programmatically, and sometimes there is a problem with the result, e.g. a specific letter might not show up well, or I might have encoding issues.

Is there some way to debug a PDF? E.g. see its detailed structure?

Recommended answer

There are a number of free tools that'll let you look at the guts of a PDF, uncompressed and decrypted (given the password).
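If no such tool is at hand, the "uncompressed" part can be roughly approximated with a few lines of Python. This is only a sketch under some loud assumptions: the PDF is not encrypted, the streams of interest use plain FlateDecode (zlib) with no predictor or filter chain, and the stream data does not itself contain the word `endstream`. The function name `inflate_streams` is made up for this example.

```python
import re
import zlib

# Assumption: each stream is delimited by "stream<EOL> ... endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(pdf_bytes):
    """Return the decompressed contents of every stream zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj() tolerates the trailing EOL before "endstream";
            # leftover bytes end up in .unused_data instead of raising.
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not FlateDecode (or encrypted, or an image filter): skip it.
            continue
    return results

if __name__ == "__main__":
    import sys
    # Usage: python inflate.py some.pdf
    with open(sys.argv[1], "rb") as handle:  # binary mode, always
        for index, data in enumerate(inflate_streams(handle.read())):
            print("--- stream", index, "---")
            print(data[:500].decode("latin-1"))
```

Page content streams inflated this way show the raw drawing operators (`BT`, `Tf`, `Tj` and so on), which is often enough to see which font and which bytes a missing letter is actually using.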

RUPS for iText springs to mind (but I'm biased). I don't know that there's an iTextSharp equivalent. It's a GUI with a tree view of the PDF objects (something ALL these apps have).

Some will let you edit the PDF within that tree, but not many. I believe Windjack's PDF CanOpener will (along with several other spiffy features you'd expect from a commercial Acrobat plugin).

And in a pinch, <insert favorite text editor here> works... but don't try to change anything. PDF is a binary format: byte offsets are important. If your text editor changes the \n to a \r\n (or tries to interpret the file as UTF-8, or, or, or), your PDF will be Horribly Broken. Don't do that.
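To see why byte offsets matter, here is a small sketch that reads the `startxref` value at the end of a file and checks whether a classic cross-reference table really starts at that offset. The helper names (`read_startxref`, `xref_offset_is_plausible`, `build_tiny_pdf`) are invented for this example, and the check assumes a classic `xref` table; files that use a cross-reference stream (PDF 1.5 and later) will report False even when they are fine.

```python
import re

def read_startxref(pdf_bytes):
    """Return the byte offset recorded after the last 'startxref' keyword."""
    tail = pdf_bytes[-1024:]  # the trailer lives at the end of the file
    matches = re.findall(rb"startxref\s+(\d+)", tail)
    if not matches:
        raise ValueError("no startxref keyword found")
    return int(matches[-1])

def xref_offset_is_plausible(pdf_bytes):
    """True if the bytes at the recorded offset spell 'xref'."""
    offset = read_startxref(pdf_bytes)
    return pdf_bytes[offset:offset + 4] == b"xref"

def build_tiny_pdf():
    """Assemble a minimal in-memory file with a correct startxref offset."""
    body = b"%PDF-1.4\n" + b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    xref_pos = len(body)
    xref = b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    trailer = (b"trailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n"
               + str(xref_pos).encode("ascii") + b"\n%%EOF\n")
    return body + xref + trailer

if __name__ == "__main__":
    good = build_tiny_pdf()
    # Simulate a text editor "helpfully" converting line endings.
    mangled = good.replace(b"\n", b"\r\n")
    print(xref_offset_is_plausible(good))
    print(xref_offset_is_plausible(mangled))
```

The line-ending conversion shifts every later byte, so the stored offset no longer points at the table. When inspecting a real file from a script, open it with `open(path, "rb")` so nothing gets decoded or translated along the way.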

I end up doing a lot of searching for a given object number to look up indirect references. It's always a pain to look up a single-digit reference because a search for "4 0 obj" also matches every object whose number ends in 4 (14, 24, 34, 1234, etc). A regex search that looked for "beginning of line, 4 0 obj, end of line" would be great, but I generally use Notepad, so that's out (and I'm not much of a regex guy anyway).
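For what it's worth, that kind of search can be sketched in Python along these lines. It is not what the answer's author used; the function names are invented here, and instead of anchoring on the beginning of the line it requires that the object number is not preceded by another digit, since an object header is not guaranteed to start a line.

```python
import re

def find_object(pdf_bytes, obj_num, gen=0):
    """Byte offsets of 'N G obj' headers for exactly this object number."""
    # (?<![0-9]) rejects 14, 24, 1234... when searching for 4.
    pattern = re.compile(rb"(?<![0-9])%d\s+%d\s+obj\b" % (obj_num, gen))
    return [m.start() for m in pattern.finditer(pdf_bytes)]

def find_references(pdf_bytes, obj_num, gen=0):
    """Byte offsets of indirect references 'N G R' to this object."""
    pattern = re.compile(rb"(?<![0-9])%d\s+%d\s+R\b" % (obj_num, gen))
    return [m.start() for m in pattern.finditer(pdf_bytes)]

if __name__ == "__main__":
    import sys
    # Usage: python findobj.py some.pdf 4
    with open(sys.argv[1], "rb") as handle:
        data = handle.read()
    number = int(sys.argv[2])
    print("defined at offsets:", find_object(data, number))
    print("referenced at offsets:", find_references(data, number))
```

Keep in mind that objects stored inside compressed object streams have no `N G obj` header in the raw bytes, so this only finds objects written out directly. An object can also legitimately appear more than once if the file has incremental updates; the last definition wins.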

PS: Even with a spiffy Acrobat plugin (not CanOpener, home grown from way back), I still need to crack open a text editor from time to time.

Acrobat will make changes at times as it loads a PDF (mostly to fix things), and if you want to know What's Really There, you need to look at that PDF in some other way. And when you're trying to debug a broken PDF, Acrobat being helpful is the last thing you need.

PPS: Acrobat also has a spiffy "pdf syntax check" in its advanced->preflight profiles. It's also got checks for various PDF/* standards (PDF/X, PDF/A-1 [a and b], etc), accessibility, and so forth. They're invaluable when you're trying to Be Compliant. Not quite the debugging tool you were asking about, but Very Handy nonetheless.

PPPS: "diff"ing two PDFs is all but impossible without writing a custom tool to do it for you. I wrote something that listed all the pages (with sizes) and fields (with types, flags, etc) in a predictable order and dumped it to a text file so I could diff the files... but directly diffing two PDFs is pointless. There are too many ways for "identical" files to differ (object order, dictionary key order, compression levels, etc).
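The tool described above is not available, so the following is only a much cruder sketch of the same idea: dump something predictable to text, then diff the text. It lists raw objects sorted by object number, collapses whitespace and elides stream data, which takes care of object order, line endings and compression levels but NOT dictionary key order. It assumes the objects are written directly in the file (no object streams, no encryption), and `normalized_dump` and `diff_pdfs` are names invented for this example.

```python
import difflib
import re

OBJ_RE = re.compile(
    rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n.*?endstream", re.DOTALL)

def normalized_dump(pdf_bytes):
    """One text line per object, sorted by (object number, generation)."""
    objects = {}
    for match in OBJ_RE.finditer(pdf_bytes):
        key = (int(match.group(1)), int(match.group(2)))
        # Drop stream data so compression settings do not show up as diffs.
        body = STREAM_RE.sub(b"stream ... endstream", match.group(3))
        # Collapse all whitespace runs, including \r\n versus \n.
        text = b" ".join(body.split()).decode("latin-1")
        # A later definition (incremental update) replaces an earlier one.
        objects[key] = "%d %d obj %s endobj" % (key[0], key[1], text)
    return [objects[key] for key in sorted(objects)]

def diff_pdfs(bytes_a, bytes_b):
    """Unified diff of the two normalized dumps, as a list of lines."""
    return list(difflib.unified_diff(
        normalized_dump(bytes_a), normalized_dump(bytes_b),
        fromfile="a.pdf", tofile="b.pdf", lineterm=""))

if __name__ == "__main__":
    import sys
    # Usage: python pdfdiff.py a.pdf b.pdf
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        print("\n".join(diff_pdfs(fa.read(), fb.read())))
```

Two files that renumber their objects differently will still show up as completely different here, which is why a dump keyed on document structure (pages, fields) like the one described above is the more robust approach.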
