Example PDF language code to help study the official PDF specification?
Question
I am trying to learn the PDF file format.
To this end I downloaded Adobe's PDF specification file, which is huge.
So to help me study the details of PDF, I want to follow its abstract explanations by looking in parallel at some real-world PDF files.
For example, one idea was to create a PDF file (using LaTeX) that has only one page, whose only content is a single character: "a".
But when I open this PDF file in a hex editor (or in other tools that can show the internal PDF structure), there is a lot of binary or compressed content inside it. For an example of what I see, look at the screenshot below:
I simply cannot identify which part of this binary data represents my character "a" in this PDF.
The same happens with all the real-world PDF files I've tried so far. I simply cannot find any PDF files which contain working example code to help me understand the generic PDF language specification.
I would like others to explain to me: is there a practical way to study the PDF specification while at the same time verifying its bits and pieces with real PDF files?
I would like to know: which software tools do PDF programmers commonly use that would help a newbie developer like me dissect and decompress existing binary PDF files, so that their source code can be inspected in a simple text editor? (Note: I'm not asking for a recommendation. In compliance with the SO FAQ, I just want to know whether such tools exist, and what they are called.)
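To make "decompressing" concrete: most of the binary-looking data in a PDF sits between the `stream` and `endstream` keywords, and it is usually compressed with the Flate filter (`/FlateDecode`), which is zlib/deflate data that Python's standard `zlib` module can inflate. The sketch below is deliberately naive and is not a real PDF parser: it scans the raw bytes with a regular expression instead of following the cross-reference table, ignores `/DecodeParms` predictors and all other filters, and the name `inflate_streams` is made up for this example. Once a page's content stream is inflated, text usually appears as string operands of text-showing operators (`Tj`, `TJ`) inside a `BT ... ET` block; depending on the font encoding LaTeX chose, the letter may or may not show up literally as `(a)`.

```python
"""Naively decompress every FlateDecode stream in a PDF for study purposes.

Assumptions: the stream dictionary contains "/FlateDecode" directly (no
filter chains), and regex scanning is good enough. Real tools parse the
xref table and object structure instead.
"""
import re
import sys
import zlib

# "/FlateDecode" in a dictionary, then the nearest "stream" keyword and
# its end-of-line marker, then everything up to "endstream".
FLATE_STREAM = re.compile(rb"/FlateDecode.*?stream\r?\n(.*?)endstream", re.S)


def inflate_streams(pdf_bytes: bytes) -> list:
    """Return the decompressed bytes of every FlateDecode stream found."""
    results = []
    for match in FLATE_STREAM.finditer(pdf_bytes):
        inflater = zlib.decompressobj()
        try:
            # A decompressobj puts the trailing EOL before "endstream" into
            # inflater.unused_data instead of raising an error.
            results.append(inflater.decompress(match.group(1)))
        except zlib.error:
            continue  # skip data that is not valid zlib (e.g. a false match)
    return results


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for number, data in enumerate(inflate_streams(f.read()), start=1):
            print(f"--- stream {number} ({len(data)} bytes) ---")
            print(data.decode("latin-1"))
```

Run against a PDF given on the command line, it prints each inflated stream so the operators can be compared with the content-stream chapters of the specification.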
Is there a resource of freely available PDF files which don't contain binary and/or compressed content? Or how could I create my own such example files?
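One way to get such a file is to write it by hand. The sketch below (the function name `build_minimal_pdf` and the output name `minimal.pdf` are hypothetical) assembles a one-page PDF whose only content is the letter "a", with no compression at all, so every object (catalog, page tree, page, font, content stream), the cross-reference table and the trailer can be read in a text editor and matched against the specification. It assumes Helvetica, one of the standard 14 fonts, so no font program has to be embedded; the `text` argument must be ASCII without parentheses or backslashes, because it is pasted into a PDF literal string unescaped.

```python
"""Write a minimal, uncompressed one-page PDF that shows the letter "a".

Assumption: `text` is plain ASCII with no "(", ")" or "\\" characters,
since it is inserted into a PDF literal string without escaping.
"""


def build_minimal_pdf(text: str = "a") -> bytes:
    # Content stream: begin text, select font F1 at 24 pt, move to
    # (72, 720) in default user space, show the string, end text.
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("ascii")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length " + str(len(content)).encode("ascii") + b" >>\nstream\n"
        + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset needed by the xref table
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    # Cross-reference table: every entry is exactly 20 bytes long.
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # object 0: head of the free list
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")

    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode("ascii")
    return bytes(out)


if __name__ == "__main__":
    with open("minimal.pdf", "wb") as f:
        f.write(build_minimal_pdf())
```

A viewer should show a single "a" near the top-left of a US Letter page. Many viewers silently repair broken xref tables, so a file that opens is not proof that its offsets are correct; checking them by hand is a good first exercise with the specification.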
Are there (preferably free) PDF editors/parsers available that can visualize and dissect the raw binary data of PDF files and expose their structure?
I only need a first hook: the entry point, if you will, to the narrow path through the thick jungle of real-world PDF files, which I could then follow along... with the help of this bushwhacker called the "PDF Specification".
Recommended answer
The creators of iText (a Java/C# lib to create and manipulate PDFs) published a tool called RUPS.
From the SourceForge page:
RUPS is an abbreviation for Reading and Updating PDF Syntax. RUPS is a tool built on top of iText® that allows you to look inside a PDF document and browse the different PDF objects and content streams. (Updating PDFs isn't possible yet.)
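RUPS presents the PDF objects as a browsable tree. As a rough, text-only illustration of that first step (independent of RUPS and iText; the name `list_objects` is hypothetical), the Python sketch below lists each indirect object's `N G obj` header together with its `/Type` entry, which gives a convenient index into the matching chapters of the specification. It is naive on purpose: it regex-scans the raw bytes, so objects packed into compressed object streams (PDF 1.5 and later) are not seen, and binary stream data could in principle produce false matches.

```python
"""List the indirect objects in a PDF together with their /Type names.

Assumption: objects are stored as plain "N G obj ... endobj" blocks in the
file body (not inside compressed object streams).
"""
import re
import sys

# "N G obj": object number, generation number, then the keyword.
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
# First "/Type /Name" entry, e.g. "/Type /Page".
TYPE_NAME = re.compile(rb"/Type\s*/(\w+)")


def list_objects(pdf_bytes: bytes) -> list:
    """Return (object number, generation, /Type name or None) tuples."""
    found = []
    for header in OBJ_HEADER.finditer(pdf_bytes):
        end = pdf_bytes.find(b"endobj", header.end())
        body = pdf_bytes[header.end():end if end != -1 else len(pdf_bytes)]
        # Only inspect the dictionary, not the (possibly binary) stream data.
        dictionary = body.split(b"stream", 1)[0]
        type_match = TYPE_NAME.search(dictionary)
        found.append((
            int(header.group(1)),
            int(header.group(2)),
            type_match.group(1).decode("ascii") if type_match else None,
        ))
    return found


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for number, generation, type_name in list_objects(f.read()):
            print(f"{number} {generation} obj  /Type {type_name or '(none)'}")
```

Content streams typically have no `/Type`, so they show up as "(none)"; those are the objects whose data holds the drawing and text operators.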