Editing PDF with XPDF (or with something else)

Question

I would like to ask whether it is possible to edit PDF files using the xpdf library and, if so, how. I guess it is possible, but I could not find any tutorial or documentation for xpdf, so I really have no idea :( . I'm also open to using another library if one supports PDF editing. My only requirements for such a library are that it is a C++ library (or at least a C one) and that it is cross-platform (Windows and Linux).

I only need basic editing of a PDF file, for example:

"this is a text in a pdf document" would be changed to "this is a text in pdf" with a different text color as well.

Thanks for all your answers!

Recommended answer

Just so you understand the scope of what you're getting into, "basic editing" of PDF content is nearly always non-trivial.

Page content in PDF is represented by short RPN (reverse Polish notation, i.e. postfix) programs that paint on the page. It's a small language similar to PostScript in semantics, but without looping structures or function definitions (so there is no halting problem). In a sane world, your text on the page is going to be represented by something like this:

BT /F1 12 Tf 72 720 Td (this is a text in a pdf document) Tj ET

which, when translated into something more familiar, is this:

BeginText();
SetFont(F1, 12.0);  // Font 1, 12.0 pt
TextMoveTo(72, 720);
ShowText("this is a text in a pdf document");
EndText();
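
To make the RPN idea concrete, here is a minimal C++ sketch (not part of xpdf or any other library; `tokenize` and `replay` are hypothetical names) that tokenizes the one-line content stream above and replays it with an operand stack, producing the readable call form. It assumes a tiny subset of the syntax: numbers, names, literal strings and the five operators BT, Tf, Td, Tj and ET. Arrays, hex strings, dictionaries, inline images, comments and real string escapes are not handled, and operands are copied verbatim (so the size prints as `12`, not `12.0`).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// A token is either an operand (number, name, string) or an operator.
struct Token {
    enum Kind { Operand, Operator } kind;
    std::string text;
};

// Split a (very small subset of a) content stream into tokens.
std::vector<Token> tokenize(const std::string& s) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < s.size()) {
        if (std::isspace(static_cast<unsigned char>(s[i]))) { ++i; continue; }
        if (s[i] == '(') {
            // Literal string: track nested parentheses; "\x" is kept as "x"
            // (real PDF escapes such as \n or \053 are NOT decoded here).
            int depth = 1;
            std::string str;
            ++i;
            while (i < s.size()) {
                char ch = s[i++];
                if (ch == '\\' && i < s.size()) { str += s[i++]; continue; }
                if (ch == '(') ++depth;
                else if (ch == ')' && --depth == 0) break;
                str += ch;
            }
            out.push_back({Token::Operand, "\"" + str + "\""});
            continue;
        }
        std::size_t start = i;
        while (i < s.size() && !std::isspace(static_cast<unsigned char>(s[i])) && s[i] != '(') ++i;
        std::string word = s.substr(start, i - start);
        char c = word[0];
        if (c == '/') {
            out.push_back({Token::Operand, word.substr(1)});  // name, e.g. /F1
        } else if (std::isdigit(static_cast<unsigned char>(c)) || c == '-' || c == '+' || c == '.') {
            out.push_back({Token::Operand, word});            // number
        } else {
            out.push_back({Token::Operator, word});           // BT, Tf, Tj, ...
        }
    }
    return out;
}

// Replay the tokens as an RPN program: operands are pushed, and each
// operator pops the operands it consumes.
std::vector<std::string> replay(const std::vector<Token>& tokens) {
    std::vector<std::string> calls;
    std::vector<std::string> stack;
    auto pop = [&stack](std::size_t n) {
        assert(stack.size() >= n && "operator is missing operands");
        std::string args;
        for (std::size_t k = stack.size() - n; k < stack.size(); ++k)
            args += (k == stack.size() - n ? "" : ", ") + stack[k];
        stack.resize(stack.size() - n);
        return args;
    };
    for (const Token& t : tokens) {
        if (t.kind == Token::Operand) { stack.push_back(t.text); continue; }
        if (t.text == "BT")      calls.push_back("BeginText();");
        else if (t.text == "ET") calls.push_back("EndText();");
        else if (t.text == "Tf") calls.push_back("SetFont(" + pop(2) + ");");
        else if (t.text == "Td") calls.push_back("TextMoveTo(" + pop(2) + ");");
        else if (t.text == "Tj") calls.push_back("ShowText(" + pop(1) + ");");
        else { calls.push_back("/* unhandled operator " + t.text + " */"); stack.clear(); }
    }
    return calls;
}
```

Fed `BT /F1 12 Tf 72 720 Td (this is a text in a pdf document) Tj ET`, `replay(tokenize(...))` yields `BeginText();`, `SetFont(F1, 12);`, `TextMoveTo(72, 720);`, `ShowText("this is a text in a pdf document");` and `EndText();`.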

So in this case, you have to transform this into something like this:

BeginText();
SetFont(F1, 12.0);  // Font 1, 12.0 pt
TextMoveTo(72, 720);
ShowText("this is a ");
SetFont(F2, 12.0);  // Font 2, 12.0 pt
ShowText("text");
SetFont(F1, 12.0);  // back to Font 1
ShowText(" in a pdf document");
EndText();

which would become:

BT /F1 12 Tf 72 720 Td (this is a ) Tj /F2 12 Tf (text) Tj /F1 12 Tf
( in a pdf document) Tj ET

in the equivalent PDF. The problem is many-fold:

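  1. You have to extract the page and all of its resources (the important stuff).
  2. You have to generate a new page, insert the new resources (you're adding a new font), and embed the font if you're allowed to.
  3. You have to change the page's content stream to include the altered content.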
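
In the easy case, step 3 can be sketched in C++ as plain string surgery on an already-decompressed content stream. `highlightWord` and `escapeLiteral` are hypothetical helpers, not a library API. The sketch assumes the whole phrase sits in a single `Tj` string (the "sane world" case), that the string bytes are plain ASCII in the font's encoding, and that a font resource named `F2` has already been added to the page's resources (step 2), which it does not do.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Escape the characters that are special inside a PDF literal string.
std::string escapeLiteral(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '(' || c == ')' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Naive step 3: find "(phrase) Tj" in a content stream and replace it with
// three Tj operations, switching to highlightFont just for `word`.
// Returns std::nullopt when the phrase is not in a single Tj string.
std::optional<std::string> highlightWord(std::string stream,
                                         const std::string& phrase,
                                         const std::string& word,
                                         const std::string& baseFont,
                                         const std::string& highlightFont,
                                         int size) {
    assert(!word.empty());
    const std::size_t wordPos = phrase.find(word);
    const std::string target = "(" + escapeLiteral(phrase) + ") Tj";
    const std::size_t streamPos = stream.find(target);
    if (wordPos == std::string::npos || streamPos == std::string::npos)
        return std::nullopt;

    const std::string sz = std::to_string(size);
    const std::string replacement =
        "(" + escapeLiteral(phrase.substr(0, wordPos)) + ") Tj " +
        "/" + highlightFont + " " + sz + " Tf " +
        "(" + escapeLiteral(word) + ") Tj " +
        "/" + baseFont + " " + sz + " Tf " +
        "(" + escapeLiteral(phrase.substr(wordPos + word.size())) + ") Tj";
    stream.replace(streamPos, target.size(), replacement);
    return stream;
}
```

Applied to the one-line stream from the example with `F1`, `F2` and size 12, it produces the `BT ... (this is a ) Tj /F2 12 Tf (text) Tj /F1 12 Tf ( in a pdf document) Tj ET` sequence shown earlier. A color change would follow the same pattern (for instance an `rg` fill-color operator before `(text) Tj` and another after it). As soon as the phrase is split across several operators, the sketch simply gives up.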

And 3 is where you're going to get hung up, because there are an infinite number of ways to generate a page that has the content you describe, and even with a decent library you'll be hard pressed to handle maybe 70% of them. Let me briefly describe why this is as bad as it sounds. There are PDF generation programs (I'm looking at you, troff) that lay all the plain text on a page first, then all the italic text, then all the bold text. I swear, I'm not making this up. Some programs want to lay text down very precisely, so if you're lucky, they'll use the TJ operator, which lays out text with specific kerning. If you're not lucky (which is most of the time), they'll instead lay out the text with a move before every single glyph on the page. And what if your text is laid out on a curve or in an unusual orientation (maps, ads)? What about cases where someone subtly changes the font size to make a greater distinction between upper and lower case, or to simulate small caps?

This is why, when I wrote the find text tool for Acrobat 1.0, it took me two months of sweat to handle as many of the edge cases as I could. And that is not editing text; it's just trying to find a single word or phrase.
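
The TJ case shows why even finding text is hard. Inside a `TJ` array, numbers are adjustments in thousandths of a unit of text space that are subtracted from the current horizontal position, so a generator may encode a word gap as a large negative number rather than a space character, and a search tool has to guess. The C++ sketch below (a hypothetical helper, not from xpdf or Acrobat) rebuilds a string from an already-parsed TJ array using a fixed threshold. The default of -200 is an arbitrary assumption; real extractors compare against the font's glyph widths and also have to merge text spread across separate operators, which this does not attempt.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// One element of a TJ array: a string fragment, or a number giving a
// horizontal adjustment in thousandths of a unit of text space. The number
// is subtracted from the current position, so negative values move right.
using TJElement = std::variant<std::string, double>;

// Rebuild searchable text from a TJ array. `wordGap` is a hypothetical
// fixed threshold: an adjustment at or below it is guessed to be a space.
std::string textFromTJ(const std::vector<TJElement>& array,
                       double wordGap = -200.0) {
    assert(wordGap < 0 && "a visible gap is a move to the right (negative)");
    std::string out;
    for (const TJElement& e : array) {
        if (const std::string* s = std::get_if<std::string>(&e)) {
            out += *s;
        } else if (std::get<double>(e) <= wordGap && !out.empty() &&
                   out.back() != ' ') {
            out += ' ';  // large rightward move: probably a word boundary
        }
    }
    return out;
}
```

With the default threshold, `[(this is a) -250 (te) 15 (xt)] TJ` comes out as `this is a text`; tighten the threshold to -300 and the same array reads `this is atext`, which is exactly the kind of edge case described above.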

I'm not going to recommend a library for you, sorry. I gave xpdf a brief look over, and it's not clear whether it has PDF generation capabilities or is simply a consumer of PDF. PdfLib, which is a commercial product, appears to be able to generate PDF, although it's not clear whether it can consume it, but you could certainly get both sides by gluing the two together.

If it were me, I would use tools that I've developed, and I'd still be a little shy of this task. My library is used by Atalasoft, the company I work for, to generate PDFs from whole cloth and to do editing within a very limited domain (annotations, document metadata). The hardest part is that we do our very best to hide the complexity of PDF from our customers. In general, our customers want us to understand the spec so they don't have to, and to make the rest easy, but tasks like this one (redaction is another) are really hard to do without understanding the depth of the PDF specification. If you're going to enter the world of PDF manipulation libraries, start by reading the spec, especially chapter 8 (Graphics) and chapter 9 (Text); you'll get a much better understanding of what you're going to have to do with the library.
