How do you programmatically redact PDF files?


Question

Adobe Acrobat has the ability to redact PDF files (that is, actually remove the information, rather than simply drawing a black box on top of it). I would like to use this feature programmatically. To redact using the GUI, you select the Mark for Redaction tool, draw it over the text to be redacted, then choose Apply Redactions.

Is there any way to do this programmatically, either through AppleScript or some other way?

I know the (x, y) location of the text to be redacted.

Thanks!

Recommended answer

In order to properly redact a PDF, you need to alter the content stream. This is Very Hard.

If you can find the portion of the content stream that draws the text you want removed, you're halfway there.
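As a rough illustration of that first half, here is a minimal Python sketch that scans a content stream for simple Tj operations containing a target string. It assumes the stream has already been decompressed, that the text uses a single-byte encoding readable as ASCII, and that the strings contain no nested or escaped parentheses. The function name find_text_ops is hypothetical, and a real tool would need a proper content stream parser.

```python
import re

# Matches a simple literal string followed by Tj.
# Assumption: no nested or escaped parentheses inside the string.
SIMPLE_TJ = re.compile(rb"\(([^()\\]*)\)\s*Tj")


def find_text_ops(stream: bytes, target: bytes):
    """Yield (start, end, text) for each simple Tj whose string contains target."""
    for match in SIMPLE_TJ.finditer(stream):
        if target in match.group(1):
            yield match.start(), match.end(), match.group(1)


# A decompressed content stream, shortened for the example.
stream = b"""BT
/F1 10 Tf
1 0 0 1 30 720 Tm
(Here's some text, and you only want to REDACT that upper case "redact" over there)Tj
T*
(This text is positioned relative to the previous line)Tj
ET"""

hits = list(find_text_ops(stream, b"REDACT"))
for start, end, text in hits:
    print(start, end, text)
```

The byte offsets tell you which slice of the stream has to be rewritten; everything outside that slice should be left untouched.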

The other half is figuring out how to change the content stream such that you don't modify the rest of the document. If the next text draw operator is preceded by a Tm operator (set the text matrix, which absolutely positions the next piece of text), it's easy. If not... you have to calculate the exact width of the text you're replacing (several different PDF libraries can do this), and alter the drawing commands to skip over that much stuff.

For example:


BT
/F1 10 Tf
1 0 0 1 30 720 Tm
(Here's some text, and you only want to REDACT that upper case "redact" over there)Tj
T*
(This text is positioned relative to the previous line)Tj
1 0 0 1 30 650 Tm
(This text is positioned absolutely, starting at 30, 650)Tj
ET

So you'd have to break up that first (...)Tj line into (Here's some text, and you only want to)Tj, N 0 Td, and (that upper case "redact" over there)Tj, where N adjusts the position of the following text drawing operation so that it lands in EXACTLY THE SAME SPOT. So you'd need to know the precise width of " REDACT " using the font resource /F1 (whatever that turned out to be), sized to 10 points. Note that Td offsets from the start of the current line rather than from where the previous string ended, so N also has to include the width of the text kept in front of the gap, and any later line-relative move starts from the shifted origin.
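The width arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the glyph widths below are made up (real values come from the font's /Widths array or the font program), character spacing and word spacing are zero, horizontal scaling is 100%, and the text contains no parentheses or backslashes that would need escaping. Instead of the Td split described above, the sketch emits a TJ array with a numeric adjustment, a different technique that avoids moving the line origin; the adjustment is in thousandths of a text-space unit, so it does not depend on the font size.

```python
# Hypothetical glyph widths in thousandths of a text-space unit.
# Real values must be read from the actual font behind /F1.
DEFAULT_WIDTH = 500
WIDTHS = {" ": 278, "R": 722, "E": 667, "D": 722, "A": 667, "C": 722, "T": 611}


def glyph_units(text: str) -> int:
    """Sum of glyph widths, in 1/1000 text-space units."""
    return sum(WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text)


def width_in_points(text: str, font_size: float) -> float:
    """Advance width at the given font size (no extra spacing or scaling)."""
    return glyph_units(text) * font_size / 1000.0


def redact_tj(text: str, secret: str) -> str:
    """Rewrite one simple Tj string as a TJ array that skips over secret."""
    before, found, after = text.partition(secret)
    if not found:
        raise ValueError("secret not found in text")
    # A negative number in a TJ array moves the following text to the right.
    return f"[({before}) {-glyph_units(secret)} ({after})] TJ"


line = 'Here\'s some text, and you only want to REDACT that upper case "redact" over there'
print(redact_tj(line, " REDACT "))
print(width_in_points(" REDACT ", 10))
```

With the made-up widths above, " REDACT " is 4667 units wide, which is 46.67 points at size 10, so the array carries an adjustment of -4667 between the two remaining strings.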

Just to make your life more exciting, you have to worry about kerned text too. You can provide little spacing adjustments inline with text thusly:

(This is taken from the first text drawn in the PDF Spec)


[(Adobe Sys)5(t)1(ems Inc)5(orporated)5( 20)5(08 \226 All rights)5( reser)-9(ved)]TJ

To properly redact "Incorporated", you need to determine that it's been split across two strings, and adjust the positioning of the string following it so it's in Exactly The Same Spot.
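A hedged Python sketch of that bookkeeping follows. It assumes the TJ array has already been parsed into a list of strings and numbers, that every glyph is 600 units wide (as in a monospaced font such as Courier; a real font needs its own width table), and that character and word spacing are zero. The function names are hypothetical. Kerning numbers that fall strictly inside the removed span are folded into the gap, since they also contributed to the original advance.

```python
def redact_tj_array(items, secret, widths, default=600):
    """Remove secret from a parsed TJ array, leaving an equivalent gap."""
    text = "".join(i for i in items if isinstance(i, str))
    start = text.find(secret) if secret else -1
    if start < 0:
        raise ValueError("secret not found")
    end = start + len(secret)

    out, pos, advance, gap_written = [], 0, 0, False
    for item in items:
        if isinstance(item, str):
            kept = ""
            for ch in item:
                if start <= pos < end:
                    # Removed glyph: remember how far it moved the pen.
                    advance += widths.get(ch, default)
                    if kept:
                        out.append(kept)
                        kept = ""
                else:
                    if pos >= end and not gap_written:
                        out.append(-advance)  # negative moves right
                        gap_written = True
                    kept += ch
                pos += 1
            if kept:
                out.append(kept)
        elif start < pos < end:
            # Kerning inside the removed span: a positive number moved left.
            advance -= item
        else:
            if pos >= end and not gap_written:
                out.append(-advance)
                gap_written = True
            out.append(item)
    if not gap_written:
        out.append(-advance)
    return out


def format_tj(items):
    """Serialize back to operator syntax (no string escaping, for brevity)."""
    body = " ".join(f"({i})" if isinstance(i, str) else str(i) for i in items)
    return f"[{body}] TJ"


original = ["Adobe Sys", 5, "t", 1, "ems Inc", 5, "orporated", 5, " 20", 5, "08"]
redacted = redact_tj_array(original, "Incorporated", widths={})
print(format_tj(redacted))
```

Here the twelve removed glyphs advance 7200 units and the kerning value 5 between "Inc" and "orporated" pulls back 5, so the gap written in their place is -7195.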

And strings can be <DEADBEEF> hex values rather than (plain old ascii).
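Decoding the hex form is the easy part, as this small Python sketch suggests. The function names are hypothetical. Whitespace inside the angle brackets is ignored, and an odd number of digits is padded with a trailing 0, as the PDF specification describes. The resulting bytes are character codes that still have to be mapped through the font's encoding before you can compare them with the text you are looking for.

```python
import re


def decode_hex_string(token: str) -> bytes:
    """Decode a PDF hex string such as '<DEADBEEF>' into raw bytes."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError("not a hex string")
    digits = re.sub(r"\s+", "", token[1:-1])
    if len(digits) % 2:
        digits += "0"  # a missing final digit is treated as 0
    return bytes.fromhex(digits)


def encode_hex_string(data: bytes) -> str:
    """Encode raw bytes back into PDF hex string syntax."""
    return "<" + data.hex().upper() + ">"


print(decode_hex_string("<DEADBEEF>"))
print(decode_hex_string("<48 65 6C 6C 6F>"))
print(encode_hex_string(b"Hello"))
```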

Get the idea? And I haven't covered all the possibilities here, just the most common ones.

Like I said: this is Very Hard.

There's an Acrobat plugin called Appligent Redax (no connection) that lets you draw annotations (or generate them via templates, regex, etc.) and then run their code to handle the redaction. It should be possible to programmatically create their annotations and perhaps even activate their plugin: JS in a document can run a menu item.
