PDF form field in a text editor

Question

In short: I would like to edit a read-only field of a PDF form using ONLY a text editor. I've succeeded, but I would like to understand why it doesn't work in some cases...

I've noticed that if I take a PDF 1.5 version of my original document (without fields, saved as PDF by Word 2010), add the field with Acrobat Pro XI, and save it using Save as other... -> Optimized PDF with Acrobat 6.0 compatibility, my field looks like this in a text editor (Notepad++):

<</AP<</N 28 0 R>>/DA(/Helv 12 Tf 0 g)/DV(mytextfield)/F 4/FT/Tx/Ff 1/MK<<>>/P 3 0 R/Rect[99.4934 686.99 249.493 708.99]/Subtype/Widget/T(%mytextfield)/Type/Annot/V(mytextfield)>>
endobj
28 0 obj
<</BBox[0.0 0.0 150.0 22.0]/FormType 1/Length 88/Matrix[1.0 0.0 0.0 1.0 0.0 0.0]/Resources<</Font<</Helv 20 0 R>>/ProcSet[/PDF/Text]>>/Subtype/Form/Type/XObject>>stream
/Tx BMC 
q
1 1 148 20 re
W
n
BT
/Helv 12 Tf
0 g
2 6.548 Td
(mytextfield) Tj

This is very easy to modify: every occurrence of 'mytextfield' is the content of my field, and '%mytextfield' is the name of my field.

On the other hand, if I take my PDF 1.5 (saved by Word 2010), add the field with Acrobat Pro XI, and then save it normally (Save As) instead of as an optimized PDF, I get a PDF 1.6 with the following (in Notepad++):

<</AcroForm 25 0 R/Lang(fr-CH)/MarkInfo<</Marked true>>/Metadata 3 0 R/Pages 15 0 R/StructTreeRoot 8 0 R/Type/Catalog>>
endobj
19 0 obj
<</Annots 26 0 R/Contents 22 0 R/CropBox[0 0 595.32 841.92]/Group<</CS/DeviceRGB/S/Transparency/Type/Group>>/MediaBox[0 0 595.32 841.92]/Parent 15 0 R/Resources<</ExtGState<</GS0 30 0 R>>/Font<</TT0 33 0 R>>/ProcSet[/PDF/Text]>>/Rotate 0/StructParents 0/Tabs/S/Type/Page>>
endobj
20 0 obj
<</BBox[0.0 0.0 150.0 22.0]/FormType 1/Length 85/Matrix[1.0 0.0 0.0 1.0 0.0 0.0]/Resources<</Font<</Helv 28 0 R>>/ProcSet[/PDF/Text]>>/Subtype/Form/Type/XObject>>stream
/Tx BMC 
q
1 1 148 20 re
W
n
BT
/Helv 12 Tf
0 g
2 6.548 Td
(mytextfield) Tj

This format is not easy to edit (if I change mytextfield, I get a corrupted document!). It would be fine if opening this PDF 1.6 in Acrobat Pro and saving it with the optimized PDF trick mentioned above converted the field to the first format, but that is not the case. Instead I get the exact same field format.

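So my questions are as follows:


  1. Is there a way to make sure that my PDF form (whatever the PDF version of the original) can be converted to the right format (easily editable fields) using Acrobat Pro or any other program?

  2. Is there a way to easily edit the PDF 1.6 field?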


Recommended answer

The OP made clear in comments that during his edits he replaced PDF data with something longer or shorter.

This is generally a bad idea because PDF files have a cross-reference table (or stream) giving the byte offset of each indirect object (each nnn 0 obj ... endobj). Replacing PDF data with data of a different length invalidates this cross-reference information for every object that follows the edit position.
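The effect can be reproduced on a toy file. The following Python sketch is only an illustration under simplifying assumptions: it builds a tiny three-object PDF with a classic cross-reference table (not a compressed stream), and the helper names build_pdf and check_xref are hypothetical, not part of any library. It checks whether each table entry and the startxref value still point where they claim to.

```python
import re

def build_pdf(bodies):
    """Assemble a toy PDF whose objects are numbered 1..n (illustrative only)."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))                      # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"                   # entry for free object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off              # each entry is 20 bytes
    out += b"trailer\n<</Size %d/Root 1 0 R>>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

def check_xref(pdf):
    """List what no longer points to the right place (classic table only)."""
    problems = []
    declared = int(re.search(rb"startxref\s+(\d+)\s+%%EOF", pdf).group(1))
    actual = pdf.rfind(b"\nxref\n") + 1               # where the table really is
    if declared != actual:
        problems.append("startxref")
    header = re.compile(rb"xref\n(\d+) (\d+)\n").match(pdf, actual)
    first, count = int(header.group(1)), int(header.group(2))
    for i in range(count):
        entry = pdf[header.end() + 20 * i : header.end() + 20 * (i + 1)]
        if entry[17:18] != b"n":                      # skip free entries
            continue
        offset, gen = int(entry[:10]), int(entry[11:16])
        if not pdf.startswith(b"%d %d obj" % (first + i, gen), offset):
            problems.append("object %d" % (first + i))
    return problems

original = build_pdf([
    b"<</Type/Catalog/Pages 3 0 R>>",
    b"<</FT/Tx/T(name)/V(mytextfield)>>",
    b"<</Type/Pages/Kids[]/Count 0>>",
])
# A replacement of a different length shifts everything after the edit.
edited = original.replace(b"(mytextfield)", b"(a longer value)")
# A replacement of exactly the same length leaves all offsets intact.
same_length = original.replace(b"(mytextfield)", b"(elevenchars)")

print(check_xref(original))
print(check_xref(edited))
print(check_xref(same_length))
```

In this toy file the longer value leaves objects 1 and 2 where they were but makes the entry for object 3 and the startxref value stale, while the same-length replacement reports no problems.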

Thus, to have a valid PDF after editing, one at least has to update the cross-reference information, which in a mere text editor is a real hassle (in the case of cross-reference tables) or even virtually impossible (in the case of compressed cross-reference streams).
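For the simpler of the two cases, the bookkeeping can be sketched in a few lines of Python. This is a rough sketch, not a general repair tool: it assumes a single-revision file with one classic cross-reference table, objects numbered 1..n without gaps, and no binary stream data that happens to contain a line looking like "N G obj". The function name rebuild_xref and the sample bytes are made up for illustration. It also does not touch the /Length entry of a stream, which would need the same kind of correction if the edit were made inside a stream such as the appearance stream shown in the question.

```python
import re

def rebuild_xref(pdf):
    """Recompute the classic xref table of a simple PDF after an edit."""
    body_end = pdf.rfind(b"\nxref\n") + 1             # start of the old table
    body = pdf[:body_end]
    # Reuse the existing trailer dictionary unchanged.
    trailer = re.search(rb"trailer\s*(<<.*?>>)\s*startxref",
                        pdf[body_end:], re.S).group(1)
    offsets = {}
    for m in re.finditer(rb"^(\d+) (\d+) obj\b", body, re.M):
        offsets[int(m.group(1))] = (m.start(), int(m.group(2)))
    size = max(offsets) + 1
    if sorted(offsets) != list(range(1, size)):
        raise ValueError("sketch assumes objects numbered 1..n without gaps")
    out = bytearray(body)
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"                   # free entry for object 0
    for num in range(1, size):
        off, gen = offsets[num]
        out += b"%010d %05d n \n" % (off, gen)        # fixed 20-byte entries
    out += b"trailer\n" + trailer + b"\nstartxref\n%d\n%%%%EOF\n" % body_end
    return bytes(out)

# Sample input: a toy file whose field value was lengthened by hand, so the
# offsets below are deliberately stale (made-up values for illustration).
broken = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<</Type/Catalog/Pages 3 0 R>>\nendobj\n"
    b"2 0 obj\n<</FT/Tx/T(name)/V(a longer value)>>\nendobj\n"
    b"3 0 obj\n<</Type/Pages/Kids[]/Count 0>>\nendobj\n"
    b"xref\n0 4\n"
    b"0000000000 65535 f \n"
    b"0000000009 00000 n \n"
    b"0000000055 00000 n \n"
    b"0000000100 00000 n \n"
    b"trailer\n<</Size 4/Root 1 0 R>>\nstartxref\n150\n%%EOF\n"
)
fixed = rebuild_xref(broken)
print(fixed.decode("latin-1"))
```

Nothing comparable works for a compressed cross-reference stream, because there the offsets are stored as binary data inside a (usually Flate-compressed) stream and cannot be patched as text.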

Details can be found in the PDF specification ISO 32000-1.

Furthermore, the OP indicated that he checked document validity after his edits by opening the files in a PDF viewer.

This is not a good idea either, because well-known PDF viewers tend to repair invalid PDFs on the fly without necessarily telling the user. Programs that manipulate PDFs more often require valid PDFs as input (at least valid in the aspect they are manipulating) and will therefore probably reject or (even worse) garble the edited PDFs.

The OP indicates that his task has been described in a separate question. Unless there is some appropriate JS library out there, he will essentially have to program one according to his needs.

It might be advantageous to use incremental updates here instead of manipulating the inner information of the source PDF. For this, see section 7.5.6, Incremental Updates, in the specification mentioned above.
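As a rough illustration of the idea, the Python sketch below appends a new revision to a toy file: the original bytes stay untouched, a new version of the changed object is added at the end, followed by a small cross-reference section that lists only that object and a trailer whose /Prev entry points to the previous cross-reference section. The function names tiny_pdf and append_incremental_update are hypothetical. The sketch assumes a classic cross-reference table, generation number 0, and a trailer that only needs /Size and /Root; a real file may have further trailer entries (such as /Info or /ID) to carry over, and a file that uses cross-reference streams would need the update written in that form instead.

```python
import re

def tiny_pdf():
    """Build a toy three-object PDF with a classic xref table (illustrative)."""
    bodies = [b"<</Type/Catalog/Pages 3 0 R>>",
              b"<</FT/Tx/T(name)/V(mytextfield)>>",
              b"<</Type/Pages/Kids[]/Count 0>>"]
    out, offsets = bytearray(b"%PDF-1.4\n"), []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 4\n0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<</Size 4/Root 1 0 R>>\n"
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

def append_incremental_update(pdf, obj_num, new_body):
    """Append a new revision that redefines one object (generation 0)."""
    # Values of the most recent revision: its xref offset, /Size and /Root.
    prev = int(re.findall(rb"startxref\s+(\d+)\s+%%EOF", pdf)[-1])
    trailer = re.findall(rb"trailer\s*<<(.*?)>>\s*startxref", pdf, re.S)[-1]
    size = int(re.search(rb"/Size\s+(\d+)", trailer).group(1))
    root = re.search(rb"/Root\s+(\d+ \d+ R)", trailer).group(1)

    out = bytearray(pdf)                              # original bytes are kept
    if not out.endswith(b"\n"):
        out += b"\n"
    obj_offset = len(out)
    out += b"%d 0 obj\n" % obj_num + new_body + b"\nendobj\n"
    xref_offset = len(out)
    # One subsection containing exactly one entry: the redefined object.
    out += b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, obj_offset)
    out += b"trailer\n<</Size %d/Root %s/Prev %d>>\n" % (size, root, prev)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)

original = tiny_pdf()
updated = append_incremental_update(
    original, 2, b"<</FT/Tx/T(name)/V(new value)>>")
print(updated[len(original):].decode("latin-1"))
```

Because nothing before the appended section moves, none of the existing offsets need to change, and the new value can be of any length.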

PS: The OP asked


would incremental updates work with read-only fields

Incremental updates are simply a different way to organize your changes: everything you can change inside the original file you can also change using incremental updates. Actually, you can do even more with incremental updates: in signed documents, certain changes are often still allowed, but they must be made as incremental updates, as otherwise the signature would be structurally broken.
