How to add text object to existing pdf

Question

I have a source pdf which I am modifying by adding text objects. I am using "Incremental Updates" which is mentioned in the PDF specification. But while adding text objects using this method I am making some mistakes due to which the pdf doesn't render properly in Adobe Reader 11. When the pdf is opened and I double-click on it, the added text objects get deleted. I figured out that this is due to text annotation.

Now I want to know how a new text object can be added using an incremental update. How do the Contents and RC of a free text annotation have to be maintained?

Also is it possible to disable or delete the annotation so that my problem can be avoided easily? Because I want a simple pdf, I don't want annotation options.

The source pdf I am using is here.

The modified pdf after adding text object is here.

I am not sure that source pdf is itself proper according to pdf specification.

Recommended answer

First off let me show you how easy things are if you can use a decent PDF library. I use iTextSharp as an example but the same can also be done with others like PDFBox or PDFNet (already mentioned by @Ika in his answer):

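using iTextSharp.text;
using iTextSharp.text.pdf;

// sourcePdf (input path) and targetPdfStream (output Stream) are supplied by the caller.
PdfReader reader = new PdfReader(sourcePdf);
using (PdfStamper stamper = new PdfStamper(reader, targetPdfStream)) {
  // Bold 12pt Helvetica in light gray.
  Font FONT = new Font(Font.FontFamily.HELVETICA, 12, Font.BOLD, new GrayColor(0.75f));
  // Canvas drawn on top of the existing content of page 1.
  PdfContentByte canvas = stamper.GetOverContent(1);
  ColumnText.ShowTextAligned(
    canvas,
    Element.ALIGN_LEFT,
    new Phrase("Hello people!", FONT),
    36, 540, 0
  );
}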

(Derived from the Webified iTextSharp Example StampText.cs explained in chapter 6 of iText in Action, 2nd Edition: http://www.manning.com/lowagie2/samplechapter6.pdf)

(Which PDF library you choose depends on your general requirements and the available license models.)

If, in spite of the ease of use of such PDF libraries, you insist on doing it manually, here are some remarks:

First you have to find the Page dictionary of the page you want to add content to. Depending on the type of PDF this already might require decompression of object streams etc. but in your sample modified1.pdf that is not necessary:

7 0 obj
  <</Rotate 90
    /Type /Page
    /TrimBox [ 9.54 6.12 585.68 835.88 ]
    /Resources 8 0 R
    /CropBox [ 0 0 595.22 842 ]
    /ArtBox [ 9.54 18.36 585.68 842 ]
    /Contents [ 9 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R ]
    /Parent 6 0 R
    /MediaBox [ 0 0 595.22 842 ]
    /Annots 17 0 R
    /BleedBox [ 9.54 6.12 585.68 835.88 ]
  >>
endobj 

You see the array of references to content streams. This is where you have to add new page content. You can manipulate an existing stream or create a new stream and add it to that array.

(Most PDFs have their content stream compressed. For the general case, therefore, you'd have to decompress a stream before you can work on it. Thus, in my eyes, the easier way would be to start a new stream.)
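
As a rough illustration of "create a new stream and add it to that array", the following Python sketch appends one more indirect reference to the /Contents array of a page dictionary held as text. The helper name append_content_ref and the shortened page dictionary are made up for this example, and the sketch assumes /Contents is a direct array as in the page object above; a page whose /Contents is a single reference would need different handling.

```python
import re

def append_content_ref(page_dict: str, new_ref: str) -> str:
    """Append an indirect reference such as '70 0 R' to the /Contents array."""
    match = re.search(r"/Contents\s*\[([^\]]*)\]", page_dict)
    if match is None:
        raise ValueError("no /Contents array found (it may be a single reference)")
    refs = match.group(1).split()  # e.g. ['9', '0', 'R', '10', '0', 'R', ...]
    updated = "/Contents [ " + " ".join(refs + new_ref.split()) + " ]"
    return page_dict[:match.start()] + updated + page_dict[match.end():]

# Shortened stand-in for the page dictionary, not the full object 7 0.
page = "<</Type /Page /Contents [ 9 0 R 10 0 R 16 0 R ] /Parent 6 0 R>>"
print(append_content_ref(page, "70 0 R"))
```

The rewritten page object still has to be written back to the file, for example as part of an incremental update.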

You chose to manipulate the last referenced stream 16 0 which in your PDF is uncompressed:

16 0 obj
<</Length 37 0 R>>
stream
  S 1 0 0 1 13.183 0 cm 0 0 m
  [...]
  0 10 -10 -0 506.238 342.629 Tm
  .13333 .11765 .12157 scn
  -.0002 Tc
  .0006 Tw
  (the Bank and branch on which cheque is drawn\).)Tj

  /F1 2 Tf
  -15.1279 10.9462 Td
  (abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~!@#$%^&*aaaaaaaaaaaaa)Tj

  /F2 1 Tf
  015.1279 01.9462 Td
  (ANAabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789)Tj

  ET
endstream
endobj 

Your additions, I gather, are the two 3-liners at the bottom which first select a font, then position the insertion point and finally print a selection of letters.

Now you say you added the text abc..z and ABC...Z just for testing, but the letters b, j, k, q, v, etc. do not appear in the PDF. The problem becomes even more visible for your second addition of letters; here only the capital 'A' and 'N' are displayed.

This is because the fonts in question are embedded in the PDF only as subsets. Fonts are embedded so that PDF viewers on systems which don't have the font can still display the PDF, but here they are not embedded completely: only the characters the document actually uses from each font are included.

Let's look for the font F2 for which only 'N' and 'A' appear:

According to the page object, the page resources can be found in object 8 0:

8 0 obj
  <</Font <</F1 45 0 R /TT2 46 0 R /F2 47 0 R>>
    /ExtGState <</GS2 48 0 R>>
    /ProcSet [ /PDF /Text ]
    /ColorSpace <</Cs6 49 0 R>>
  >>
endobj 

So F2 is defined in 47 0:

47 0 obj
  <</Subtype /Type1
    /Type /Font
    /Widths [ 722 250 250 250 250 250 250 250 250 250 250 250 250 722 ]
    /Encoding 52 0 R
    /FirstChar 65
    /FontDescriptor 53 0 R
    /ToUnicode 54 0 R
    /BaseFont /ILBPOB+TimesNewRomanPSMT-Bold
    /LastChar 78
  >>
endobj 

In the referenced ToUnicode map 54 0 you see

54 0 obj
<</Length 55 0 R>>stream
  /CIDInit /ProcSet findresource begin 12 dict begin begincmap /CIDSystemInfo <<
  /Registry (AAAAAA+F2+0) /Ordering (T1UV) /Supplement 0 >> def
  /CMapName /AAAAAA+F2+0 def
  /CMapType 2 def
  1 begincodespacerange <41> <4e> endcodespacerange
  2 beginbfchar
  <41> <0041>
  <4e> <004E>
  endbfchar
  endcmap CMapName currentdict /CMap defineresource pop end end
endstream
endobj 

In this mapping you see that only the character codes 0x41 'A' and 0x4e 'N' are mapped.

In your document the font is used only to print "NA" in the amount table cells and for nothing else. Thus, only those two letters 'N' and 'A' are embedded, which results in your addition with that font only outputting these letters.

Thus, to successfully add text to the page you either have to check the font resources associated with the page for the glyphs they provide (and restrict your additions to those glyphs) or you have to add your own font resource.
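
A quick way to approximate the first option is to read the character codes listed in a font's ToUnicode CMap and compare them with the text you intend to add. The Python sketch below does this for the bfchar entries shown above. The function names are invented for the example, bfrange sections are ignored, and a ToUnicode map is only a hint about which glyphs a subset contains, so treat the result as a first check rather than proof.

```python
import re

def mapped_codes(cmap_text: str) -> dict:
    """Collect {character code: Unicode string} from all beginbfchar sections."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

def missing_characters(text: str, mapping: dict) -> list:
    """Return the characters of text whose codes the CMap does not list."""
    return sorted({ch for ch in text if ord(ch) not in mapping})

# The relevant part of the ToUnicode stream 54 0 quoted above.
CMAP = """
1 begincodespacerange <41> <4e> endcodespacerange
2 beginbfchar
<41> <0041>
<4e> <004E>
endbfchar
"""

codes = mapped_codes(CMAP)
print(missing_characters("ANAabc", codes))
```

For the F2 subset only 'A' and 'N' pass this check, which matches what the viewer displays.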

As the presence of characters in the encoding is often not as easy to see as it is here (ToUnicode is optional), I would propose that you add your own font resources. The PDF specification ISO 32000-1 explains how to do that.

Furthermore you state that the x and y axis position for the text is not displayed properly in the PDF. While you don't say what exactly you mean, you should be aware that in the content stream you can apply affine transformations to the coordinate system of the page, i.e. stretch, skew, rotate, and move the axes.

If you want to use the original coordinate system and not depend on the coordinates to be proper at your additions, you should add an initial content stream to the page containing a q operator (to save the current graphics state on the graphics state stack) and start your additions in a new final content stream with a Q operator (to restore the graphics state by removing the most recently saved state from the stack and making it the current state).
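
The two wrapper streams can be sketched as follows. This Python helper serializes an uncompressed stream object and computes /Length from the content bytes; the name stream_object, the object numbers 69 and 70, and the font resource name /Xi0 are placeholders for this illustration and must match whatever your own update actually defines.

```python
def stream_object(obj_num: int, content: bytes) -> bytes:
    """Serialize an uncompressed stream object with generation number 0."""
    # /Length counts only the bytes between the line end after "stream"
    # and the line end that precedes "endstream".
    header = b"%d 0 obj\n<</Length %d>>\nstream\n" % (obj_num, len(content))
    return header + content + b"\nendstream\nendobj\n"

# First content stream of the page: save the pristine graphics state.
save_state = stream_object(69, b"q")

# Last content stream of the page: restore that state, then draw in it.
additions = stream_object(
    70,
    b"Q\nq\nBT\n/Xi0 12 Tf\n1 0 0 1 36 540 Tm\n(Hello people!) Tj\nET\nQ",
)

print(save_state.decode("ascii"))
print(additions.decode("ascii"))
```

Both objects then need to be referenced from the page's /Contents array, the first one at the very beginning and the second one at the very end.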

EDIT As a sample I applied the Java equivalent of the C# code at the top to your modified1.pdf with append mode activated. The following objects were changed or added as a result:
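
For readers who want to see roughly what append mode amounts to at the file level, here is a minimal Python sketch of an incremental update: the original bytes stay untouched, and new or changed objects, a cross-reference section, and a trailer pointing back to the previous cross-reference section are appended. It assumes a classic cross-reference table (not a cross-reference stream) and generation number 0 for every object. The function name, the stand-in file bytes, and the values passed for root_ref, prev_xref and size are hypothetical; in a real file they come from the existing trailer and startxref entry.

```python
def incremental_update(original: bytes, objects: dict, root_ref: bytes,
                       prev_xref: int, size: int) -> bytes:
    """Append objects ({number: body bytes}) plus xref section and trailer."""
    out = bytearray(original)
    if not out.endswith(b"\n"):
        out += b"\n"
    offsets = {}
    for num in sorted(objects):
        offsets[num] = len(out)  # byte offset of "N 0 obj" in the final file
        out += b"%d 0 obj\n" % num + objects[num] + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n"
    for num in sorted(objects):
        out += b"%d 1\n" % num  # one subsection per object
        out += b"%010d 00000 n \n" % offsets[num]  # each entry is 20 bytes
    out += b"trailer\n<</Size %d /Root %s /Prev %d>>\n" % (size, root_ref, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

# Stand-in for the existing file; a real one ends with its own xref and trailer.
original = b"%PDF-1.4\n% ...existing objects, xref, trailer...\n%%EOF\n"
font = b"<</Type/Font/Subtype/Type1/BaseFont/Helvetica-Bold/Encoding/WinAnsiEncoding>>"
updated = incremental_update(original, {68: font}, b"1 0 R", prev_xref=9, size=71)
print(updated[len(original):].decode("ascii"))
```

A changed object such as the page object 7 0 is handled the same way: its complete new version is appended under the same object number, and the new cross-reference entry overrides the old one.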

The page object 7 0 has been updated:

7 0 obj
  <</CropBox[0 0 595.22 842]
    /Parent 6 0 R
    /Contents[69 0 R 9 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R 70 0 R]
    /Type/Page
    /Resources<<
      /ExtGState<</GS2 48 0 R>>
      /ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
      /ColorSpace<</Cs6 49 0 R>>
      /Font<</F1 45 0 R/F2 47 0 R/TT2 46 0 R/Xi0 68 0 R>>
    >>
    /MediaBox[0 0 595.22 842]
    /TrimBox[9.54 6.12 585.68 835.88]
    /BleedBox[9.54 6.12 585.68 835.88]
    /Annots 17 0 R
    /ArtBox[9.54 18.36 585.68 842]
    /Rotate 90
  >>
endobj 

If you compare with your former version, you see that

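  • two new content streams have been added, 69 0 at the start and 70 0 at the end;
  • the resources are no longer an indirect object but are included directly here;
  • the resources contain a new font resource Xi0 at 68 0.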

Now let's look at the added objects.

This is the font resource for Helvetica-Bold named Xi0 at 68 0:

68 0 obj
  <</BaseFont/Helvetica-Bold
    /Type/Font
    /Encoding/WinAnsiEncoding
    /Subtype/Type1
  >>
endobj 

Non-embedded, standard 14 font resources are not complicated at all...

Now there are the additional content streams. iText does compress them, but I'll show them in an uncompressed state here:

69 0 obj
<</Length 1>>stream
  q
endstream
endobj
70 0 obj
<</Length 106>>stream 
  Q
  q
  0 1 -1 0 595.22 0 cm
  q
  BT
  1 0 0 1 36 540 Tm
  /Xi0 12 Tf
  0.75 g
  (Hello people!)Tj
  0 g
  ET
  Q
  Q
endstream
endobj 

So the new content stream at the start stores the current graphic state, and the new one at the end retrieves that stored state, changes the coordinate system, positions for text insertion, selects font, font size, and the fill colour, and finally prints a string.
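
To see what the coordinate system change in stream 70 0 does, it helps to push a point through the matrix by hand. The Python sketch below applies a PDF matrix a b c d e f to a point using x' = a·x + c·y + e and y' = b·x + d·y + f; the helper name apply_cm is made up for this example.

```python
def apply_cm(matrix, x, y):
    """Map (x, y) through a PDF transformation matrix given as (a, b, c, d, e, f)."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)

# Operands of the "0 1 -1 0 595.22 0 cm" operator in stream 70 0.
rotate_90 = (0, 1, -1, 0, 595.22, 0)

# The text position set by "1 0 0 1 36 540 Tm", expressed in default page space.
print(apply_cm(rotate_90, 36, 540))
```

The point ends up at roughly (55.22, 36) in the unrotated page space. Since the page has /Rotate 90, the matrix appears to be there to compensate for the rotation, so that 36 and 540 can be read as distances from the left and bottom edges of the page as displayed.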
