How to add text object to existing pdf


Question

I have a source pdf which I am modifying by adding text objects. I am using "Incremental Updates" which is mentioned in the PDF specification. But while adding text objects using this method I am making some mistakes due to which the pdf doesn't render properly in Adobe Reader 11. When the pdf is opened and I double-click on it, the added text objects get deleted. I figured out that this is due to text annotation.

Now I want to know how a new text object can be added using an incremental update. How do the Contents and RC entries of a free text annotation have to be maintained?

Also is it possible to disable or delete the annotation so that my problem can be avoided easily? Because I want a simple pdf, I don't want annotation options.

The source pdf I am using is here.

The modified pdf after adding text object is here.

I am not sure whether the source pdf itself conforms to the PDF specification.

Solution

First off let me show you how easy things are if you can use a decent PDF library. I use iTextSharp as an example but the same can also be done with others like PDFBox or PDFNet (already mentioned by @Ika in his answer):

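using iTextSharp.text;
using iTextSharp.text.pdf;

// sourcePdf: the input PDF; targetPdfStream: a writable output stream
PdfReader reader = new PdfReader(sourcePdf);
using (PdfStamper stamper = new PdfStamper(reader, targetPdfStream)) {
  // 12pt bold Helvetica in light gray
  Font FONT = new Font(Font.FontFamily.HELVETICA, 12, Font.BOLD, new GrayColor(0.75f));
  // Canvas drawn on top of the existing content of page 1
  PdfContentByte canvas = stamper.GetOverContent(1);
  ColumnText.ShowTextAligned(
    canvas,
    Element.ALIGN_LEFT,
    new Phrase("Hello people!", FONT),
    36, 540, 0  // x, y, rotation in degrees
  );
}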

(Derived from the Webified iTextSharp Example StampText.cs explained in chapter 6 of iText in Action — 2nd Edition.)

(Which PDF library you choose depends on your general requirements and the available license models.)

If, in spite of the ease of use of such PDF libraries, you insist on doing it manually, here are some remarks:

First you have to find the Page dictionary of the page you want to add content to. Depending on the type of PDF this already might require decompression of object streams etc. but in your sample modified1.pdf that is not necessary:

7 0 obj
  <</Rotate 90
    /Type /Page
    /TrimBox [ 9.54 6.12 585.68 835.88 ]
    /Resources 8 0 R
    /CropBox [ 0 0 595.22 842 ]
    /ArtBox [ 9.54 18.36 585.68 842 ]
    /Contents [ 9 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R ]
    /Parent 6 0 R
    /MediaBox [ 0 0 595.22 842 ]
    /Annots 17 0 R
    /BleedBox [ 9.54 6.12 585.68 835.88 ]
  >>
endobj 

You see the array of references to content streams. This is where you have to add new page content. You can manipulate an existing stream or create a new stream and add it to that array.

(Most PDFs have their content stream compressed. For the general case, therefore, you'd have to decompress a stream before you can work on it. Thus, in my eyes, the easier way would be to start a new stream.)
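As a rough illustration of the "new stream" route, here is a small Python sketch that adds one new stream reference at the start and one at the end of a /Contents array. The function name wrap_contents and the regex-based parsing are my own assumptions for this example. It only copes with simple, plain-text page dictionaries like the one above and does not handle a /Contents entry that is a single reference rather than an array.

```python
import re

def wrap_contents(page_dict: str, first_obj: int, last_obj: int) -> str:
    """Return page_dict with new stream references added around /Contents.

    Only handles the simple case where /Contents is an array of
    'N G R' references written in plain text.
    """
    match = re.search(r"/Contents\s*\[([^\]]*)\]", page_dict)
    if match is None:
        raise ValueError("no /Contents array found")
    existing = " ".join(match.group(1).split())  # normalise whitespace
    new_array = f"/Contents [ {first_obj} 0 R {existing} {last_obj} 0 R ]"
    return page_dict[:match.start()] + new_array + page_dict[match.end():]

# Shortened page dictionary; 69 and 70 are placeholder object numbers
# for two streams that would still have to be written to the file.
page = "<</Type /Page /Contents [ 9 0 R 10 0 R 16 0 R ] /Parent 6 0 R>>"
print(wrap_contents(page, 69, 70))
```

The new object numbers must not already be in use in the file, so in practice you would take them from the /Size entry of the trailer.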

You chose to manipulate the last referenced stream 16 0 which in your PDF is uncompressed:

16 0 obj
<</Length 37 0 R>>
stream
  S 1 0 0 1 13.183 0 cm 0 0 m
  [...]
  0 10 -10 -0 506.238 342.629 Tm
  .13333 .11765 .12157 scn
  -.0002 Tc
  .0006 Tw
  (the Bank and branch on which cheque is drawn\).)Tj

  /F1 2 Tf
  -15.1279 10.9462 Td
  (abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~!@#$%^&*aaaaaaaaaaaaa)Tj

  /F2 1 Tf
  015.1279 01.9462 Td
  (ANAabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789)Tj

  ET
endstream
endobj 

Your additions, I gather, are the two 3-liners at the bottom which first select a font, then position the insertion point and finally print a selection of letters.

You say you added the text abc..z and ABC...Z just for testing, but that letters such as b, j, k, q, and v do not appear in the PDF. The problem becomes even more visible for your second addition of letters; here only the capital 'A' and 'N' are displayed.

This is because the fonts in question are only partially embedded in the PDF. Fonts are embedded so that PDF viewers on systems which don't have the font can still display the PDF, but here only the subset of characters the document actually requires from each font is included.

Let's look for the font F2 for which only 'N' and 'A' appear:

According to the page object, the page resources can be found in object 8 0:

8 0 obj
  <</Font <</F1 45 0 R /TT2 46 0 R /F2 47 0 R>>
    /ExtGState <</GS2 48 0 R>>
    /ProcSet [ /PDF /Text ]
    /ColorSpace <</Cs6 49 0 R>>
  >>
endobj 

So F2 is defined in 47 0:

47 0 obj
  <</Subtype /Type1
    /Type /Font
    /Widths [ 722 250 250 250 250 250 250 250 250 250 250 250 250 722 ]
    /Encoding 52 0 R
    /FirstChar 65
    /FontDescriptor 53 0 R
    /ToUnicode 54 0 R
    /BaseFont /ILBPOB+TimesNewRomanPSMT-Bold
    /LastChar 78
  >>
endobj 

In the referenced ToUnicode map 54 0 you see

54 0 obj
<</Length 55 0 R>>stream
  /CIDInit /ProcSet findresource begin 12 dict begin begincmap /CIDSystemInfo <<
  /Registry (AAAAAA+F2+0) /Ordering (T1UV) /Supplement 0 >> def
  /CMapName /AAAAAA+F2+0 def
  /CMapType 2 def
  1 begincodespacerange <41> <4e> endcodespacerange
  2 beginbfchar
  <41> <0041>
  <4e> <004E>
  endbfchar
  endcmap CMapName currentdict /CMap defineresource pop end end
endstream
endobj 

In this mapping you see that only character codes 0x41 'A' and 0x4e 'N' are mapped.

In your document the font is used only to print "NA" in the amount table cells and for nothing else. Thus, only those two letters 'N' and 'A' are embedded, which results in your addition with that font only outputting these letters.
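If you want to run this kind of check yourself, a sketch along the following lines may help. It collects the character codes listed in the beginbfchar sections of a ToUnicode CMap and reports which characters of a planned string are missing. The helper names mapped_codes and unsupported are made up for this example, beginbfrange sections are ignored, and the character code is assumed to equal the character's ASCII value, which happens to hold for this font but is not guaranteed in general.

```python
import re

def mapped_codes(cmap_text: str) -> set:
    """Collect source character codes listed in beginbfchar sections.

    beginbfrange sections are ignored to keep the sketch short.
    """
    codes = set()
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        # Each entry is a pair: <source code> <unicode value>
        for src, _dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section):
            codes.add(int(src, 16))
    return codes

def unsupported(text: str, codes: set) -> str:
    """Return the characters of text whose codes are not in the subset.

    Assumes one byte per character with code == ASCII value.
    """
    return "".join(ch for ch in text if ord(ch) not in codes)

cmap = """
1 begincodespacerange <41> <4e> endcodespacerange
2 beginbfchar
<41> <0041>
<4e> <004E>
endbfchar
"""
codes = mapped_codes(cmap)
print(sorted(hex(c) for c in codes))   # codes present in the subset
print(unsupported("ANAabc", codes))    # characters that would not render
```

Keep in mind that ToUnicode is optional, so an empty result does not prove the glyphs are absent; the font's encoding and the embedded font program are the authoritative sources.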

Thus, to successfully add text to the page you either have to check the font resources associated with the page for the glyphs they provide (and restrict your additions to those glyphs) or you have to add your own font resource.

As the presence of characters in the encoding often is not as easy to see as it is here (ToUnicode is optional), I would suggest that you add your own font resources. The PDF specification ISO 32000-1 explains how to do that.

Furthermore, you state that the x and y axis position for the text is not properly displaying in pdf. While you don't say exactly what you mean, you should be aware that a content stream can apply affine transformations to the coordinate system of the page, i.e. stretch, skew, rotate, and move the axes.
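To see what such a transformation does to a coordinate, the following sketch applies a PDF matrix, given as the six operands a b c d e f of the cm operator, to a point using x' = a*x + c*y + e and y' = b*x + d*y + f. The function name apply_matrix and the sample matrix (a quarter turn plus a shift) are only illustrative choices.

```python
def apply_matrix(matrix, x, y):
    """Map (x, y) through a PDF matrix given as operands a b c d e f."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)

# Rotation by 90 degrees counter-clockwise plus a shift of 595.22 along x,
# i.e. what "0 1 -1 0 595.22 0 cm" would set up in a content stream.
rotate_and_shift = (0, 1, -1, 0, 595.22, 0)
x, y = apply_matrix(rotate_and_shift, 36, 540)
print(round(x, 2), round(y, 2))
```

So a position you write as (36, 540) after such a cm ends up somewhere quite different in the untransformed page space, which is one way text can appear to be "in the wrong place".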

If you want to use the original coordinate system and not depend on whatever transformations happen to be in effect at the point where your additions start, you should add an initial content stream to the page containing a q operator (to save the current graphics state on the graphics state stack) and start your additions in a new final content stream with a Q operator (to restore the graphics state by removing the most recently saved state from the stack and making it the current state).
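A minimal sketch of those two wrapper streams, written in Python purely for illustration, could look like this. The helper stream_object, the object numbers 69 and 70, and the font resource name /Xi0 are placeholders of mine; the font name must match an entry in the page's /Font resources, and the streams are left uncompressed so that /Length is simply the byte count of the content.

```python
def stream_object(obj_num: int, content: bytes) -> bytes:
    """Serialise an uncompressed content stream as an indirect object."""
    header = f"{obj_num} 0 obj\n<</Length {len(content)}>>stream\n".encode("ascii")
    # The end-of-line before 'endstream' is not counted in /Length.
    return header + content + b"\nendstream\nendobj\n"

# Stream placed first in /Contents: save the untouched graphics state.
save_state = stream_object(69, b"q")

# Stream placed last: restore that state, then draw inside its own q ... Q.
additions = b"Q\nq\nBT\n/Xi0 12 Tf\n36 540 Td\n(Hello people!) Tj\nET\nQ"
draw_text = stream_object(70, additions)

print(save_state.decode("ascii"))
print(draw_text.decode("ascii"))
```

This sketch does not compensate for a /Rotate entry on the page; on a rotated page you would additionally apply a cm operator before BT to get upright text.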

EDIT As a sample I applied the Java equivalent of the C# code at the top to your modified1.pdf with append mode activated. The following objects were changed or added as a result:

The page object 7 0 has been updated:

7 0 obj
  <</CropBox[0 0 595.22 842]
    /Parent 6 0 R
    /Contents[69 0 R 9 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R 70 0 R]
    /Type/Page
    /Resources<<
      /ExtGState<</GS2 48 0 R>>
      /ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
      /ColorSpace<</Cs6 49 0 R>>
      /Font<</F1 45 0 R/F2 47 0 R/TT2 46 0 R/Xi0 68 0 R>>
    >>
    /MediaBox[0 0 595.22 842]
    /TrimBox[9.54 6.12 585.68 835.88]
    /BleedBox[9.54 6.12 585.68 835.88]
    /Annots 17 0 R
    /ArtBox[9.54 18.36 585.68 842]
    /Rotate 90
  >>
endobj 

If you compare with your former version, you see that

  • two new content streams have been added, 69 0 at the start and 70 0 at the end;
  • the resources are not an indirect object anymore but instead are directly included here;
  • the resources contain a new Font resource Xi0 at 68 0.

Now let's look at the added objects.

This is the font resource for Helvetica-Bold named Xi0 at 68 0:

68 0 obj
  <</BaseFont/Helvetica-Bold
    /Type/Font
    /Encoding/WinAnsiEncoding
    /Subtype/Type1
  >>
endobj 

Non-embedded, standard 14 font resources are not complicated at all...

Now there are the additional content streams. iText does compress them, but I'll show them in an uncompressed state here:

69 0 obj
<</Length 1>>stream
  q
endstream
endobj
70 0 obj
<</Length 106>>stream 
  Q
  q
  0 1 -1 0 595.22 0 cm
  q
  BT
  1 0 0 1 36 540 Tm
  /Xi0 12 Tf
  0.75 g
  (Hello people!)Tj
  0 g
  ET
  Q
  Q
endstream
endobj 

So the new content stream at the start stores the current graphics state, and the new one at the end retrieves that stored state, changes the coordinate system, positions for text insertion, selects font, font size, and the fill colour, and finally prints a string.
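Since you asked about incremental updates specifically: changed and added objects like the ones above (7 0, 68 0, 69 0, 70 0) are appended after the end of the original file, followed by a new cross-reference section and a trailer whose /Prev entry points at the previous cross-reference section. The Python sketch below shows that layout under several assumptions of mine: the original file uses a classic xref table (not a cross-reference stream), all generation numbers are 0, and trailer entries such as /Info or /ID are left out for brevity. The function name incremental_update and its parameters are hypothetical.

```python
def incremental_update(original: bytes, objects: dict, root_ref: str,
                       prev_xref: int, size: int) -> bytes:
    """Append new or replaced objects plus a cross-reference section.

    objects maps object numbers to complete 'N 0 obj ... endobj\\n' bytes.
    prev_xref is the byte offset found after 'startxref' in the original.
    size is the highest object number in the file plus one.
    """
    out = bytearray(original)
    if not out.endswith(b"\n"):
        out += b"\n"
    offsets = {}
    for num in sorted(objects):
        offsets[num] = len(out)      # byte offset of this object in the file
        out += objects[num]
    xref_pos = len(out)
    out += b"xref\n"
    for num in sorted(offsets):
        # One subsection per object; each entry is exactly 20 bytes long.
        out += f"{num} 1\n{offsets[num]:010d} 00000 n \n".encode("ascii")
    out += (f"trailer\n<</Size {size} /Root {root_ref} /Prev {prev_xref}>>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode("ascii")
    return bytes(out)

# Toy input only; a real call would pass the bytes of the source PDF.
original = b"%PDF-1.4\n%%EOF"
objs = {
    7: b"7 0 obj\n<<>>\nendobj\n",
    68: b"68 0 obj\n<</Type/Font>>\nendobj\n",
}
result = incremental_update(original, objs, "1 0 R", 100, 71)
print(result.decode("ascii"))
```

The original bytes are never touched, which is the whole point of the approach; a viewer reads the last startxref first and follows /Prev for everything that was not replaced.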
