pdfbox: how to clone a page

Question

Using Apache PDFBox, I am editing an existing document and I would like to take one page from that document and simply clone it, copying whatever elements it contains. As an additional twist, I would like to get a reference to all the PDFields for any form fields in this newly cloned page. Here's the code I tried so far:

            PDPage newPage = new PDPage(lastPage.getCOSDictionary());
            PDFCloneUtility cloner = new PDFCloneUtility(pdfDoc);
            pdfDoc.addPage(newPage);
            cloner.cloneMerge(lastPage, newPage);

            // there doesn't seem to be an API to read the fields from the page, need to filter them out from the document.
            List<PDField> newFields = readPdfFields(pdfDoc);
            Iterator<PDField> i = newFields.iterator();
            while (i.hasNext()) {
                if (i.next().getWidget().getPage() != newPage)
                    i.remove();
            }

readPdfFields is a helper method I wrote to get all the fields in a document using the AcroForm.

But this code seems to lead to some kind of crash/hang state in my JVM - I haven't been able to debug exactly what's happening but I'm guessing this is not actually the right way to clone a page. What is?

Answer

The least resource-intensive way to clone a page is a shallow copy of the corresponding page dictionary:

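import java.util.List;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// PDFBox 1.8.x API; file and outfile refer to the input and output PDFs
PDDocument doc = PDDocument.load( file );

List<PDPage> allPages = doc.getDocumentCatalog().getAllPages();

PDPage page = allPages.get(0);
COSDictionary pageDict = page.getCOSDictionary();
// shallow copy: a new dictionary whose values (content, resources, ...) are shared with the original
COSDictionary newPageDict = new COSDictionary(pageDict);

// annotations point back to the original page, so drop them from the copy
newPageDict.removeItem(COSName.ANNOTS);

PDPage newPage = new PDPage(newPageDict);
doc.addPage(newPage);

doc.save( outfile );
doc.close();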
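The code above uses the PDFBox 1.8 API. In PDFBox 2.x, getAllPages() and getCOSDictionary() are gone; pages are accessed with PDDocument.getPage(int) and the page dictionary with getCOSObject(). A minimal sketch of the same shallow copy under that assumption (appendShallowCopy is a hypothetical helper name, not a PDFBox method):

```java
import java.io.IOException;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class ShallowPageClone {

    /**
     * Hypothetical helper (assumes PDFBox 2.x): appends a shallow copy of the page at
     * {@code index} to {@code doc} and returns it. The copy has its own page dictionary,
     * but every value in it (content streams, resources, MediaBox, ...) is the same
     * object as in the original page.
     */
    public static PDPage appendShallowCopy(PDDocument doc, int index) {
        COSDictionary pageDict = doc.getPage(index).getCOSObject();
        COSDictionary newPageDict = new COSDictionary(pageDict); // shallow copy

        // Annotations point back to their page, so they are dropped from the copy.
        newPageDict.removeItem(COSName.ANNOTS);

        PDPage newPage = new PDPage(newPageDict);
        doc.addPage(newPage);
        return newPage;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument doc = new PDDocument()) {
            doc.addPage(new PDPage());
            appendShallowCopy(doc, 0);
            System.out.println(doc.getNumberOfPages()); // prints 2
        }
    }
}
```

With an existing file you would load it with PDDocument.load(File) in 2.x (Loader.loadPDF(File) in 3.x) and call save(...) afterwards; the in-memory document here only keeps the example self-contained.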

I explicitly removed the annotations (form fields, etc.) from the copy because each annotation has a reference pointing back to its page, which would obviously be wrong for the copied page.

Thus, if you want the annotations to come along cleanly, you also have to create shallow copies of the annotations array and of all annotation dictionaries it contains, and replace the page reference (the /P entry) in each of them.
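A minimal sketch of that clean variant, assuming PDFBox 2.x and working on the COS level: it copies the /Annots array and each annotation dictionary, then points each copy's /P entry at the new page. appendCopyWithAnnotations is a hypothetical helper. It rewrites only /P; other cross-references (for example /Popup, or a widget's /Parent) are left as they are, and widget copies are not added to their field's /Kids, so treat it as a starting point rather than a complete solution.

```java
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class CleanPageClone {

    /**
     * Hypothetical helper (assumes PDFBox 2.x): appends a shallow copy of {@code page}
     * whose annotations are shallow-copied too and whose /P entries point at the copy.
     */
    public static PDPage appendCopyWithAnnotations(PDDocument doc, PDPage page) {
        COSDictionary newPageDict = new COSDictionary(page.getCOSObject());

        COSBase annotsBase = newPageDict.getDictionaryObject(COSName.ANNOTS);
        if (annotsBase instanceof COSArray) {
            COSArray oldAnnots = (COSArray) annotsBase;
            // a new array, so the original page's /Annots stays untouched
            COSArray newAnnots = new COSArray();
            for (int i = 0; i < oldAnnots.size(); i++) {
                COSBase annot = oldAnnots.getObject(i); // resolves indirect references
                if (annot instanceof COSDictionary) {
                    COSDictionary annotCopy = new COSDictionary((COSDictionary) annot);
                    annotCopy.setItem(COSName.P, newPageDict); // page reference now points at the copy
                    newAnnots.add(annotCopy);
                }
            }
            newPageDict.setItem(COSName.ANNOTS, newAnnots);
        }

        PDPage newPage = new PDPage(newPageDict);
        doc.addPage(newPage);
        return newPage;
    }
}
```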

Most PDF readers would not mind incorrect page references, though. For a dirty solution, therefore, you could simply leave the annotations in the page dictionary. But who wants to be dirty... ;)

If you additionally want to change parts of the new or the old page, you also have to copy the respective PDF objects before manipulating them; otherwise the change shows up on both pages, because the shallow copy shares those objects.
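For example, before adding a font or image to only one of the two pages, that page needs its own /Resources dictionary. A minimal sketch, assuming PDFBox 2.x and a hypothetical helper detachResources that works directly on the page dictionary:

```java
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;

public class DetachResources {

    /**
     * Hypothetical helper: replaces the /Resources entry of {@code pageDict} with a
     * shallow copy, so entries added to or removed from this page's resources no longer
     * affect other pages sharing the original dictionary. Nested dictionaries such as
     * /Font or /XObject are still shared and need the same treatment before they are
     * modified. Resources inherited from the page tree are not handled here.
     */
    public static void detachResources(COSDictionary pageDict) {
        COSBase res = pageDict.getDictionaryObject(COSName.RESOURCES);
        if (res instanceof COSDictionary) {
            pageDict.setItem(COSName.RESOURCES, new COSDictionary((COSDictionary) res));
        }
    }
}
```

It is safest to call this on newPageDict before wrapping it with new PDPage(newPageDict), since a PDPage object may cache the resources it has already read.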

Some other remarks:

Your original page cloning looks weird to me. After all, you add the identical page dictionary to the document again (duplicate entries in the page tree are ignored, I think) and then merge these identical page objects into each other.

I assume PDFCloneUtility is meant for cloning between different documents, not within the same one, and merging a dictionary into itself is not guaranteed to work.


I would like to get a reference to all the PDFields for any form fields in this newly cloned page

As the fields have the same name, they are identical!

Fields in PDF are abstract fields which can have many appearances spread over the document. The same name implies the same field.

A field appearing on some page means that there is an annotation (a widget) representing that field on the page. To make things more complicated, the field dictionary and the annotation dictionary can be merged into one for fields with only one appearance.

Thus, depending on your requirements you will first have to decide whether you want to work with fields or with field annotations.
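If you decide to work with fields, one way to find the fields shown on a given page is to compare each field's widget annotations with the annotation dictionaries listed in that page's /Annots array. The sketch below assumes PDFBox 2.x; fieldsOnPage is a hypothetical helper, not a PDFBox method. Because it compares dictionaries by identity, a page produced by the "dirty" shallow copy (annotations left in place) reports the very same fields as the original page, while annotation copies that were never added to a field's /Kids are not found.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

public class FieldsOnPage {

    /**
     * Hypothetical helper (assumes PDFBox 2.x): returns the fields that have at least
     * one widget annotation listed in the page's /Annots array.
     */
    public static List<PDField> fieldsOnPage(PDAcroForm form, PDPage page) throws IOException {
        // identity set: we want exactly the annotation dictionaries referenced by the page
        Set<COSDictionary> pageAnnots = Collections.newSetFromMap(new IdentityHashMap<>());
        for (PDAnnotation annot : page.getAnnotations()) {
            pageAnnots.add(annot.getCOSObject());
        }

        List<PDField> result = new ArrayList<>();
        for (PDField field : form.getFieldTree()) {
            // for a merged field/widget, the widget's dictionary is the field dictionary itself
            for (PDAnnotationWidget widget : field.getWidgets()) {
                if (pageAnnots.contains(widget.getCOSObject())) {
                    result.add(field);
                    break;
                }
            }
        }
        return result;
    }
}
```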
