Copying PDF Attachments Including a .joboptions File in iText

Question

I'm trying to use the iText 5.5 library to manipulate information within a PDF. I want to scan a PDF for attachments and, if it has any, make physical copies of them (without removing or editing the original file). I'm running into an issue when a PDF has a .joboptions file attached. I'm using the following code:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfString;

public class extractAttachments {

    public extractAttachments(String src, String dir) throws IOException {
        File folder = new File(dir);
        folder.mkdirs();
        PdfReader reader = new PdfReader(src);
        PdfDictionary root = reader.getCatalog();
        PdfDictionary names = root.getAsDict(PdfName.NAMES);
        PdfDictionary embedded = names.getAsDict(PdfName.EMBEDDEDFILES);
        PdfArray filespecs = embedded.getAsArray(PdfName.NAMES);
        for (int i = 0; i < filespecs.size(); ) {
            extractAttachment(reader, folder, filespecs.getAsString(i++),
                    filespecs.getAsDict(i++));
        }
    }

    protected void extractAttachment(PdfReader reader, File dir, PdfString name, PdfDictionary filespec)
            throws IOException {
        PRStream stream;
        FileOutputStream fos;
        String filename;
        PdfDictionary refs = filespec.getAsDict(PdfName.EF);
        for (PdfName key : refs.getKeys()) {
            stream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject(key));
            filename = filespec.getAsString(key).toString();
            fos = new FileOutputStream(new File(dir, filename));
            fos.write(PdfReader.getStreamBytes(stream));
            fos.flush();
            fos.close();
        }
    }
}

Once execution reaches PdfArray filespecs = embedded.getAsArray(PdfName.NAMES); the call returns null. I don't care whether the .joboptions file is copied, but I do want any other attachments to be copied. Any ideas how I can get around this?

Also, if you want to create a PDF with such a .joboptions file, open a PDF document, go to the print menu, and change the printer to "Adobe PDF". Now select Properties, click OK, and in the main print menu click Print. This will prompt you to select a location to save the document, and the new document will have a .joboptions file as an attachment.

Recommended Answer

Your code is incomplete as it only understands very primitive EmbeddedFiles structures. Your sample file has a slightly more complex EmbeddedFiles structure. You need to improve your code to also understand such more complex structures.

The EmbeddedFiles dictionary is specified to contain a name tree:

EmbeddedFiles name tree (Optional; PDF 1.4) A name tree mapping name strings to file specifications for embedded file streams (see 7.11.4, "Embedded File Streams").

(ISO 32000-1 Table 31 – Entries in the name dictionary)

A name tree shall be constructed of nodes, each of which shall be a dictionary object. Table 36 shows the entries in a node dictionary. The nodes shall be of three kinds, depending on the specific entries they contain. The tree shall always have exactly one root node, which shall contain a single entry: either Kids or Names but not both. If the root node has a Names entry, it shall be the only node in the tree. If it has a Kids entry, each of the remaining nodes shall be either an intermediate node, that shall contain a Limits entry and a Kids entry, or a leaf node, that shall contain a Limits entry and a Names entry.

(ISO 32000-1 Section 7.9.6 - Name Trees)
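The three node kinds described in the quoted section can be told apart purely by which entries a node carries. The following is a minimal sketch of that rule in plain Java, not iText code: the class NameTreeNodeKinds, the enum Kind, and the use of a Map keyed by "Kids", "Names", and "Limits" to stand in for a node dictionary are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: a node dictionary is a Map whose keys mirror the PDF
// entries "Kids", "Names" and "Limits". This is not an iText type.
class NameTreeNodeKinds {

    enum Kind { ROOT, INTERMEDIATE, LEAF, INVALID }

    // Classifies a node by the entries it contains, following the quoted rules.
    static Kind classify(Map<String, Object> node, boolean isRoot) {
        boolean hasKids = node.containsKey("Kids");
        boolean hasNames = node.containsKey("Names");
        if (hasKids == hasNames) {
            return Kind.INVALID; // a node needs exactly one of Kids or Names
        }
        if (isRoot) {
            return Kind.ROOT; // the root may carry either Kids or Names
        }
        if (!node.containsKey("Limits")) {
            return Kind.INVALID; // every non-root node needs a Limits entry
        }
        return hasKids ? Kind.INTERMEDIATE : Kind.LEAF;
    }

    public static void main(String[] args) {
        Map<String, Object> leaf = new HashMap<>();
        leaf.put("Limits", Arrays.asList("a.txt", "b.txt"));
        leaf.put("Names", Arrays.asList("a.txt", "spec-a", "b.txt", "spec-b"));

        Map<String, Object> root = new HashMap<>();
        root.put("Kids", Arrays.asList(leaf));

        System.out.println(classify(root, true));  // ROOT
        System.out.println(classify(leaf, false)); // LEAF
    }
}
```

The point for the question is the ROOT case: a root with a Kids entry is perfectly valid, and then it has no Names entry at all.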

Your code only understands the variety in which the root node has a Names entry and, therefore, is the only node in the tree:

...
PdfDictionary embedded = names.getAsDict(PdfName.EMBEDDEDFILES);
PdfArray filespecs = embedded.getAsArray(PdfName.NAMES);
...

In your sample PDF file, on the other hand, the EmbeddedFiles dictionary has a Kids entry instead of a Names entry, which your code does not handle.
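One way to handle both varieties is to walk the tree recursively: read the name/value pairs from a node's Names array if it has one, then descend into each entry of its Kids array if it has one. The sketch below shows only that traversal on a plain-Java stand-in for the tree, so that it runs without iText on the classpath. The class NameTreeWalkSketch, its helper node(...), and the Map/List model are assumptions for illustration and not part of iText. In iText 5 the corresponding lookups would presumably be embedded.getAsArray(PdfName.NAMES) and embedded.getAsArray(PdfName.KIDS), with kids.getAsDict(i) for the recursion; iText 5 also appears to provide a PdfNameTree.readTree(PdfDictionary) helper that flattens a name tree, which is worth checking against the version you use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a PDF name tree: each node is a Map that holds
// either a "Names" list (alternating name, value) or a "Kids" list of nodes.
class NameTreeWalkSketch {

    // Collects name -> value pairs from this node and, recursively, its kids.
    @SuppressWarnings("unchecked")
    static void collect(Map<String, Object> node, Map<String, Object> out) {
        if (node == null) {
            return;
        }
        List<Object> names = (List<Object>) node.get("Names");
        if (names != null) {
            // The Names array alternates between a name and its value.
            for (int i = 0; i + 1 < names.size(); i += 2) {
                out.put((String) names.get(i), names.get(i + 1));
            }
        }
        List<Object> kids = (List<Object>) node.get("Kids");
        if (kids != null) {
            for (Object kid : kids) {
                collect((Map<String, Object>) kid, out);
            }
        }
    }

    // Builds a node with a single entry (illustrative helper only).
    static Map<String, Object> node(String key, Object... items) {
        Map<String, Object> n = new HashMap<>();
        n.put(key, new ArrayList<>(Arrays.asList(items)));
        return n;
    }

    public static void main(String[] args) {
        Map<String, Object> leaf1 = node("Names", "a.joboptions", "filespec-1");
        Map<String, Object> leaf2 = node("Names", "b.txt", "filespec-2");
        Map<String, Object> root = node("Kids", leaf1, leaf2);

        Map<String, Object> found = new LinkedHashMap<>();
        collect(root, found);
        System.out.println(found.keySet()); // prints [a.joboptions, b.txt]
    }
}
```

Checking each array for null before using it is what keeps the walk from failing on a root that has only a Kids entry, which is the case that returned null in the question's code.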
