Reading PDF File Attachment Annotations with iTextSharp

Question

I have the following issue. I have a PDF with an XML file attached to it as an annotation, not as an embedded file. I tried to read it with the code from the following link:

iTextSharp - how to open/read/extract a file attachment (http://stackoverflow.com/questions/3007780/itextsharp-how-to-open-read-extract-a-file-attachment)

It works for embedded files but not for file attachments stored as annotations.

I googled for ways to extract annotations from a PDF and found the following link: Reading PDF Annotations with iText

So the annotation type is "File Attachment Annotation".

Could someone show a working example?

Thanks in advance for your help.

Recommended answer

As so often with questions concerning iText and iTextSharp, one should first look at the keyword list on itextpdf.com. There you find "File attachment, extract attachments", which references two Java samples from iText in Action, 2nd Edition:

  • part4.chapter16.KubrickDvds
  • part4.chapter16.KubrickDocumentary

and the similar Webified iTextSharp Examples.

KubrickDvds contains the following method, extractAttachments (Java) / ExtractAttachments (C#), to extract file attachment annotations:

Java:

/**
 * Extracts attachments from an existing PDF.
 * @param src   the path to the existing PDF
 */
public void extractAttachments(String src) throws IOException {
    PdfReader reader = new PdfReader(src);
    PdfArray array;
    PdfDictionary annot;
    PdfDictionary fs;
    PdfDictionary refs;
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        array = reader.getPageN(i).getAsArray(PdfName.ANNOTS);
        if (array == null) continue;
        for (int j = 0; j < array.size(); j++) {
            annot = array.getAsDict(j);
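            // Only /FileAttachment annotations carry a file specification (/FS).
            if (PdfName.FILEATTACHMENT.equals(annot.getAsName(PdfName.SUBTYPE))) {
                fs = annot.getAsDict(PdfName.FS);
                // /EF maps keys such as /F and /UF to the embedded file streams;
                // the file specification stores the file name under the same keys.
                refs = fs.getAsDict(PdfName.EF);
                for (PdfName name : refs.getKeys()) {
                    // PATH is a format string defined elsewhere in the sample class.
                    // try-with-resources closes the stream even if the write fails.
                    try (FileOutputStream fos
                            = new FileOutputStream(String.format(PATH, fs.getAsString(name).toString()))) {
                        fos.write(PdfReader.getStreamBytes((PRStream)refs.getAsStream(name)));
                    }
                }
            }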
        }
    }
    reader.close();
}

C#:

/**
 * Extracts attachments from an existing PDF.
 * @param src the bytes of the existing PDF
 * @param zip the ZipFile object to add the extracted attachments to
 */
public void ExtractAttachments(byte[] src, ZipFile zip) {
  PdfReader reader = new PdfReader(src);
  for (int i = 1; i <= reader.NumberOfPages; i++) {
    PdfArray array = reader.GetPageN(i).GetAsArray(PdfName.ANNOTS);
    if (array == null) continue;
    for (int j = 0; j < array.Size; j++) {
      PdfDictionary annot = array.GetAsDict(j);
      if (PdfName.FILEATTACHMENT.Equals(
          annot.GetAsName(PdfName.SUBTYPE)))
      {
        PdfDictionary fs = annot.GetAsDict(PdfName.FS);
        PdfDictionary refs = fs.GetAsDict(PdfName.EF);
        foreach (PdfName name in refs.Keys) {
          zip.AddEntry(
            fs.GetAsString(name).ToString(), 
            PdfReader.GetStreamBytes((PRStream)refs.GetAsStream(name))
          );
        }
      }
    }
  }
}
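
One thing the samples leave open is what happens when the file name stored in the PDF contains path separators, since the Java version builds its output path directly from that name. A minimal sketch of one possible guard, using only the Java standard library. `SafeAttachmentName`, `sanitize` and `resolveInside` are hypothetical names for illustration and not part of iText, and the fallback name "attachment" is an arbitrary choice:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; not part of iText or the book samples. */
final class SafeAttachmentName {

    /** Reduces a name taken from the PDF to its last path segment. */
    static String sanitize(String rawName) {
        // Treat both Windows and Unix separators as path separators.
        String normalized = rawName.replace('\\', '/');
        String last = normalized.substring(normalized.lastIndexOf('/') + 1);
        if (last.isEmpty() || last.equals(".") || last.equals("..")) {
            return "attachment"; // arbitrary fallback, an assumption of this sketch
        }
        return last;
    }

    /** Resolves the sanitized name against targetDir and checks it stays inside. */
    static Path resolveInside(Path targetDir, String rawName) {
        Path base = targetDir.toAbsolutePath().normalize();
        Path target = base.resolve(sanitize(rawName)).normalize();
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException("unsafe attachment name: " + rawName);
        }
        return target;
    }

    public static void main(String[] args) {
        System.out.println(resolveInside(Paths.get("out"), "../../data.xml"));
    }
}
```

The resulting path could then be passed to the FileOutputStream in place of the formatted PATH string.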

KubrickDocumentary contains the following method, extractDocLevelAttachments (Java) / ExtractDocLevelAttachments (C#), to extract document-level attachments:

Java:

/**
 * Extracts document level attachments
 * @param filename     a file from which document level attachments will be extracted
 * @throws IOException
 */
public void extractDocLevelAttachments(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    PdfDictionary root = reader.getCatalog();
    PdfDictionary documentnames = root.getAsDict(PdfName.NAMES);
    PdfDictionary embeddedfiles = documentnames.getAsDict(PdfName.EMBEDDEDFILES);
    PdfArray filespecs = embeddedfiles.getAsArray(PdfName.NAMES);
    PdfDictionary filespec;
    PdfDictionary refs;
    PRStream stream;
    // The /Names array alternates between a name string and its file specification.
    for (int i = 0; i < filespecs.size(); ) {
      filespecs.getAsString(i++);           // skip the name
      filespec = filespecs.getAsDict(i++);  // the file specification dictionary
      refs = filespec.getAsDict(PdfName.EF);
      for (PdfName key : refs.getKeys()) {
        stream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject(key));
        // PATH is a format string defined elsewhere in the sample class.
        // try-with-resources closes the stream even if the write fails.
        try (FileOutputStream fos
            = new FileOutputStream(String.format(PATH, filespec.getAsString(key).toString()))) {
          fos.write(PdfReader.getStreamBytes(stream));
        }
      }
    }
    reader.close();
}

C#:

/**
 * Extracts document level attachments
 * @param pdf the bytes of the PDF from which document level attachments will be extracted
 * @param zip the ZipFile object to add the extracted attachments to
 */
public void ExtractDocLevelAttachments(byte[] pdf, ZipFile zip) {
  PdfReader reader = new PdfReader(pdf);
  PdfDictionary root = reader.Catalog;
  PdfDictionary documentnames = root.GetAsDict(PdfName.NAMES);
  PdfDictionary embeddedfiles = 
      documentnames.GetAsDict(PdfName.EMBEDDEDFILES);
  PdfArray filespecs = embeddedfiles.GetAsArray(PdfName.NAMES);
  for (int i = 0; i < filespecs.Size; ) {
    filespecs.GetAsString(i++);
    PdfDictionary filespec = filespecs.GetAsDict(i++);
    PdfDictionary refs = filespec.GetAsDict(PdfName.EF);
    foreach (PdfName key in refs.Keys) {
      PRStream stream = (PRStream) PdfReader.GetPdfObject(
        refs.GetAsIndirectObject(key)
      );
      zip.AddEntry(
        filespec.GetAsString(key).ToString(), 
        PdfReader.GetStreamBytes(stream)
      );
    }
  }
}
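
Both versions of the document-level method step through the Names array two entries at a time, because that array alternates between a name and the object it refers to. The iteration pattern on its own can be sketched with plain Java collections. This is an illustration only: `NamePairs` and `toMap` are made-up names, and a `List<Object>` stands in for the PdfArray.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; a plain List stands in for the PdfArray. */
final class NamePairs {

    /** Turns [name1, value1, name2, value2, ...] into an ordered map. */
    static Map<String, Object> toMap(List<Object> flat) {
        if (flat.size() % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of entries");
        }
        Map<String, Object> pairs = new LinkedHashMap<>();
        // No increment in the loop header: each pass consumes two entries.
        for (int i = 0; i < flat.size(); ) {
            String name = (String) flat.get(i++);  // even index: the name
            Object value = flat.get(i++);          // odd index: the value
            pairs.put(name, value);
        }
        return pairs;
    }
}
```

The samples discard the name and use only the file specification. Keeping the name, as this sketch does, is a design choice that can help when several attachments need to be told apart.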

(For some reason the C# examples put the extracted files into a ZIP file, while the Java versions write them to the file system... oh well...)
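
If the Java side should collect the files in an archive the way the C# samples do, the standard java.util.zip classes are one way to get there. A minimal sketch, assuming the extracted attachments have already been gathered into a map from file name to content. `AttachmentZipper` and its methods are hypothetical and not part of iText, and the `unzip` method is only there to check the round trip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper for illustration; not part of iText or the book samples. */
final class AttachmentZipper {

    /** Packs name -> content pairs into an in-memory ZIP archive. */
    static byte[] zip(Map<String, byte[]> attachments) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : attachments.entrySet()) {
                zos.putNextEntry(new ZipEntry(e.getKey()));
                zos.write(e.getValue());
                zos.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }

    /** Reads an archive back into name -> content pairs. */
    static Map<String, byte[]> unzip(byte[] archive) {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            byte[] chunk = new byte[4096];
            while ((entry = zis.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry.
                while ((n = zis.read(chunk)) != -1) {
                    content.write(chunk, 0, n);
                }
                result.put(entry.getName(), content.toByteArray());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return result;
    }
}
```

Because the map is keyed by file name, a name that turns up more than once during extraction ends up in the archive only once.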
