使用 iTextSharp 读取 PDF 文件附件注释 [英] Reading PDF File Attachment Annotations with iTextSharp

查看:52
本文介绍了使用 iTextSharp 读取 PDF 文件附件注释的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下问题.我有一个 PDF,其中附加了一个 XML 文件作为注释.不是作为嵌入文件而是作为注释.现在我尝试使用以下链接中的代码阅读它:

I have the following issue. I have a PDF with a XML file attached as annotation inside it. Not as embedded file but as annotation. Now I try to read it with the code from the following link:

iTextSharp - 如何打开/读取/提取文件附件?

它适用于嵌入文件,但不适用于作为注释的文件附件.

It works for embedded files but not for file attachemts as annotations.

我用谷歌从 PDF 中提取注释并找到以下链接:使用 iText 阅读 PDF 注释

I Google for extracting annotations from PDF and find out the following link: Reading PDF Annotations with iText

所以注释类型是文件附件注释"

So the annotation type is "File Attachment Annotations"

有人可以展示一个有效的例子吗?

Could someone show a working example?

在此先感谢您的帮助

推荐答案

在有关 iText 和 iTextSharp 的问题中经常出现,首先应该查看 itextpdf.com 上的关键字列表.在这里您可以找到 文件附件,提取附件,引用了 iText 实战 - 第二版:

As so often in questions concerning iText and iTextSharp, one should first look at the keyword list on itextpdf.com. Here you find File attachment, extract attachments referencing two Java samples from iText in Action — 2nd Edition:

旧的关键字列表没有了;itextpdf.com 站点现在提供了其他搜索示例的方法,但我不会描述它们,以免站点再次更改并且我再次出现死链接...

The old keyword list is no more; the itextpdf.com site now offers other ways for searching examples but I won't describe them lest the site changes again and I have dead links once more...

相关iText示例基于iText in Action - 第二版是:

The relevant iText examples based on iText in Action — Second Edition are:

  • part4.chapter16.KubrickDvds
  • part4.chapter16.Kubrick 纪录片

这里是 来自 iText5 的示例

(我还没有找到样本到 .Net 和 iText 7 的端口,但基于其他来源,这个端口应该不会太难...)

(I haven't found ports of the samples to .Net and iText 7 but based on the other sources this port should not be too difficult...)

KubrickDvds 包含以下方法 extractAttachments/ExtractAttachments 来提取文件附件注释:

KubrickDvds contains the following method extractAttachments/ExtractAttachments to extract File Attachment Annotations:

Java、iText 5.x:

Java, iText 5.x:

/**
 * Extracts attachments from an existing PDF.
 * @param src   the path to the existing PDF
 */
public void extractAttachments(String src) throws IOException {
    PdfReader reader = new PdfReader(src);
    PdfArray array;
    PdfDictionary annot;
    PdfDictionary fs;
    PdfDictionary refs;
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        array = reader.getPageN(i).getAsArray(PdfName.ANNOTS);
        if (array == null) continue;
        for (int j = 0; j < array.size(); j++) {
            annot = array.getAsDict(j);
            if (PdfName.FILEATTACHMENT.equals(annot.getAsName(PdfName.SUBTYPE))) {
                fs = annot.getAsDict(PdfName.FS);
                refs = fs.getAsDict(PdfName.EF);
                for (PdfName name : refs.getKeys()) {
                    FileOutputStream fos
                        = new FileOutputStream(String.format(PATH, fs.getAsString(name).toString()));
                    fos.write(PdfReader.getStreamBytes((PRStream)refs.getAsStream(name)));
                    fos.flush();
                    fos.close();
                }
            }
        }
    }
    reader.close();
}

Java、iText 7.x:

Java, iText 7.x:

public void extractAttachments(String src) throws IOException {
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(src));
    PdfReader reader = new PdfReader(src);
    PdfArray array;
    PdfDictionary annot;
    PdfDictionary fs;
    PdfDictionary refs;
    for (int i = 1; i <= pdfDoc.getNumberOfPages(); i++) {
        array = pdfDoc.getPage(i).getPdfObject().getAsArray(PdfName.Annots);
        if (array == null) continue;
        for (int j = 0; j < array.size(); j++) {
            annot = array.getAsDictionary(j);
            if (PdfName.FileAttachment.equals(annot.getAsName(PdfName.Subtype))) {
                fs = annot.getAsDictionary(PdfName.FS);
                refs = fs.getAsDictionary(PdfName.EF);
                for (PdfName name : refs.keySet()) {
                    FileOutputStream fos
                            = new FileOutputStream(String.format(PATH, fs.getAsString(name).toString()));
                    fos.write(refs.getAsStream(name).getBytes());
                    fos.flush();
                    fos.close();
                }
            }
        }
    }
    reader.close();
}

C#、iText 5.x:

C#, iText 5.x:

/**
 * Extracts attachments from an existing PDF.
 * @param src the path to the existing PDF
 * @param zip the ZipFile object to add the extracted images
 */
public void ExtractAttachments(byte[] src, ZipFile zip) {
  PdfReader reader = new PdfReader(src);
  for (int i = 1; i <= reader.NumberOfPages; i++) {
    PdfArray array = reader.GetPageN(i).GetAsArray(PdfName.ANNOTS);
    if (array == null) continue;
    for (int j = 0; j < array.Size; j++) {
      PdfDictionary annot = array.GetAsDict(j);
      if (PdfName.FILEATTACHMENT.Equals(
          annot.GetAsName(PdfName.SUBTYPE)))
      {
        PdfDictionary fs = annot.GetAsDict(PdfName.FS);
        PdfDictionary refs = fs.GetAsDict(PdfName.EF);
        foreach (PdfName name in refs.Keys) {
          zip.AddEntry(
            fs.GetAsString(name).ToString(), 
            PdfReader.GetStreamBytes((PRStream)refs.GetAsStream(name))
          );
        }
      }
    }
  }
}

KubrickDocumentary 包含以下方法 extractDocLevelAttachments/ExtractDocLevelAttachments 来提取文档级附件:

KubrickDocumentary contains the following method extractDocLevelAttachments/ExtractDocLevelAttachments to extract document level attachments:

Java、iText 5.x:

Java, iText 5.x:

/**
 * Extracts document level attachments
 * @param filename     a file from which document level attachments will be extracted
 * @throws IOException
 */
public void extractDocLevelAttachments(String filename) throws IOException {
    PdfReader reader = new PdfReader(filename);
    PdfDictionary root = reader.getCatalog();
    PdfDictionary documentnames = root.getAsDict(PdfName.NAMES);
    PdfDictionary embeddedfiles = documentnames.getAsDict(PdfName.EMBEDDEDFILES);
    PdfArray filespecs = embeddedfiles.getAsArray(PdfName.NAMES);
    PdfDictionary filespec;
    PdfDictionary refs;
    FileOutputStream fos;
    PRStream stream;
    for (int i = 0; i < filespecs.size(); ) {
      filespecs.getAsString(i++);
      filespec = filespecs.getAsDict(i++);
      refs = filespec.getAsDict(PdfName.EF);
      for (PdfName key : refs.getKeys()) {
        fos = new FileOutputStream(String.format(PATH, filespec.getAsString(key).toString()));
        stream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject(key));
        fos.write(PdfReader.getStreamBytes(stream));
        fos.flush();
        fos.close();
      }
    }
    reader.close();
}

Java,iText 7.x

Java, iText 7.x

public void extractDocLevelAttachments(String src) throws IOException {
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(src));
    PdfDictionary root = pdfDoc.getCatalog().getPdfObject();
    PdfDictionary documentnames = root.getAsDictionary(PdfName.Names);
    PdfDictionary embeddedfiles = documentnames.getAsDictionary(PdfName.EmbeddedFiles);
    PdfArray filespecs = embeddedfiles.getAsArray(PdfName.Names);
    PdfDictionary filespec;
    PdfDictionary refs;
    FileOutputStream fos;
    PdfStream stream;
    for (int i = 0; i < filespecs.size(); ) {
        filespecs.getAsString(i++);
        filespec = filespecs.getAsDictionary(i++);
        refs = filespec.getAsDictionary(PdfName.EF);
        for (PdfName key : refs.keySet()) {
            fos = new FileOutputStream(String.format(PATH, filespec.getAsString(key).toString()));
            stream = refs.getAsStream(key);
            fos.write(stream.getBytes());
            fos.flush();
            fos.close();
        }
    }
    pdfDoc.close();
}

C#、iText 5.x:

C#, iText 5.x:

/**
 * Extracts document level attachments
 * @param PDF from which document level attachments will be extracted
 * @param zip the ZipFile object to add the extracted images
 */
public void ExtractDocLevelAttachments(byte[] pdf, ZipFile zip) {
  PdfReader reader = new PdfReader(pdf);
  PdfDictionary root = reader.Catalog;
  PdfDictionary documentnames = root.GetAsDict(PdfName.NAMES);
  PdfDictionary embeddedfiles = 
      documentnames.GetAsDict(PdfName.EMBEDDEDFILES);
  PdfArray filespecs = embeddedfiles.GetAsArray(PdfName.NAMES);
  for (int i = 0; i < filespecs.Size; ) {
    filespecs.GetAsString(i++);
    PdfDictionary filespec = filespecs.GetAsDict(i++);
    PdfDictionary refs = filespec.GetAsDict(PdfName.EF);
    foreach (PdfName key in refs.Keys) {
      PRStream stream = (PRStream) PdfReader.GetPdfObject(
        refs.GetAsIndirectObject(key)
      );
      zip.AddEntry(
        filespec.GetAsString(key).ToString(), 
        PdfReader.GetStreamBytes(stream)
      );
    }
  }
}

(出于某种原因,c# 示例将提取的文件放在某个 ZIP 文件中,而 Java 版本则将它们放入文件系统中……哦,好吧……)

(For some reason the c# examples put the extracted files in some ZIP file while the Java versions put them into the file system... oh well...)

这篇关于使用 iTextSharp 读取 PDF 文件附件注释的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆