How do I extract attachments from a PDF file?

Posted: 2015/11/24 21:50:30 | Tags: c#, .net, pdf


Question

I have a large number of PDF documents with XML files attached to them. I would like to extract those attached XML files and read them. How can I do this programmatically using .NET?

Recommended answer

iTextSharp is also quite capable of extracting attachments, though you may have to use the low-level objects to do so.


There are two ways to embed files in a PDF:

- In file attachment annotations
- In the document-level EmbeddedFiles name tree
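The document-level EmbeddedFiles entry is a PDF name tree: a leaf node holds a Names array of alternating keys and values (here, attachment name and file specification), and a larger tree spreads its leaves under Kids arrays. A minimal sketch of walking such a tree, kept library-free on purpose: nodes are modeled as plain Java maps with hypothetical "Names" and "Kids" entries standing in for iText's PdfDictionary and PdfArray, so treat it as an outline of the traversal rather than iText code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Collects every key/value pair of a PDF name tree, in tree order.
 * Hypothetical model: each node is a Map that may hold "Names" (a List of
 * alternating String keys and values) and/or "Kids" (a List of child nodes).
 */
class NameTreeWalker {

    static Map<String, Object> collect(Map<String, Object> root) {
        Map<String, Object> out = new LinkedHashMap<>();
        collectInto(root, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collectInto(Map<String, Object> node, Map<String, Object> out) {
        if (node == null) {
            return;
        }
        List<Object> names = (List<Object>) node.get("Names");
        if (names != null) {
            // Leaf: keys and values alternate, so step by two
            for (int i = 0; i + 1 < names.size(); i += 2) {
                out.put((String) names.get(i), names.get(i + 1));
            }
        }
        List<Object> kids = (List<Object>) node.get("Kids");
        if (kids != null) {
            // Intermediate node: visit each child in order
            for (Object kid : kids) {
                collectInto((Map<String, Object>) kid, out);
            }
        }
    }
}
```

With iText you would apply the same recursion to the real objects, reading Names with getAsArray(PdfName.NAMES) and Kids with getAsArray(PdfName.KIDS).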

Once you have a file specification dictionary from either source (an annotation's FS entry, or a value in the EmbeddedFiles name tree), the file itself is a stream inside the specification's "EF" (embedded file) dictionary, normally under its "F" key.
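Embedded file streams are often compressed with the FlateDecode filter, which is zlib data, so the raw stream bytes are not necessarily the attachment itself. In iText 5, PdfReader.getStreamBytes applies the stream's filters for you. As a sketch of what that means in the common case, assuming a single /FlateDecode filter with no DecodeParms predictor, the decoding is a plain zlib inflate with java.util.zip:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Decodes the body of a PDF stream whose only filter is /FlateDecode (zlib data). */
class FlateDecoder {

    static byte[] decode(byte[] raw) {
        Inflater inflater = new Inflater(); // default mode expects the zlib header FlateDecode uses
        inflater.setInput(raw);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Input ran out (or needs a preset dictionary) before the zlib stream ended
                    throw new IllegalArgumentException("truncated or unsupported Flate data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end(); // release native zlib resources
        }
        return out.toByteArray();
    }
}
```

Streams with other filters or with a predictor need more work, which is why letting the library decode is usually the better choice.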

So to list all the files at the document level, one would write code like this (Java, iText 5; iTextSharp exposes the same classes to C#):

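import com.itextpdf.text.pdf.*; // iText 5: PdfReader, PdfDictionary, PdfArray, PdfName, PdfString, PRStream
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

static Map<String, byte[]> extractEmbeddedFiles(String pdfPath) throws IOException {
  Map<String, byte[]> files = new HashMap<String, byte[]>();
  PdfReader reader = new PdfReader(pdfPath);
  PdfDictionary root = reader.getCatalog();
  PdfDictionary names = root.getAsDict(PdfName.NAMES); // may be null
  // EmbeddedFiles is a name tree (a dictionary), not an array; may be null
  PdfDictionary efTree = names == null ? null : names.getAsDict(PdfName.EMBEDDEDFILES);
  // Flat tree only: large trees keep their entries under /Kids instead of /Names
  PdfArray embeddedFiles = efTree == null ? null : efTree.getAsArray(PdfName.NAMES);
  int len = embeddedFiles == null ? 0 : embeddedFiles.size();
  for (int i = 0; i < len; i += 2) {
    PdfString name = embeddedFiles.getAsString(i); // name-tree keys are strings
    PdfDictionary fileSpec = embeddedFiles.getAsDict(i + 1); // file specification
    PdfDictionary ef = fileSpec == null ? null : fileSpec.getAsDict(PdfName.EF);
    PRStream stream = ef == null ? null : (PRStream) ef.getAsStream(PdfName.F);
    if (name != null && stream != null) {
      // getStreamBytes applies the stream's filters (e.g. FlateDecode)
      files.put(name.toUnicodeString(), PdfReader.getStreamBytes(stream));
    }
  }
  reader.close();
  return files;
}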
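Since the question's attachments are XML, each extracted byte array can go straight to a standard JAXP parser. A minimal sketch, assuming the attachments are well-formed XML and that filtering on a ".xml" name suffix (a hypothetical convention, not something the PDF guarantees) is good enough:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Parses the ".xml" entries of an attachment map (name -> raw bytes) into DOM documents. */
class XmlAttachments {

    static Map<String, Document> parseXml(Map<String, byte[]> files) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Attachments come from arbitrary PDFs: reject DOCTYPEs to rule out XXE
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Map<String, Document> docs = new LinkedHashMap<>();
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                // Hypothetical filter: trust the attachment name's extension
                if (!e.getKey().toLowerCase(Locale.ROOT).endsWith(".xml")) {
                    continue;
                }
                // Parse bytes, not a String, so each file's XML declaration picks the encoding
                docs.put(e.getKey(), builder.parse(new ByteArrayInputStream(e.getValue())));
            }
            return docs;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("could not parse XML attachment", ex);
        }
    }
}
```

Disallowing DOCTYPEs means an attachment that declares one will be rejected; relax that feature only if you trust the source documents.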
