Reading a PDF Portfolio in Python?
Question
I have a PDF portfolio consisting of an email thread, with each email containing attachments. I would like to read the text of each email and extract the attachments. However, I cannot find any information on how to read a PDF portfolio in Python. I have tried the PDFMiner and textract libraries, but the output simply reads: "For the best experience, open this PDF portfolio in Acrobat X or Adobe Reader X, or later. Get Adobe Reader Now!"
Any ideas? Thanks!
Recommended answer
The program pdfdetach from the poppler utilities can extract attachments.
Most UNIX-like operating system distributions have a poppler-utils package available. A Windows build can be found on SourceForge.
You can use the subprocess module to call this program from Python.
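As a rough illustration of the subprocess approach, here is a minimal sketch. It assumes pdfdetach is installed and on the PATH. The function names (build_pdfdetach_command, extract_attachments) and the file names in the usage note are made up for this example. It relies on pdfdetach's -saveall and -o options to write every embedded file into one directory.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdfdetach_command(pdf_path, out_dir):
    """Build the argument list that asks pdfdetach to save all embedded files."""
    # -saveall extracts every embedded file; with -saveall, -o names the target directory.
    return ["pdfdetach", "-saveall", "-o", str(out_dir), str(pdf_path)]


def extract_attachments(pdf_path, out_dir):
    """Run pdfdetach on pdf_path and return the files present in out_dir afterwards."""
    # Fail early with a readable message if poppler-utils is not installed.
    if shutil.which("pdfdetach") is None:
        raise RuntimeError("pdfdetach not found on PATH; install poppler-utils")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # check=True raises CalledProcessError if pdfdetach exits with an error.
    subprocess.run(build_pdfdetach_command(pdf_path, out_dir), check=True)

    return sorted(p for p in out_dir.iterdir() if p.is_file())
```

Calling extract_attachments("portfolio.pdf", "attachments") would then return the extracted paths. Since a portfolio stores its component documents as embedded files, the emails themselves will probably be among the results; if they come out as ordinary PDFs, they could be passed to PDFMiner for the text, and run through pdfdetach again in case they carry their own attachments.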