Extracting entire PDF data with Python pdfminer

Question

I am using pdfminer to extract data from PDF files with Python. I would like to extract all the data present in a PDF, irrespective of whether it is an image, text, or anything else. Can we do that in a single line (or two if needed, without much work)? Any help is appreciated. Thanks in advance.

Recommended answer

Can we do that in a single line (or two if needed, without much work)?

No, you cannot. pdfminer is powerful, but it is rather low-level.

Unfortunately, the documentation is not exactly exhaustive. I was able to find my way around it thanks to some code by Denis Papathanasiou. The code is discussed in his blog, and you can find the source here: layout_scanner.py

See also this answer, where I give a little more detail.
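
To give a feel for what "low-level" means in practice, here is a minimal sketch of walking pdfminer's layout tree by hand. It is not taken from layout_scanner.py. It assumes the newer pdfminer.six fork and its `extract_pages` helper in `pdfminer.high_level`, which may postdate this answer. The names `collect` and `extract_everything` are hypothetical. The walker classifies elements by duck typing (`get_text` for text, `stream` for images) so it can be tried on plain Python objects; `isinstance` checks against `LTTextContainer`, `LTImage` and `LTFigure` from `pdfminer.layout` are the more usual style.

```python
from typing import Any, Dict, Iterable, List, Optional


def collect(elements: Iterable[Any],
            found: Optional[Dict[str, List[Any]]] = None) -> Dict[str, List[Any]]:
    """Walk a pdfminer layout tree, gathering text strings and image objects."""
    if found is None:
        found = {"text": [], "images": []}
    for element in elements:
        if hasattr(element, "get_text"):
            # Text boxes, text lines and single characters expose get_text().
            found["text"].append(element.get_text())
        elif hasattr(element, "stream"):
            # Assumption: only LTImage carries a .stream with the raw image data.
            found["images"].append(element)
        elif hasattr(element, "__iter__"):
            # Pages and figures are containers: descend into their children.
            collect(element, found)
        # Anything else (lines, rectangles, curves) is skipped.
    return found


def extract_everything(pdf_path: str) -> Dict[str, List[Any]]:
    """Return the text and images that pdfminer.six finds in pdf_path."""
    # Imported lazily so collect() can be tried without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    return collect(extract_pages(pdf_path))
```

Calling `extract_everything("some.pdf")` (a placeholder path) would return a dict with a list of text strings and a list of image layout objects. Writing those images to disk is a separate step; pdfminer.six ships an `ImageWriter` class in `pdfminer.image` for that, though its behaviour varies by image format. Even with these helpers it takes more than a line or two, which matches the answer above.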
