Extracting entire pdf data with python pdfminer

Views: 479 | Published: 2020/7/2 20:00:46 | Tags: python, pdf-reader


Question

I am using pdfminer to extract data from PDF files with Python. I would like to extract all the data present in a PDF, irrespective of whether it is an image, text, or anything else. Can we do that in a single line (or two if needed, without much work)? Any help is appreciated. Thanks in advance.

Recommended answer

"Can we do that in a single line (or two if needed, without much work)?"

No, you cannot. Pdfminer is powerful, but it is rather low-level.

Unfortunately, the documentation is not exactly exhaustive. I was able to find my way around it thanks to some code by Denis Papathanasiou. The code is discussed in his blog, and you can find the source here: layout_scanner.py
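To give an idea of what "low-level" means in practice, here is a minimal sketch of walking the page layout yourself. It is not the code from layout_scanner.py. It assumes the maintained fork pdfminer.six and its `extract_pages` helper, and the names `walk_layout` and `extract_all` are made up for illustration. Elements are told apart by duck typing (text objects expose `get_text()`, image objects carry a `stream` attribute), which is a simplification rather than how the library itself classifies them.

```python
from collections.abc import Iterable
from typing import Any, Dict, List


def walk_layout(element: Any, found: Dict[str, List[Any]]) -> None:
    """Recursively sort one layout element into text or images (hypothetical helper)."""
    if hasattr(element, "get_text"):
        # Text boxes, text lines, and single characters all expose get_text().
        found["text"].append(element.get_text())
    elif hasattr(element, "stream"):
        # Assumption: only image objects (LTImage) carry a raw data stream.
        found["images"].append(getattr(element, "name", None))
    elif isinstance(element, Iterable):
        # Pages and figures are containers; descend into their children.
        for child in element:
            walk_layout(child, found)
    # Anything else (lines, rectangles, curves) is ignored in this sketch.


def extract_all(pdf_path: str) -> Dict[str, List[Any]]:
    """Collect text snippets and image names from every page (hypothetical helper)."""
    # Imported here so walk_layout can be tried without pdfminer.six installed.
    from pdfminer.high_level import extract_pages

    found: Dict[str, List[Any]] = {"text": [], "images": []}
    for page_layout in extract_pages(pdf_path):
        walk_layout(page_layout, found)
    return found
```

Calling `extract_all("some.pdf")` (any PDF path of your own) would return a dictionary with the text snippets and the names of the images found. Actually saving the image bytes to disk takes more work, which is the kind of detail the layout_scanner.py code deals with.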

See also this answer, where I give a little more detail.
