Return text string from physical coordinates in a PDF with Python

Views: 601 | Published: 2020/5/25 4:21:09 | Tags: python, pdf

Problem description

I have been battling with Google and the limited documentation of PDFMiner for the last several hours, and although I feel close, I'm just not getting what I need. I've worked through http://www.unixuser.org/~euske/python/pdfminer/ and all three of the YouTube videos to gain a better understanding about PDFs and I'm able to output raw text just fine.

I am working on a script to parse multiple PDF pages. Unfortunately, for this project I am dealing with poor quality PDF files, and the only reliable constant I see is the physical location of text strings being exactly the same. Although I've read hints that text strings can be extracted by physical coords, I have yet to see a working example.

Is there anyone out there who could shed some light on how this is done with PDFMiner? I am open to other modules if there is an obvious better choice, however I need to stick with Python for the script.

Additionally, I have tried PyPdf to no success as well (other than basic text output).

Thanks!
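For readers with the same problem, here is a minimal sketch of one possible approach, not a confirmed answer from the original thread. It assumes the maintained pdfminer.six fork (which provides `extract_pages` and layout objects with a `bbox` attribute) rather than the original PDFMiner. The helper names `inside`, `pick_text`, and `text_lines`, the file name, and the sample coordinates are all hypothetical. PDF coordinates are in points, with the origin at the bottom-left corner of the page.

```python
from typing import Iterable, List, Tuple

# (x0, y0, x1, y1) in PDF points, origin at the bottom-left of the page
BBox = Tuple[float, float, float, float]


def inside(inner: BBox, outer: BBox, tol: float = 1.0) -> bool:
    """True if `inner` lies within `outer`, allowing `tol` points of slack."""
    ix0, iy0, ix1, iy1 = inner
    ox0, oy0, ox1, oy1 = outer
    return (ix0 >= ox0 - tol and iy0 >= oy0 - tol
            and ix1 <= ox1 + tol and iy1 <= oy1 + tol)


def pick_text(items: Iterable[Tuple[BBox, str]], region: BBox) -> List[str]:
    """Keep the text of every (bbox, text) item that lies inside `region`."""
    return [text.strip() for bbox, text in items if inside(bbox, region)]


def text_lines(pdf_path: str, page_index: int = 0):
    """Yield (bbox, text) for each text line on one page via pdfminer.six."""
    # Imported lazily so the helpers above work without pdfminer installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer, LTTextLine

    for page in extract_pages(pdf_path, page_numbers=[page_index]):
        for box in page:
            if isinstance(box, LTTextContainer):
                # A text box is a container of text lines.
                for line in box:
                    if isinstance(line, LTTextLine):
                        yield line.bbox, line.get_text()
```

A hypothetical call would look like `pick_text(text_lines("scan.pdf"), (72, 700, 300, 720))`. The small tolerance is a design choice for poor quality PDFs where the same field may shift by a fraction of a point between pages. The region itself has to be measured once from a sample page, for example by printing every `bbox` that `text_lines` yields.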
