Reading PDF files in Python

Views: 116  Published: 2020/5/25 4:30:09  Tags: python, pdf


Question

I have found many posts proposing solutions for reading PDFs. I want to read a PDF file word by word and do some processing on it. People suggest pdfMiner, which converts the entire PDF file into a text file, but what I want is to read the PDF word by word. Can anyone suggest a library that does this?

Recommended answer

Possibly the fastest way to do this is to first convert your PDF into a text file using pdftotext (pdfMiner's site states that pdfMiner is 20 times slower than pdftotext) and afterwards parse the text file as usual.
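The convert-then-parse approach could be sketched roughly as follows. This is only an illustration, not the answerer's code: it assumes the pdftotext command-line tool is installed and on your PATH, and the function names (pdf_to_text, iter_words, count_words_in_pdf) and file paths are hypothetical. "Word" here simply means a whitespace-separated token.

```python
import subprocess
from collections import Counter
from typing import Iterable, Iterator


def pdf_to_text(pdf_path: str, txt_path: str) -> None:
    """Convert a PDF to a plain-text file by running the pdftotext CLI.

    Assumes pdftotext is installed and on PATH. Raises
    subprocess.CalledProcessError if the conversion fails.
    """
    subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, txt_path],
        check=True,
    )


def iter_words(lines: Iterable[str]) -> Iterator[str]:
    """Yield whitespace-separated words one at a time from lines of text."""
    for line in lines:
        # str.split() with no arguments splits on any run of whitespace
        # and drops empty strings, so blank lines yield nothing.
        yield from line.split()


def count_words_in_pdf(pdf_path: str, txt_path: str) -> Counter:
    """Example processing step: count how often each word occurs."""
    pdf_to_text(pdf_path, txt_path)
    with open(txt_path, encoding="utf-8", errors="replace") as handle:
        # The file is read lazily, line by line, so large documents
        # do not have to fit in memory at once.
        return Counter(iter_words(handle))
```

Calling count_words_in_pdf("input.pdf", "output.txt") with your own paths would convert the file and tally its words. Replace the Counter with whatever per-word processing you need. Punctuation stays attached to words in this sketch, so a real tokenizer may be preferable for text analysis.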

Also, when you said "I want to read a pdf file word by word and do some processing on it", you didn't specify whether you want to do processing based on the words in a PDF file or actually want to modify the PDF file itself. If it's the second case, then you've got an entirely different problem on your hands.
