PDF data extraction from scanned PDF using Python

Views: 23 Posted: 2022/3/27 15:50:42 Tags: python-3.x ocr python-tesseract pdfminer pdf-extraction


Problem description

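I am extracting data from a scanned PDF with Tesseract OCR. The extraction works, but the accuracy is poor: in many places the output contains wrong characters. How can I get the data with 100% accuracy?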

First I convert the PDF to JPG format, then I use the tesseract module to extract the data from the image.

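from PIL import Image
import pytesseract

# Run Tesseract OCR on the page image and get the recognised text.
text = pytesseract.image_to_string(Image.open(r"C:UserssumeshDesktopipippdf11.jpg"))
# repr() turns each real newline into the two-character sequence \n ...
text = repr(text)
# ... which is then removed here.
text = text.replace(r"\n", "")
print(text)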
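
The two steps described above (PDF to image, then image to text) could be sketched as a single function. This is only an illustration, not the asker's code: it assumes the third-party packages pdf2image (which needs Poppler) and pytesseract (which needs the Tesseract binary) are installed. The function names, the 300 DPI default and the `--psm 6` page segmentation mode are illustrative choices, not values from the question. Rendering at a higher resolution and converting to grayscale often helps on scans, but it will not guarantee 100% accuracy.

```python
import re
from typing import List


def normalize_whitespace(text: str) -> str:
    """Collapse newlines, tabs and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def ocr_scanned_pdf(pdf_path: str, dpi: int = 300, psm: int = 6) -> List[str]:
    """Return one cleaned text string per page of a scanned PDF.

    Hypothetical helper: pdf2image and pytesseract are imported lazily,
    so normalize_whitespace() stays usable without them.
    """
    from pdf2image import convert_from_path  # third-party, assumed installed
    import pytesseract  # third-party, assumed installed

    pages = convert_from_path(pdf_path, dpi=dpi)  # list of PIL images
    results = []
    for page in pages:
        gray = page.convert("L")  # grayscale copy of the page image
        raw = pytesseract.image_to_string(gray, config=f"--psm {psm}")
        results.append(normalize_whitespace(raw))
    return results
```

`normalize_whitespace` does the same job as the `repr()` and `replace()` pair in the question, without leaving the surrounding quote characters that `repr()` adds to the string.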

I expect to get the correct data from the PDF, but the data I get is different: for example, z comes out as 2, 5 as s, 1 as i, and so on.
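
Confusions like these are sometimes patched after OCR when a field is known to be numeric. The sketch below is a heuristic under that assumption, not a general fix: it rewrites only tokens that already contain at least one digit and consist solely of digits and the look-alike letters named above. The function names and the handling of upper-case variants are assumptions made for illustration.

```python
import re

# Look-alike letters mapped to the digits they are confused with:
# z/Z -> 2, s/S -> 5, i/I -> 1.
LETTER_TO_DIGIT = str.maketrans("zZsSiI", "225511")


def fix_numeric_token(token: str) -> str:
    """Replace look-alike letters with digits in a token expected to be numeric."""
    return token.translate(LETTER_TO_DIGIT)


def fix_mostly_numeric(text: str) -> str:
    """Repair tokens made only of digits and look-alike letters.

    A token is changed only if it already contains at least one digit,
    so ordinary words such as "is" are left alone.
    """
    def repair(match):
        token = match.group(0)
        if any(ch.isdigit() for ch in token):
            return fix_numeric_token(token)
        return token

    return re.sub(r"\b[0-9zZsSiI]+\b", repair, text)


print(fix_mostly_numeric("Total i2s0 items is 4s"))
```

A rule like this can also corrupt genuine mixed tokens (a product code such as "5s", for instance), so it is only suitable for fields whose format is known in advance.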
