How to extract text or numbers from images using Python

Views: 660  Posted: 2020/5/19 19:25:03  Tags: python, image, ocr, tesseract, python-tesseract


Question

I want to extract text (mainly numbers) from images like this:

I tried this code:

import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
img = Image.open('1.jpg')
text = pytesseract.image_to_string(img, lang='eng')
print(text)

But all I get is this: hE PPAR

Answer

When performing OCR, it is important to preprocess the image so that the text to detect is black and the background is white. A simple way to do this is to apply Otsu's threshold with OpenCV, which chooses the threshold value automatically and produces a binary image. Here's the image after preprocessing:

We use the --psm 6 configuration setting so that Tesseract treats the image as a single uniform block of text. Tesseract offers other page segmentation modes that are worth trying as well. Result from Pytesseract:

01153521976
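Since the question is mainly about numbers, the config string can also restrict Tesseract to digits, and the raw output can be cleaned up afterwards. This is a hedged sketch: the helper names `tesseract_config` and `digits_only` are hypothetical, and the `tessedit_char_whitelist` variable is only honored by the LSTM engine in newer Tesseract releases (4.1 and later, as far as I know), so check it against your installed version.

```python
import re
from typing import Optional


def tesseract_config(psm: int = 6, whitelist: Optional[str] = None) -> str:
    """Build a Tesseract config string such as '--psm 6' (hypothetical helper)."""
    # Tesseract's page segmentation modes are numbered 0 through 13
    if not 0 <= psm <= 13:
        raise ValueError(f"invalid page segmentation mode: {psm}")
    parts = [f"--psm {psm}"]
    if whitelist:
        # Ask Tesseract to recognize only these characters (version-dependent)
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def digits_only(text: str) -> str:
    """Keep only ASCII digits from raw OCR output (drops spaces, newlines, form feeds)."""
    return re.sub(r"[^0-9]", "", text)


# Example use with the answer's code (needs Tesseract installed, so shown as a comment):
# data = pytesseract.image_to_string(thresh, lang='eng',
#                                    config=tesseract_config(6, "0123456789"))
# print(digits_only(data))
```

Filtering after OCR is a safety net: even with a whitelist, Tesseract's output usually ends with a newline and form feed that `digits_only` strips.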

Code

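import cv2
import pytesseract

# Point pytesseract at the Tesseract executable (default Windows install path)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load the image as grayscale (flag 0 = cv2.IMREAD_GRAYSCALE)
image = cv2.imread('1.png', 0)

# Otsu's method picks the threshold automatically (the 0 passed here is ignored);
# THRESH_BINARY_INV maps pixels above the threshold to 0 (black) and the rest to 255 (white),
# so light text on a darker background becomes black text on white
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# --psm 6: assume a single uniform block of text
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
print(data)

# Display the preprocessed image until a key is pressed
cv2.imshow('thresh', thresh)
cv2.waitKey()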
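To show roughly what the `cv2.THRESH_OTSU` flag computes, here is a minimal NumPy sketch of Otsu's method, assuming an 8-bit grayscale array like the one `cv2.imread('1.png', 0)` returns. The names `otsu_threshold` and `binarize_inv` are illustrative and not part of OpenCV, and OpenCV's own implementation may break ties differently, so the chosen value can differ slightly from OpenCV's.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the threshold t (0-255) that maximizes the between-class variance.

    Pixels <= t form one class and pixels > t the other.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)           # weight of the class <= t
    mu = np.cumsum(prob * levels)     # cumulative mean up to t
    mu_total = mu[-1]                 # mean of the whole image
    # Between-class variance: (mu_total * omega - mu)^2 / (omega * (1 - omega))
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0         # one class empty: not a valid split
    return int(np.argmax(sigma_b))


def binarize_inv(gray: np.ndarray, t: int) -> np.ndarray:
    """Mimic THRESH_BINARY_INV: pixels > t become 0 (black), others 255 (white)."""
    return np.where(gray > t, 0, 255).astype(np.uint8)
```

For light text on a dark background, `binarize_inv(image, otsu_threshold(image))` should give essentially the same black-text-on-white image as the `cv2.threshold` line above. In practice the OpenCV call is faster and is the one to use; the sketch only makes the automatic threshold choice visible.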
