Identify clear text from an image in Python


Question

I am using pytesseract to recognize text in an image:

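import pytesseract
from PIL import Image

# Point pytesseract at the Tesseract executable (Windows install path)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'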

Then I used the code below to identify the text:

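# imgLoc is the image folder and imgName the image file name
textImg = pytesseract.image_to_string(Image.open(imgLoc + "/" + imgName))

print(textImg)
# Write as UTF-8 so non-ASCII characters in the OCR output do not raise an encoding error
with open(imgLoc + "/" + "oriText.txt", "w", encoding="utf-8") as text_file:
    text_file.write(textImg)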

This is my input image

This is an image of my output text file

Is there any way to identify the text clearly from the image?

Recommended answer

You can try improving the results by restricting the character set, allowing only characters that are legal in your particular language (excluding numbers, special characters, etc.). This answer will help.
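One way to restrict the character set is Tesseract's `tessedit_char_whitelist` config variable, passed through the `config` argument of `pytesseract.image_to_string`. The sketch below is only an illustration: the helper names `build_whitelist_config` and `image_to_letters` are hypothetical, the choice of ASCII letters is an assumption about the text in the image, and how well the whitelist is honored can vary with the Tesseract version and engine mode.

```python
import string


def build_whitelist_config(allowed_chars):
    """Build a Tesseract config string that limits output to allowed_chars.

    Hypothetical helper, not part of pytesseract.
    """
    if not allowed_chars:
        raise ValueError("allowed_chars must not be empty")
    if any(ch.isspace() for ch in allowed_chars):
        # Whitespace would split the option when the config string is tokenized.
        raise ValueError("allowed_chars must not contain whitespace")
    return f"-c tessedit_char_whitelist={allowed_chars}"


def image_to_letters(image_path):
    """Run OCR on image_path, allowing only ASCII letters (assumed alphabet)."""
    # Imported here so the config helper above can be used without these packages.
    import pytesseract
    from PIL import Image

    config = build_whitelist_config(string.ascii_letters)
    return pytesseract.image_to_string(Image.open(image_path), config=config)
```

With the variables from the question, this could be called as `print(image_to_letters(imgLoc + "/" + imgName))`. Adjust the allowed characters to the alphabet you actually expect in the image.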

Tesseract OCR isn't the best at figuring out characters in an image. You can try preprocessing the image a bit to improve the results. This will help.

  • Make sure the image dpi/ppi is above 250, otherwise the results may be inaccurate.
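The preprocessing and resolution advice above could be sketched with Pillow as follows. This is an assumption-laden illustration, not a recommended pipeline: the helper names `upscale_factor` and `preprocess_for_ocr` are hypothetical, the 300 dpi target, the 72 dpi fallback for images without resolution metadata, and the threshold of 150 are all guesses that should be tuned for the actual image.

```python
TARGET_DPI = 300  # assumption: comfortably above the 250 dpi suggested above


def upscale_factor(current_dpi, target_dpi=TARGET_DPI):
    """Return how much to enlarge an image so it reaches target_dpi (never shrinks)."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return max(1.0, target_dpi / current_dpi)


def preprocess_for_ocr(image_path, assumed_dpi=72, threshold=150):
    """Grayscale, upscale, and binarize an image before passing it to OCR."""
    # Imported here so upscale_factor can be used without Pillow installed.
    from PIL import Image

    img = Image.open(image_path)
    # Not every file stores its resolution; fall back to assumed_dpi if missing or zero.
    dpi = float(img.info.get("dpi", (assumed_dpi,))[0]) or assumed_dpi
    factor = upscale_factor(dpi)

    gray = img.convert("L")  # 8-bit grayscale
    if factor > 1.0:
        new_size = (round(gray.width * factor), round(gray.height * factor))
        gray = gray.resize(new_size, Image.LANCZOS)

    # Simple global threshold: pixels brighter than threshold become white, the rest black.
    return gray.point(lambda p: 255 if p > threshold else 0)
```

The returned image can be passed directly to `pytesseract.image_to_string` in place of `Image.open(...)`. A fixed global threshold is the simplest choice; unevenly lit images may need a different approach.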

I generally prefer the website www.onlineocr.net for optical character recognition, as in my experience the results are almost perfect each time. You can try using their API for character recognition (it requires an internet connection to work). The results I have obtained with this API are far superior to those from Tesseract OCR, so you may want to give it a try.
