Pytesseract OCR multiple config options

Question

I am having some problems with pytesseract. I need to configure Tesseract so that it accepts single digits and recognizes only numbers, because the digit zero is often confused with the letter 'O'.

Like this:

target = pytesseract.image_to_string(im,config='-psm 7',config='outputbase digits')

Recommended answer

tesseract-4.0.0a supports the page segmentation modes (psm) listed below. To recognize a single character, set psm to 10. If your text consists only of digits, you can also set tessedit_char_whitelist=0123456789.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
                        bypassing hacks that are Tesseract-specific.
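
The mode numbers above reach Tesseract through the --psm flag inside the config string. As a minimal sketch, assuming the Tesseract 4.x double-dash syntax (older 3.x releases used -psm, as in the question), a small helper can validate the number before building the flag. The name psm_flag is hypothetical and not part of pytesseract.

```python
# Hypothetical helper (not part of pytesseract): turn a page segmentation
# mode number from the list above into a config fragment.
VALID_PSM = range(0, 14)  # modes 0..13, as listed above


def psm_flag(mode: int) -> str:
    """Return the '--psm N' fragment, rejecting numbers outside 0..13."""
    if mode not in VALID_PSM:
        raise ValueError(f"psm must be between 0 and 13, got {mode}")
    return f"--psm {mode}"


print(psm_flag(10))  # mode 10: treat the image as a single character
print(psm_flag(7))   # mode 7: treat the image as a single text line
```

Rejecting an out-of-range number early gives a clearer error than letting the Tesseract process fail later.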

Here is a sample usage of image_to_string with multiple parameters.

import pytesseract  # `image` is assumed to be a PIL image loaded beforehand
target = pytesseract.image_to_string(image, lang='eng',
        config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
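
Since image_to_string takes a single config string, several options are combined by joining them with spaces rather than by passing config more than once. The sketch below illustrates one way to assemble that string. build_config and read_digits are hypothetical names introduced for illustration, and the defaults simply mirror the values used in this answer.

```python
def build_config(psm=10, oem=3, whitelist="0123456789"):
    """Join several Tesseract options into one config string."""
    parts = [f"--psm {psm}", f"--oem {oem}"]
    if whitelist:
        # -c sets a Tesseract config variable; this one restricts the
        # characters Tesseract is allowed to output.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def read_digits(image):
    """Run OCR on a PIL image (needs pytesseract and Tesseract installed)."""
    import pytesseract  # imported lazily so build_config works without it
    return pytesseract.image_to_string(image, config=build_config()).strip()


print(build_config())
print(build_config(psm=7))
```

Keeping the string assembly separate from the OCR call makes it easy to try a different psm value, for example 7 for a whole line of digits, without touching the rest of the code.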

Hope this helps.
