Tesseract has trouble reading this extremely simple string of numbers

Question

I'm currently writing a script in Python that needs to use Tesseract to read a number like this one (an image of the number 6.881):

Using the digits-only configuration and -psm 6 (or 7), it outputs 5.551.

I have had some success with other numbers (5.700 works), but this particular number is giving me a ton of problems. Unfortunately, I need a high degree of accuracy for my program, and I thought Tesseract would be able to decipher such a simple string.

I have also tried GOCR, which correctly read 6.881 (yay!) but output 5._00 for 5.700 (boo!).

Any idea why it would be doing this?

Or, more importantly, is there anything I can do to get around the problem (preferably without having to train Tesseract)?

Recommended answer

Using ImageMagick (you can use something else if you want), I doubled the image's size and removed the transparency (replacing it with white). Tesseract then OCR'd the enhanced image correctly:

$ convert I1Zau.png -background white -flatten -resize 200% I1Zau_2.png
$ tesseract I1Zau_2.png o.txt
$ cat o.txt.txt 
6.881
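
Since the script in question is written in Python, the two commands above could be wrapped roughly as follows. This is only a sketch under some assumptions: the `convert` and `tesseract` executables are on the PATH (ImageMagick 7 installs may expose `magick` instead of `convert`), and the helper names `build_commands` and `read_number`, as well as the `_2` and `_ocr` file suffixes, are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(src, extra_args=()):
    """Build the ImageMagick and Tesseract argument lists for one image.

    Nothing is executed here. Returns (convert_cmd, tesseract_cmd, out_txt),
    where out_txt is the text file Tesseract is expected to write.
    """
    src = Path(src)
    # Hypothetical naming scheme: I1Zau.png -> I1Zau_2.png, I1Zau_ocr.txt
    enhanced = src.with_name(src.stem + "_2" + src.suffix)
    out_base = src.with_name(src.stem + "_ocr")
    convert_cmd = [
        "convert", str(src),
        "-background", "white", "-flatten",  # replace transparency with white
        "-resize", "200%",                   # double the image size
        str(enhanced),
    ]
    # Tesseract appends ".txt" to the output base name itself.
    tesseract_cmd = ["tesseract", str(enhanced), str(out_base), *extra_args]
    return convert_cmd, tesseract_cmd, Path(str(out_base) + ".txt")


def read_number(src, extra_args=()):
    """Enhance the image, run Tesseract on it, and return the recognized text."""
    convert_cmd, tesseract_cmd, out_txt = build_commands(src, extra_args)
    subprocess.run(convert_cmd, check=True)    # raises if ImageMagick fails
    subprocess.run(tesseract_cmd, check=True)  # raises if Tesseract fails
    return out_txt.read_text().strip()
```

Calling `read_number("I1Zau.png")` mirrors the plain invocation shown above. The options from the question can be passed through, for example `extra_args=("-psm", "7", "digits")`; note that newer Tesseract releases spell the flag `--psm`, so the exact arguments depend on the installed version.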
