Tesseract has trouble reading this extremely simple string of numbers
Question
I'm currently writing a script in Python that needs to use Tesseract to read a number like this:
Using digits only and -psm 6 (or 7), it outputs 5.551 instead of 6.881.
I have had some success with other numbers (5.700 works), but this particular number is giving me a ton of problems. Unfortunately, I need a high degree of accuracy for my program, and I thought Tesseract would be able to decipher such a simple string.
I have also tried GOCR, which correctly read 6.881 (yay!) but gave the output 5._00 for 5.700 (boo!).
Any idea why it would be doing this?
Or, more importantly, is there anything I can do to get around the problem (preferably without having to train Tesseract)?
Recommended answer
I doubled its size and removed the transparency (replacing it with white) using ImageMagick (you can use something else if you want), and Tesseract then OCR'd the enhanced image correctly:
$ convert I1Zau.png -background white -flatten -resize 200% I1Zau_2.png
$ tesseract I1Zau_2.png o.txt
$ cat o.txt.txt
6.881
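
Since the script in question is written in Python, the same two commands could be driven from there. The following is only a minimal sketch: the helper names `build_commands` and `read_number` are made up for illustration, and it assumes the `convert` and `tesseract` binaries are on the PATH (newer ImageMagick releases may expect `magick` instead of `convert`).

```python
import subprocess
from pathlib import Path


def build_commands(src, out_base="o"):
    """Build the ImageMagick and Tesseract command lines for one image.

    Hypothetical helper: it only mirrors the two shell commands above.
    """
    src = Path(src)
    # I1Zau.png -> I1Zau_2.png, the naming used in the shell example
    enhanced = src.with_name(f"{src.stem}_2{src.suffix}")
    convert_cmd = [
        "convert", str(src),
        "-background", "white", "-flatten",  # replace transparency with white
        "-resize", "200%",                   # double the image size
        str(enhanced),
    ]
    tesseract_cmd = ["tesseract", str(enhanced), out_base]
    return convert_cmd, tesseract_cmd


def read_number(src, out_base="o"):
    """Enhance the image, run Tesseract on it, and return the recognized text."""
    convert_cmd, tesseract_cmd = build_commands(src, out_base)
    subprocess.run(convert_cmd, check=True)
    subprocess.run(tesseract_cmd, check=True)
    # Tesseract appends ".txt" to the output base name it is given
    return Path(out_base + ".txt").read_text().strip()
```

Calling `read_number("I1Zau.png")` would run both steps and return whatever text Tesseract recognized. Keeping the command construction separate from the execution makes it easy to add further options, such as a page segmentation mode, without touching the subprocess handling.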