Tesseract doesn't seem to work with digits
Question
I followed the FAQ to make Tesseract recognize digits, but all I get in the output file is a bunch of text, even though my image contains only numbers.
My command line looks like this:
tesseract --tessdata-dir ./ ./input.jpg ./output/output digits
Any ideas what could be happening?
Recommended answer
As mentioned in a Tesseract GitHub issue, you can't blacklist or whitelist characters with the Tesseract 4.0 LSTM engine. Instead, you should train the LSTM on the characters you expect in your image.
Thanks to Shreeshrii, you can try his 'experimental' digits traineddata from here.
Please note that Tesseract 4.0 is still in the alpha stage. If you prefer, you can still use a 3.* version of Tesseract, which supports what you need out of the box. Tesseract v3.04 tessdata is located here, and the library for Windows can be downloaded from here.
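To illustrate the 3.* route, here is a minimal sketch that restricts recognition to digits by setting the tessedit_char_whitelist variable on the command line, then strips any leftover non-digit characters from the result. It assumes a Tesseract 3.x binary on PATH and reuses the file names from the question. The digits_only helper is a hypothetical name introduced for this example and is not part of Tesseract. Per the answer above, the whitelist is not expected to take effect with the 4.0 LSTM engine.

```shell
#!/bin/sh
# Sketch only: assumes Tesseract 3.x on PATH and the question's file names.

# Hypothetical helper (not part of Tesseract): keep only digits and newlines.
digits_only() {
    tr -cd '0-9\n'
}

if command -v tesseract >/dev/null 2>&1 && [ -f ./input.jpg ]; then
    mkdir -p ./output
    # Limit recognition to digits with a config variable, then filter the
    # text file Tesseract writes (output base name plus .txt).
    tesseract ./input.jpg ./output/output -c tessedit_char_whitelist=0123456789 \
        && digits_only < ./output/output.txt
fi
```

The post-filter is only a fallback: it hides stray characters but cannot repair digits that were misread as letters, so it does not replace the whitelist or a digits-specific traineddata file.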