Tesseract False Space Recognition

Views: 410 Published: 2020/5/19 19:32:33 Tags: c++ opencv ocr tesseract spaces


Problem description

I'm using Tesseract to recognize serial numbers. This works acceptably, apart from common problems such as confusing zero with "O", 6 with 5, or M with H. In addition, Tesseract inserts spaces into the recognized words where the image contains none. The following image is recognized as "HI 3H".

This image produces "FBKHJ 1R1".

So Tesseract added a space, although there is no space in the image. Is it possible to parametrize Tesseract's spacing behavior?

Edit

Sorry, I forgot to mention that I also have serial numbers that include spaces, so I cannot simply delete every space inside the recognized serial number.

For example, the following image, which does contain a space in the serial number, is recognized by Tesseract as: J4 F1583BB. Although some of the characters are recognized incorrectly, the space is recognized correctly in this image.

My current Tesseract parameters are:

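#include <string>
#include <tesseract/baseapi.h>

tesseract::TessBaseAPI tess;
tess.Init(NULL, "eng", tesseract::OEM_TESSERACT_ONLY);
tess.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);
// Restrict output to uppercase letters and digits (as posted, '6' is missing from this list).
tess.SetVariable("tessedit_char_whitelist",
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345789");

// GetUTF8Text() returns a heap-allocated buffer that the caller must delete[].
char* out = tess.GetUTF8Text();
std::string text(out);
delete[] out;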

Edit

As the existing answers have noted, the gap between the "J" and the "I", for example, seems to be slightly larger than the gap between the other characters. The font I chose is a monospace font, because I thought this would help Tesseract with character recognition. The drawback of a monospace font, where every character has the same width, is that the kerning (the space between the characters) varies. See the example image from the linked source.
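To make the kerning issue concrete, here is a minimal sketch of a gap-based check that could run after recognition. It assumes per-character bounding boxes are already available from the OCR step. The `CharBox` type, the `joinWithGapThreshold` function, and the `factor` value are hypothetical names chosen for illustration; they are not part of Tesseract. The idea is to accept a space only where the gap between two neighbouring characters is clearly larger than the typical gap on the line.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Horizontal extent of one recognized character (hypothetical type;
// the coordinates would come from whatever OCR step produced the glyphs).
struct CharBox {
    char ch;
    int left;   // x of the left edge, in pixels
    int right;  // x of the right edge, in pixels
};

// Rebuilds the text, emitting a space only where the gap between two
// neighbouring boxes exceeds `factor` times the median gap of the line.
// `factor` is an assumed tuning knob, not a Tesseract parameter.
std::string joinWithGapThreshold(const std::vector<CharBox>& boxes,
                                 double factor = 2.0) {
    if (boxes.empty()) return "";

    // Gap between each box and its left neighbour (overlaps count as 0).
    std::vector<int> gaps;
    for (std::size_t i = 1; i < boxes.size(); ++i)
        gaps.push_back(std::max(0, boxes[i].left - boxes[i - 1].right));

    // Median gap (upper median for an even number of gaps).
    double median = 0.0;
    if (!gaps.empty()) {
        std::vector<int> sorted = gaps;
        std::sort(sorted.begin(), sorted.end());
        median = sorted[sorted.size() / 2];
    }

    // Never let the threshold drop below `factor` pixels.
    const double threshold = factor * std::max(1.0, median);

    std::string text(1, boxes[0].ch);
    for (std::size_t i = 1; i < boxes.size(); ++i) {
        if (gaps[i - 1] > threshold) text += ' ';
        text += boxes[i].ch;
    }
    return text;
}
```

With a factor of 2.0, a slightly wider gap such as the one next to a narrow "I" stays below the threshold, while a real word gap several times the median still produces a space. The right factor would have to be tuned on real images of the serial numbers.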

Which font type do you think would give better recognition results?
