Character confidence for Tesseract 3.02 using config file

Question

How can I get the % confidence for each detected character? By searching around, I found that you should set save_blob_choices to T. So I added that as a line to the hocr config file in tessdata/configs and called tesseract with that config. This is all I'm getting in the generated HTML file:

<span class='ocr_line' id='line_1' title="bbox 0 0 50 17"><span class='ocrx_word' id='word_1' title="bbox 3 2 45 15"><strong>31,835</strong></span>

As you can see, there are no confidence annotations at all, not even per word.

I don't have Visual Studio, so I can't make any code changes myself. But I'm also open to answers that describe code changes, along with how I would compile the code without VS.

Recommended answer

Here is sample code that gets the confidence of each word. You can replace RIL_WORD with RIL_SYMBOL to get the confidence of each character instead.

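// mTess is assumed to be a tesseract::TessBaseAPI that has already been
// initialized with Init() and given an image with SetImage().
mTess.Recognize(0);
tesseract::ResultIterator* ri = mTess.GetIterator();
if (ri != 0)
{
    do
    {
        // GetUTF8Text returns a new[]-allocated string that the caller must free.
        const char* word = ri->GetUTF8Text(tesseract::RIL_WORD);
        if (word != 0)
        {
            // Confidence is a percentage in the range 0 to 100.
            float conf = ri->Confidence(tesseract::RIL_WORD);
            printf("word: %s, confidence: %f\n", word, conf);
        }
        delete[] word;
    } while (ri->Next(tesseract::RIL_WORD));

    delete ri;
}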
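
To make the RIL_SYMBOL variant concrete, the loop above can be wrapped in a small helper that collects (text, confidence) pairs instead of printing them. This is only a sketch under stated assumptions: collectConfidences, FakeIterator, Level, LEVEL_WORD and LEVEL_SYMBOL are hypothetical names introduced here and are not part of Tesseract. FakeIterator merely imitates the three ResultIterator calls used in the answer (GetUTF8Text, Confidence, Next) so the sketch compiles without the library.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for tesseract::PageIteratorLevel (not part of Tesseract).
enum Level { LEVEL_WORD, LEVEL_SYMBOL };

typedef std::vector<std::pair<std::string, float> > ConfidenceList;

// Hypothetical stand-in that imitates the three ResultIterator calls used in
// the answer, so this sketch compiles without Tesseract installed.
class FakeIterator {
public:
    explicit FakeIterator(const ConfidenceList& items) : items_(items), pos_(0) {}

    // Like GetUTF8Text: returns a new[] buffer the caller must delete[],
    // or a null pointer when there is nothing at the current position.
    char* GetUTF8Text(Level) const {
        if (pos_ >= items_.size()) return 0;
        const std::string& s = items_[pos_].first;
        char* out = new char[s.size() + 1];
        s.copy(out, s.size());
        out[s.size()] = '\0';
        return out;
    }

    // Like Confidence: a percentage between 0 and 100.
    float Confidence(Level) const { return items_[pos_].second; }

    // Like Next: advances and reports whether another element exists.
    bool Next(Level) { return ++pos_ < items_.size(); }

private:
    ConfidenceList items_;
    std::size_t pos_;
};

// Walks any iterator that offers GetUTF8Text/Confidence/Next at the given
// level and returns one (text, confidence) pair per element.
template <typename Iterator, typename LevelT>
ConfidenceList collectConfidences(Iterator& it, LevelT level) {
    ConfidenceList out;
    do {
        char* text = it.GetUTF8Text(level);
        if (text != 0) {
            out.push_back(std::make_pair(std::string(text), it.Confidence(level)));
            delete[] text;  // the buffer was allocated with new[]
        }
    } while (it.Next(level));
    return out;
}
```

With the real library, the same helper would presumably be called as collectConfidences(*ri, tesseract::RIL_SYMBOL) on the iterator returned by mTess.GetIterator(), since the template only relies on those three member functions. Returning the pairs rather than printing them makes it easier to filter out low-confidence characters afterwards.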
