Character confidence for Tesseract 3.02 using config file
Question
How do I get the percent confidence for each detected character? From searching around, I found that save_blob_choices should be set to T, so I added that as a line to the hocr config file in tessdata/configs and ran tesseract with it. This is all I get in the generated HTML file:
<span class='ocr_line' id='line_1' title="bbox 0 0 50 17"><span class='ocrx_word' id='word_1' title="bbox 3 2 45 15"><strong>31,835</strong></span>
As you can see, there are no confidence annotations, not even per word.
I don't have Visual Studio, so I can't make any code changes. That said, I'm also open to answers that describe code changes, along with how to compile the code without VS.
Recommended answer
Here is sample code that gets the confidence of each word through the C++ API. You can replace RIL_WORD with RIL_SYMBOL to get the confidence of each character instead.
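// mTess is assumed to be a tesseract::TessBaseAPI that is already
// initialized and has an image set.
mTess.Recognize(0);
tesseract::ResultIterator* ri = mTess.GetIterator();
if (ri != 0)
{
    do
    {
        // GetUTF8Text returns a new[]-allocated buffer owned by the caller.
        const char* word = ri->GetUTF8Text(tesseract::RIL_WORD);
        if (word != 0)
        {
            // Confidence is a percentage in the range 0-100.
            float conf = ri->Confidence(tesseract::RIL_WORD);
            printf("word: %s, confidence: %f\n", word, conf);
        }
        delete[] word;
    } while (ri->Next(tesseract::RIL_WORD));
    delete ri;
}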
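The same loop can be wrapped so that the level (word or character) is a parameter and the results come back as a list rather than being printed inline. The following is only a sketch under stated assumptions: collectConfidences, TextConfidence, printConfidences, FakeLevel and FakeIterator are hypothetical names that are not part of Tesseract. The iterator type is a template parameter so the sketch compiles without the Tesseract headers, and FakeIterator merely mimics the three ResultIterator calls used in the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// One recognized unit (word or character) with its confidence in percent.
struct TextConfidence {
    std::string text;
    float confidence;
};

// Collects text and confidence at the given level.
// Iter only needs GetUTF8Text(level), Confidence(level) and Next(level),
// the same three calls the answer makes on tesseract::ResultIterator.
template <typename Iter, typename Level>
std::vector<TextConfidence> collectConfidences(Iter* it, Level level) {
    std::vector<TextConfidence> out;
    if (it == 0) return out;
    do {
        const char* text = it->GetUTF8Text(level);  // caller owns this buffer
        if (text != 0) {
            TextConfidence item;
            item.text = text;
            item.confidence = it->Confidence(level);
            out.push_back(item);
        }
        delete[] text;  // deleting a null pointer is a no-op
    } while (it->Next(level));
    return out;
}

// Prints one line per collected unit.
void printConfidences(const std::vector<TextConfidence>& items) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        std::printf("text: %s, confidence: %.2f\n",
                    items[i].text.c_str(), items[i].confidence);
    }
}

// HYPOTHETICAL stand-in for tesseract::ResultIterator, present only so the
// sketch runs without the library. It replays a fixed list of results.
enum FakeLevel { FAKE_WORD, FAKE_SYMBOL };

class FakeIterator {
public:
    explicit FakeIterator(const std::vector<TextConfidence>& items)
        : items_(items), pos_(0) {}

    // Returns a new[]-allocated copy, matching the ownership rule in the answer.
    char* GetUTF8Text(FakeLevel) const {
        if (pos_ >= items_.size()) return 0;
        const std::string& s = items_[pos_].text;
        char* buf = new char[s.size() + 1];
        std::memcpy(buf, s.c_str(), s.size() + 1);
        return buf;
    }

    float Confidence(FakeLevel) const {
        return pos_ < items_.size() ? items_[pos_].confidence : 0.0f;
    }

    // Advances to the next element; false once the list is exhausted.
    bool Next(FakeLevel) { return ++pos_ < items_.size(); }

private:
    std::vector<TextConfidence> items_;
    std::size_t pos_;
};
```

With the real library, the call would presumably be collectConfidences(ri, tesseract::RIL_SYMBOL) after mTess.Recognize(0), with ri still deleted by the caller afterwards. This has not been checked against a Tesseract 3.02 build.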