Character-wise confidence values using Tesseract 3.01


Question

I executed the following code to generate character-wise confidence values:

int main(int argc, char **argv) {
    const char *lang = "eng";
    const PIX *pixs;
    if ((pixs = pixRead(argv[1])) == NULL) {
        cout << "Unsupported image type" << endl;
        exit(3);
    }
    TessBaseAPI api;
    api.SetVariable("save_blob_choices", "T");
    api.SetPageSegMode(tesseract::PSM_SINGLE_WORD);
    api.SetImage(pixs);
    int rc = api.Init(argv[0], lang);
    api.Recognize(NULL);
    ResultIterator *ri = api.GetIterator();
    if (ri != 0) {
        do {
            const char *symbol = ri->GetUTF8Text(RIL_SYMBOL);
            if (symbol != 0) {
                float conf = ri->Confidence(RIL_SYMBOL);
                cout << "\nnext symbol: " << symbol << " confidence: " << conf << "\n" << endl;
            }
            delete[] symbol;
        } while ((ri->Next(RIL_SYMBOL)));
    }
    return 0;
}

Image link

The output obtained for the above image is:

next symbol: N confidence: 72.3563
next symbol: B confidence: 72.3563

next symbol: E confidence: 69.9937
next symbol: T confidence: 69.9937
next symbol: R confidence: 69.9937
next symbol: A confidence: 69.9937
next symbol: N confidence: 69.9937
next symbol: G confidence: 69.9937
next symbol: - confidence: 69.9937
next symbol: I confidence: 69.9937

As is evident, the confidence values for characters belonging to the same word are identical. Is this the expected output? Shouldn't the confidence value be different for each character? I also tried running the code on a word in which each character was in a different font style, and yet the confidence value was still the same for every character in that word.

Recommended answer

The issue is that you're calling Init after the SetVariable call. Call Init first and set save_blob_choices afterwards, so that the setting is in place when Recognize runs.
