Disable dictionary-assisted OCR in tesseract C++ API

Question

I have an application where technical datasheets are OCR'd using the tesseract API. I initialize it like this:

tesseract::TessBaseAPI tess;
tess.Init(NULL, "eng", tesseract::OEM_TESSERACT_ONLY);

However, even after using custom whitelists like this

tess.SetVariable("tessedit_char_blacklist", "");
tess.SetVariable("tessedit_char_whitelist", myWhitelist);

some datasheet entries are recognized wrongly, for example PA3 is recognized as FAB.

How can I disable the dictionary-assisted OCR, i.e. . To avoid affecting other tools, I would rather not modify the global config files if possible.

Note: This is not a duplicate of this previous question, because that question explicitly asks about the command-line tool, whereas I am explicitly asking about the tesseract API.

Answer

You can do it as follows:

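#include <cstdio>
#include <iostream>
#include <tesseract/baseapi.h>

int main()
{
    tesseract::TessBaseAPI api;
    if (api.Init(NULL, "eng"))  // Init returns 0 on success
    {
        fprintf(stderr, "Could not initialize tesseract.\n");
        return 1;
    }
    // Turn off the document dictionary; SetVariable returns false on failure.
    if (!api.SetVariable("tessedit_enable_doc_dict", "0"))
    {
        std::cerr << "Unable to disable the dictionary" << std::endl;
    }
    api.End();
    return 0;
}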

Simply pass "tessedit_enable_doc_dict" as the name argument to the SetVariable function, together with its boolean value as a string ("0" to disable it).

I found the tessedit_enable_doc_dict parameter in the tesseractclass.h header file (https://tesseract-ocr.github.io/a00736_source.html, line 839). I think the best way to find the correct parameter names is to look at the values defined in the header that corresponds to your version (mine is 3.04). Several settings I had found on the internet did not work for me; this was the configuration that did.
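
Because SetVariable returns false when a variable cannot be set (for example, when the name is not recognized by your tesseract version), it is worth checking every call rather than only the first. Below is a minimal sketch, assuming you want to apply several settings (such as the whitelist from the question plus tessedit_enable_doc_dict) from inside your own program instead of editing global config files. The names applyVariables and FakeApi are hypothetical and introduced only for illustration; FakeApi stands in for tesseract::TessBaseAPI so the sketch compiles without the library installed.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: calls api.SetVariable(name, value) for every pair and
// returns the names that were rejected. Works with any type that exposes
// bool SetVariable(const char*, const char*), e.g. tesseract::TessBaseAPI.
template <typename Api>
std::vector<std::string> applyVariables(
    Api& api, const std::vector<std::pair<std::string, std::string>>& vars) {
  std::vector<std::string> failed;
  for (const auto& kv : vars) {
    if (!api.SetVariable(kv.first.c_str(), kv.second.c_str())) {
      failed.push_back(kv.first);  // record it, keep applying the rest
    }
  }
  return failed;
}

// Hypothetical stand-in for TessBaseAPI so this sketch builds without
// tesseract: it accepts only the variable names listed in `known`.
struct FakeApi {
  std::vector<std::string> known{"tessedit_enable_doc_dict",
                                 "tessedit_char_whitelist"};
  bool SetVariable(const char* name, const char* value) {
    assert(name != nullptr && value != nullptr);  // C strings must be valid
    for (const auto& k : known) {
      if (k == name) return true;
    }
    return false;
  }
};
```

With tesseract linked, the same helper can be called with a tesseract::TessBaseAPI after a successful Init(NULL, "eng"), and any returned names printed to stderr so that a misspelled or unsupported parameter does not go unnoticed.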
