Tesseract OCR: Recognize complete dictionary words only

Question

I'm using the tesseract OCR plugin for phonegap: https://github.com/jcesarmobile/PhonegapOCRPlugin/i

I'm trying to configure Tesseract to recognize complete dictionary words only, that is: no special characters, no suffixes or prefixes, etc.

Because the tessdata folder in this project doesn't contain any configs, I thought I'd set them on init. Right now I'm trying to set configs by modifying claseAuxiliar.mm, but I haven't noticed any difference. This might be because the configs are wrong or because I'm setting them incorrectly. Below are my configs and how I'm currently trying to set them:

    // Initialize the Tesseract engine. Init() returns 0 on success.
    tesseract = new tesseract::TessBaseAPI();
    if (tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng") != 0)
        printf("Tesseract init failed!!!\n");

    // SetVariable() returns false when the variable could not be set.
    if (!tesseract->SetVariable("segment_penalty_dict_nonword", "10"))
        printf("Setting segment_penalty_dict_nonword failed!!!\n");
    if (!tesseract->SetVariable("segment_penalty_garbage", "10"))
        printf("Setting segment_penalty_garbage failed!!!\n");
    if (!tesseract->SetVariable("stopper_nondict_certainty_base", "-100"))
        printf("Setting stopper_nondict_certainty_base failed!!!\n");
    if (!tesseract->SetVariable("language_model_penalty_non_dict_word", "1"))
        printf("Setting language_model_penalty_non_dict_word failed!!!\n");
    if (!tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "1"))
        printf("Setting language_model_penalty_non_freq_dict_word failed!!!\n");
    if (!tesseract->SetVariable("GARBAGE_STRING", "5"))
        printf("Setting GARBAGE_STRING failed!!!\n");
    if (!tesseract->SetVariable("NON_WERD", "5"))
        printf("Setting NON_WERD failed!!!\n");


Recommended answer

You may want to suppress the system dictionary and load a custom dictionary instead.

https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc
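As a rough illustration of that suggestion, the sketch below builds the text of a Tesseract config file that turns off the built-in dictionaries (`load_system_dawg`, `load_freq_dawg`) and names a user word list (`user_words_suffix`), plus the word list itself. The helper names `buildDictConfig` and `buildWordList` are made up for this example and are not part of Tesseract or the PhoneGap plugin. It only produces the file contents and does not call Tesseract.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of a Tesseract config file that
// disables the built-in dictionaries and points at a user word list.
// With language "eng" and suffix "user-words", Tesseract is expected to
// look for the word list at tessdata/eng.user-words.
std::string buildDictConfig(const std::string& suffix) {
    assert(!suffix.empty());  // an empty suffix would name no word list
    std::ostringstream out;
    out << "load_system_dawg F\n"   // do not load the main system dictionary
        << "load_freq_dawg F\n"     // do not load the frequent-words dictionary
        << "user_words_suffix " << suffix << "\n";
    return out.str();
}

// Hypothetical helper: returns the word list contents, one word per line.
std::string buildWordList(const std::vector<std::string>& words) {
    std::ostringstream out;
    for (const std::string& word : words) {
        out << word << "\n";
    }
    return out.str();
}
```

The two strings would be saved as a config file and as `tessdata/eng.user-words`. Because the dictionary variables are init-time parameters, the config most likely has to be supplied when the engine is initialized (as a config file argument on the command line, or through the longer `Init()` overload that accepts config files), not through `SetVariable()` after `Init()`. Even then, Tesseract appears to treat the dictionary as a strong hint rather than a hard constraint, so some non-dictionary output may still get through.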
