Disable dictionary in Tesseract


Question

How can I disable dictionary corrections when running Tesseract for the English language?

I'm currently running tesseract as a child process.

Recommended answer

Try setting these variables to false (put them in a config file):

load_system_dawg 
load_freq_dawg
load_punc_dawg
load_number_dawg
load_unambig_dawg
load_bigram_dawg
load_fixed_length_dawgs

https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/Disable$20dictionary$20in$20Tesseract/tesseract-ocr/5nvIo1DJxHE/f3gBi2pTKykJ
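Since the question mentions running tesseract as a child process, here is a minimal Python sketch of how the config file could be generated and passed on the command line. The file name `no_dict.cfg`, the image and output names, and the helper function names are hypothetical. Some of the listed variables may not exist in every Tesseract version, in which case Tesseract may warn about an unknown parameter.

```python
import subprocess
from pathlib import Path

# Variables named in the answer above; availability varies by Tesseract version.
DAWG_VARIABLES = [
    "load_system_dawg",
    "load_freq_dawg",
    "load_punc_dawg",
    "load_number_dawg",
    "load_unambig_dawg",
    "load_bigram_dawg",
    "load_fixed_length_dawgs",
]


def no_dict_config_text():
    """Return config-file text with one 'name value' pair per line."""
    return "".join(f"{name} false\n" for name in DAWG_VARIABLES)


def build_command(image, output_base, config_path):
    """Build the argument list; config files go after the options."""
    return ["tesseract", str(image), str(output_base), "-l", "eng", str(config_path)]


def run_without_dictionary(image, output_base, config_path="no_dict.cfg"):
    """Write the config file, then run tesseract as a child process."""
    Path(config_path).write_text(no_dict_config_text())
    return subprocess.run(build_command(image, output_base, config_path), check=True)
```

Calling `run_without_dictionary("page.png", "out")` would write `no_dict.cfg` to the current directory and, assuming `tesseract` is on the PATH, produce `out.txt`.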

Also read "How to increase the trust in/strength of the dictionary?" in the FAQ. From it:


For tesseract-ocr < 3.01, try raising NON_WERD and GARBAGE_STRING in dict/permute.cpp to maybe 3 or even 5.

For tesseract-ocr >= 3.01, try increasing the variables language_model_penalty_non_freq_dict_word and language_model_penalty_non_dict_word in a config file. By default they are 0.1 and 0.15 respectively.
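As an illustration of the >= 3.01 case, a config file raising both penalties might look like the fragment below. The values 0.2 and 0.3 are arbitrary examples chosen only because they are above the stated defaults, not recommendations from the FAQ. Note that this strengthens the dictionary, which is the opposite of disabling it.

```
# Hypothetical config file; example values only
language_model_penalty_non_freq_dict_word 0.2
language_model_penalty_non_dict_word 0.3
```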
