Custom Dictionary for Tesseract


Question

I am currently working on an Android project that uses Tesseract OCR. I was hoping to fine-tune the results given to the user by adding a dictionary. According to the FAQ at http://code.google.com/p/tesseract-ocr/wiki/FAQ , the best way to go about this is to:

Replace tessdata/eng.user-words with your own word list, in the same format - UTF8 text, one word per line.

However, there is no eng.user-words file in my tessdata folder, so I assume that if I just create a text file containing my dictionary, it will never be used.

Has anybody had a similar experience and knows what to do? Any advice would be a great help.

Recommended Answer

If you're using Tesseract 3 (which I assume you are), you'll have to rebuild your eng.traineddata file. I chose to replace the word-dawg file completely to try to get better results (the words I'm detecting are always the same).

You'll need the combine_tessdata and wordlist2dawg executables, which end up in the training directory when you compile Tesseract.

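  1. Unpack everything (I did this just to back up my eng.word-dawg; you'll also need the unicharset later). The last argument is the output path prefix for the unpacked components, chosen here to match the unicharset path used by the wordlist2dawg command below:

./combine_tessdata -u eng.traineddata trainingdat_backup/.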

  2. Create a text file containing your word list (wordlistfile), one word per line.

  3. Create a new eng.word-dawg from the word list and the unpacked unicharset:

./wordlist2dawg wordlistfile eng.word-dawg trainingdat_backup/.unicharset

  4. Replace the word-dawg component inside eng.traineddata:

./combine_tessdata -o eng.traineddata eng.word-dawg

That should do it.
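The word list itself is the part most likely to need cleanup before it goes to wordlist2dawg. The following is only a minimal sketch of one way to prepare it, assuming the vocabulary starts out as free-form text in a hypothetical file named raw_words.txt (that file name and the sample words are illustrative, not part of the original answer). It uses standard Unix tools only and writes the wordlistfile named in the steps above.

```shell
#!/bin/sh
# Sketch: build a one-word-per-line word list for wordlist2dawg.
# raw_words.txt is a hypothetical input file; wordlistfile is the
# name used in the steps above.
set -eu

# Hypothetical sample input; replace this with your own vocabulary.
printf 'Invoice  Total\ntotal\nAmount\n\nInvoice\n' > raw_words.txt

# Put every whitespace-separated word on its own line, drop empty
# lines, and remove exact duplicates. LC_ALL=C keeps the sort
# byte-wise, so "Total" and "total" remain separate entries.
tr -s '[:space:]' '\n' < raw_words.txt \
  | sed '/^$/d' \
  | LC_ALL=C sort -u > wordlistfile

cat wordlistfile
```

With the sample input, the duplicate "Invoice" is dropped and four entries remain. Case is preserved on purpose, since OCR output is case-sensitive; whether to fold case is a choice to make for your own data.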
