Adding New Fonts to Tesseract 3

Views: 247 | Published: 2020/5/19 19:32:39 | Tags: ocr, tesseract


Question

I'm trying to add new fonts to Tesseract OCR. I'm following this tutorial, but I'm running into some problems.

Here's what I've done so far:

Create the training document

convert eng.myfont.exp0.pdf eng.myfont.exp0.tif

Train Tesseract

tesseract eng.myfont.exp0.tif eng.myfont.exp0 batch.nochop makebox

This created my eng.myfont.exp0.box file.

I opened the file with moshpytt and made sure the characters were detected correctly.
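Alongside the visual check in moshpytt, the box file can be given a quick structural check from the shell. The sketch below assumes the Tesseract 3 box format of one box per line with six whitespace-separated fields (symbol, left, bottom, right, top, page). The helper name check_box_file is made up for illustration and is not part of Tesseract.

```shell
#!/bin/sh
# Sanity-check a box file.
# Assumed line format: <symbol> <left> <bottom> <right> <top> <page>

# check_box_file is a hypothetical helper, not a Tesseract tool.
# It reports every line that does not have six fields, then a summary.
check_box_file() {
    awk '
        NF != 6 { printf "line %d: expected 6 fields, got %d\n", NR, NF; bad++ }
        END     { printf "%d boxes, %d malformed\n", NR, bad + 0 }
    ' "$1"
}

# Usage (file name taken from the question):
#   check_box_file eng.myfont.exp0.box
```

The box count in the summary line can then be compared with the number of boxes Tesseract reports when the box file is fed back in.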

Feed the box file back into Tesseract

tesseract eng.myfont.exp0.tif eng.myfont.exp0.box nobatch box.train.stderr

I got this result:

Tesseract Open Source OCR Engine v3.03 with Leptonica
APPLY_BOXES:
Boxes read from boxfile: 146
Found 146 good blobs.
TRAINING ... Font name = myfont.exp0
Generated training data for 6 words

The files eng.myfont.exp0.box.tr and eng.myfont.exp0.box.txt were generated.
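The two counts in the training output (boxes read and good blobs found) can be compared automatically. This is a minimal sketch that assumes the output was saved to a file, for example by appending 2> train.log to the command. The file name train.log and the helper name compare_box_counts are hypothetical. Treating equal counts as "every box matched a blob" is an interpretation of the log, not documented behavior.

```shell
#!/bin/sh
# Compare the counts reported by the box.train step.
# Reads the saved log on stdin; compare_box_counts is a hypothetical helper.
compare_box_counts() {
    awk '
        /Boxes read from boxfile:/ { nread  = $NF }   # e.g. "... boxfile: 146"
        /good blobs/               { nfound = $2  }   # e.g. "Found 146 good blobs."
        END {
            if (nread == "" || nfound == "") { print "counts not found"; exit 1 }
            if (nread + 0 == nfound + 0)
                print "ok: " nread " boxes, " nfound " good blobs"
            else
                print "mismatch: " nread " boxes read, " nfound " good blobs"
        }
    '
}

# Usage (assumes the log was saved as train.log):
#   compare_box_counts < train.log
```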

Try to detect the character set used in the box file (this is where I get stuck)

unicharset_extractor *.box

Result:

unicharset_extractor: command not found

I also tried unicharset_extractor eng.myfont.exp0.box, with the same result.
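The "command not found" message comes from the shell rather than from Tesseract: no executable with that name was found on PATH, so the argument passed to it makes no difference. The training tools are separate executables from the tesseract binary itself, and a quick way to see which of them the shell can find is sketched below. The helper name missing_tools is made up for illustration, and the list of tool names in the usage comment is an assumption based on the usual Tesseract 3 training workflow.

```shell
#!/bin/sh
# Report which commands cannot be resolved on PATH.

# missing_tools is a hypothetical helper: it prints every name that
# `command -v` fails to resolve, one per line, and nothing otherwise.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage (tool list is an assumption, adjust as needed):
#   missing_tools unicharset_extractor mftraining cntraining combine_tessdata
```

If unicharset_extractor is printed while tesseract itself runs fine, the training tools were probably not installed alongside the main binary, or were installed to a directory that is not on PATH.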

I am using:

tesseract 3.03
leptonica-1.70
libgif 4.1.6(?) : libjpeg 8d : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 : webp 0.4.0
Ubuntu 14.04.1 LTS
