Adding New Fonts to Tesseract 3

Question

I'm trying to add new fonts to Tesseract OCR. I'm following this tutorial, but I'm having some problems.

Here's what I've done so far:

  1. Create training document

convert eng.myfont.exp0.pdf eng.myfont.exp0.tif

Train Tesseract

tesseract eng.myfont.exp0.tif eng.myfont.exp0 batch.nochop makebox

This created my eng.myfont.exp0.box file.

I opened the file with moshpytt and made sure the characters were detected correctly.

Feed the box file back into tesseract

tesseract eng.myfont.exp0.tif eng.myfont.exp0.box nobatch box.train.stderr

I got this result:

Tesseract Open Source OCR Engine v3.03 with Leptonica
APPLY_BOXES:
Boxes read from boxfile: 146
Found 146 good blobs.
TRAINING ... Font name = myfont.exp0
Generated training data for 6 words
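
The box count reported in the log can be cross-checked against the box file itself. The following is a minimal sketch, assuming the Tesseract 3 box format of one entry per line with six space-separated fields (symbol, left, bottom, right, top, page). The function name `count_boxes` is hypothetical and not part of Tesseract.

```shell
# Count well-formed entries in a Tesseract 3 box file.
# Assumption: each entry is one line of six whitespace-separated fields:
#   symbol left bottom right top page
count_boxes() {
    awk 'NF == 6 { n++ } END { print n + 0 }' "$1"
}

# Hypothetical usage with the file from the question:
# count_boxes eng.myfont.exp0.box
```

If the number printed differs from the "Boxes read from boxfile" line, the box file probably contains malformed lines worth inspecting by hand.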

  • The eng.myfont.exp0.box.tr and eng.myfont.exp0.box.txt files were generated.
  • Try to detect the character set used in the box file (this is where I get stuck):

    unicharset_extractor *.box

    Result:

    unicharset_extractor: command not found

    I also tried unicharset_extractor eng.myfont.exp0.box with the same result.

    I'm using:

    • tesseract 3.03
    • leptonica-1.70
    • libgif 4.1.6(?):libjpeg 8d:libpng 1.2.50:libtiff 4.0.3:zlib 1.2.8:webp 0.4.0
    • Ubuntu 14.04.1 LTS

    Recommended answer

    The training tools for Tesseract 3.03 RC were omitted from Ubuntu 14.04. So either fall back to Tesseract 3.02 or upgrade to Ubuntu 14.10, which should include them.
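
Before starting a training run, it may help to confirm which training tools are actually on the PATH. The sketch below is one possible check, not an official procedure. The helper `missing_tools` is a hypothetical name, and apart from `unicharset_extractor` the tool names listed are taken from the usual Tesseract 3 training workflow rather than from the question.

```shell
#!/bin/sh
# Print the name of every given command that is not found on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Assumption: these are the training tools needed after the box.train step.
missing_tools unicharset_extractor mftraining cntraining combine_tessdata \
    || echo "some training tools are not installed" >&2
```

Any name printed by the script is a tool the installed package does not provide, which matches the "command not found" symptom in the question.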
