Tesseract or any other OCR lib

Question

I'm looking for an explanation, API documentation, or examples of how to use (and train?) Tesseract in C++. There is nothing useful on the Google Tesseract page, and I have yet to find anything elsewhere on the web.

Any useful sources or experiences would be more than welcome, as I have no idea how to begin.

PS:

  1. I'm open to suggestions for other libraries.
  2. Only FREE libraries.


Answer

I have some experience with Tesseract. A simple Google search for 'training tesseract' turns up this page: http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract, where you must choose which version of Tesseract you wish to train. While 3 is the latest version, it's brand new, so people are still ironing out issues; I'm still using version 2.4. Anyway, you'll see there are about 9 steps in training Tesseract for a particular 'language' (or what should have been called 'fonts' or 'character sets'). You could also just use the existing 'eng' language, but it depends on your application. For example, in my application I had to do the document analysis, take a particular region, and OCR a 13-character string of numbers. I needed high accuracy and didn't want it reading '5' as 'S' or '0' as 'O', etc., so it was logical to create a particular 'language' for my font set consisting only of the characters 0..9, whereas you might not care if you get extra 'noise
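
The answer's approach was to train a custom 'language' restricted to 0..9. A complementary safeguard, not described in the answer, is to validate the recognized text for such a fixed-length numeric field before trusting it. The C++17 sketch below assumes the OCR engine has already returned the field's text as a `std::string`; the function names `toLikelyDigit` and `normalizeNumericField` and the confusion table are hypothetical illustrations, not part of Tesseract's API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not part of Tesseract): map letters that OCR output
// often confuses with digits back to the digit they most likely stand for.
// 'S' -> '5' and 'O' -> '0' come from the answer; the other entries are
// illustrative assumptions and should be tuned to the actual font.
inline char toLikelyDigit(char c) {
    switch (c) {
        case 'O': case 'o': return '0';
        case 'S': case 's': return '5';
        case 'I': case 'l': return '1';  // assumption
        case 'B':           return '8';  // assumption
        default:            return c;
    }
}

// Hypothetical post-processing step for a fixed-length numeric field
// (13 digits in the answer's example). Whitespace is skipped, look-alike
// letters are mapped to digits, and anything that still is not a digit,
// or a result of the wrong length, is rejected with std::nullopt so the
// caller can flag the region for manual review instead of guessing.
inline std::optional<std::string> normalizeNumericField(const std::string& raw,
                                                        std::size_t expectedLength = 13) {
    assert(expectedLength > 0 && "field length must be positive");
    std::string digits;
    digits.reserve(expectedLength);
    for (char c : raw) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            continue;  // OCR often inserts spurious spaces between glyphs
        }
        const char d = toLikelyDigit(c);
        if (!std::isdigit(static_cast<unsigned char>(d))) {
            return std::nullopt;  // an unmapped non-digit: reject the whole field
        }
        digits.push_back(d);
    }
    if (digits.size() != expectedLength) {
        return std::nullopt;  // wrong length: characters were dropped or added
    }
    return digits;
}
```

For example, `normalizeNumericField("12S4567890O23")` returns `"1254567890023"`, while `"123456789012X"` is rejected. Blindly mapping letters to digits is only safe when the field is known to be purely numeric; rejecting rather than guessing keeps silent misreads out of the results, which matters when high accuracy is the goal.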
