Train Tesseract for specific words - possible?


Question

I want to use Tesseract to extract about 10-20 keywords from a document. The document will contain only English characters and words. What I am interested in is something like "Age: 23". Here "Age" is the keyword I am interested in, and I want to extract 23 (its value) as well.

The first approach that comes to mind is to extract the whole page as text and then look for the keywords in the recognized text. But in terms of training Tesseract, is there a better approach when the keywords are known in advance, one that might result in better accuracy?
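For reference, the first approach (recognize the whole page, then search the text) could look roughly like the sketch below. It is only an illustration under assumptions: the keyword list, the `extract_fields` helper, and the "Keyword: value" line layout are hypothetical, and the fuzzy matching of labels via the standard library's `difflib` is one possible way to tolerate small recognition errors, not something Tesseract itself provides.

```python
import re
from difflib import get_close_matches

# Hypothetical keyword list; replace with the 10-20 keywords of interest.
KEYWORDS = ["Age", "Name", "Height"]

# Assumed layout: a label, a colon, then the rest of the line as the value.
PAIR_RE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(.+?)\s*$")


def extract_fields(ocr_text, keywords=KEYWORDS, cutoff=0.8):
    """Return {keyword: value} for 'Keyword: value' lines in OCR output."""
    lookup = {k.lower(): k for k in keywords}
    found = {}
    for line in ocr_text.splitlines():
        match = PAIR_RE.match(line)
        if not match:
            continue  # no 'label: value' shape on this line
        label, value = match.group(1), match.group(2)
        # Tolerate small OCR mistakes in the label, e.g. "Heiqht" for "Height".
        close = get_close_matches(label.lower(), list(lookup), n=1, cutoff=cutoff)
        if close:
            found[lookup[close[0]]] = value
    return found


if __name__ == "__main__":
    sample = "Name: John Doe\nAge: 23\nHeiqht: 180 cm\nfree text"
    print(extract_fields(sample))
```

The input string would normally be the recognized page text, for example from `pytesseract.image_to_string(...)` if the `pytesseract` wrapper is used (an assumption; any way of obtaining Tesseract's text output works). Very short keywords such as "Age" leave little room for fuzzy matching, so the `cutoff` value may need tuning per keyword set.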

I am more or less aware of the limitations of Tesseract OCR and am trying to get the most out of it within those limitations. Thanks for all your expert advice.

Recommended answer

Try ...
