How do I improve the accuracy of the OCR text from Tesseract?

Question

I created a basic app that recognizes text using the Tesseract API from Google and integrated it with my camera app. It works, but the accuracy is a problem: the text is sometimes recognized as a random set of characters, and I would estimate the accuracy at about 50 percent.

Further, when it tries to scan more than four words in an image, the app crashes.

// Read the recognized text, then release the Tesseract instance.
String ocrText = baseApi.getUTF8Text();
baseApi.end();

Here, baseApi is an object of the Tesseract API class.

Do I need to use a different data structure to store the recognized text, or is there some other reason why more than four words are not recognized?

Recommended answer

The Tesseract API class provides an isValidWord method (http://zdenop.github.com/tesseract-doc/group___advanced_a_p_i.html#ga2c06caf08c9a8aa97a08a2de2f6200df) that checks whether a string is a valid word. You can use it to check the recognized text and discard tokens that are not words, which can improve the accuracy of the output.
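As a rough illustration of that filtering step, the sketch below keeps only the tokens that pass a validity check. It is self-contained and does not call Tesseract: the class name OcrWordFilter, the method keepValidWords, and the small in-memory dictionary are all hypothetical, and the Predicate stands in for whatever dictionary check the Tesseract wrapper in use actually exposes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper, not part of Tesseract or Tess4J.
final class OcrWordFilter {

    // Keeps only the whitespace-separated tokens that the supplied check accepts.
    // The predicate is a stand-in (an assumption) for the OCR engine's own
    // dictionary check.
    static List<String> keepValidWords(String ocrText, Predicate<String> isValidWord) {
        List<String> kept = new ArrayList<>();
        for (String token : ocrText.trim().split("\\s+")) {
            // Strip leading and trailing punctuation before the lookup.
            String word = token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!word.isEmpty() && isValidWord.test(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Toy dictionary used only for this demonstration.
        Set<String> dictionary = new HashSet<>(Arrays.asList("hello", "world"));
        Predicate<String> inDictionary = w -> dictionary.contains(w.toLowerCase(Locale.ROOT));
        System.out.println(keepValidWords("Hello, w0rld x#q world!", inDictionary));
    }
}
```

Dropping invalid tokens removes obvious noise but also discards names, numbers, and anything else missing from the dictionary, so in practice it may be better to flag such tokens for review than to delete them.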

I am developing with Tess4J, a Java JNA wrapper for tesseract-ocr, and it gives quite good results once this check is applied.

Inaccurate results might also be due to the text size. As one reference puts it: "Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi."
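To make those thresholds concrete, the sketch below converts a point size at a given resolution into pixels (one point is 1/72 inch) and estimates how much an image would need to be enlarged to reach the 10pt at 300 dpi level. The class OcrTextSize and its methods are hypothetical, and treating the nominal point size as the measured text height in pixels is a simplifying assumption.

```java
// Hypothetical helper for reasoning about text size; not part of any OCR library.
final class OcrTextSize {

    static final double POINTS_PER_INCH = 72.0;

    // Height in pixels of text with the given point size at the given resolution.
    static double pixelHeight(double pointSize, double dpi) {
        return pointSize / POINTS_PER_INCH * dpi;
    }

    // Factor by which to enlarge an image so that text measured at
    // measuredHeightPx reaches the height of 10pt text at 300 dpi.
    // Returns 1.0 when the text is already at least that tall.
    static double upscaleFactor(double measuredHeightPx) {
        if (measuredHeightPx <= 0) {
            throw new IllegalArgumentException("measuredHeightPx must be positive");
        }
        double targetPx = pixelHeight(10.0, 300.0); // roughly 41.7 px
        return Math.max(1.0, targetPx / measuredHeightPx);
    }

    public static void main(String[] args) {
        System.out.printf("10pt at 300 dpi: %.1f px%n", pixelHeight(10.0, 300.0));
        System.out.printf("8pt at 300 dpi: %.1f px%n", pixelHeight(8.0, 300.0));
        System.out.printf("Scale for 20 px text: %.2f%n", upscaleFactor(20.0));
    }
}
```

By this estimate, text under about 42 pixels tall is in the range where accuracy starts to drop, and text under about 33 pixels is in the range where it drops quickly. Enlarging a low-resolution photo does not add detail, so capturing at a higher resolution is likely to help more than resizing afterwards.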

Further, whether more than four words can be detected depends on many factors: the kind of test image (how many features it contains), the size of the image, the platform, and so on.
