Character Recognition using Tesseract

Views: 194 | Published: 2018/7/30 17:11:51 | Tags: c++ opencv image-processing ocr tesseract


Question

I am trying to work with the Tesseract API. I am new to image processing and have been struggling with it for the last few days. I have tried simple algorithms and have reached around 70% accuracy.

I want the accuracy to be 90% or higher. The problem with the images is that they are 72 dpi. I also tried increasing the resolution, but did not get good results. The images I am trying to recognize are attached.
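
For reference, one common way to "increase the resolution" is to resample the image toward roughly 300 dpi before OCR. The question does not say which method was tried, so the following is only a minimal sketch under stated assumptions: `GrayImage` and `upscaleBilinear` are hypothetical names, the input is 8-bit grayscale, and the scale factor is taken as 300/72. Interpolation cannot add detail that was never captured, which may be one reason upscaling alone did not help.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for an 8-bit grayscale image stored row by row.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // width * height values
};

// Resample `src` by `scale` using bilinear interpolation.
// Assumption: scale = target dpi / source dpi, for example 300.0 / 72.0.
GrayImage upscaleBilinear(const GrayImage& src, double scale) {
    assert(src.width > 0 && src.height > 0 && scale > 0.0);
    assert(src.pixels.size() ==
           static_cast<std::size_t>(src.width) * static_cast<std::size_t>(src.height));

    GrayImage dst;
    dst.width = std::max(1, static_cast<int>(std::lround(src.width * scale)));
    dst.height = std::max(1, static_cast<int>(std::lround(src.height * scale)));
    dst.pixels.resize(static_cast<std::size_t>(dst.width) *
                      static_cast<std::size_t>(dst.height));

    // Read one source pixel as a double.
    auto at = [&src](int x, int y) {
        return static_cast<double>(
            src.pixels[static_cast<std::size_t>(y) * src.width + x]);
    };

    for (int y = 0; y < dst.height; ++y) {
        // Map the destination pixel centre back into source coordinates,
        // clamped so that border rows are replicated.
        double sy = (y + 0.5) / scale - 0.5;
        sy = std::max(0.0, std::min(sy, src.height - 1.0));
        const int y0 = static_cast<int>(sy);
        const int y1 = std::min(y0 + 1, src.height - 1);
        const double fy = sy - y0;

        for (int x = 0; x < dst.width; ++x) {
            double sx = (x + 0.5) / scale - 0.5;
            sx = std::max(0.0, std::min(sx, src.width - 1.0));
            const int x0 = static_cast<int>(sx);
            const int x1 = std::min(x0 + 1, src.width - 1);
            const double fx = sx - x0;

            // Blend horizontally on both rows, then vertically.
            const double top = at(x0, y0) * (1.0 - fx) + at(x1, y0) * fx;
            const double bottom = at(x0, y1) * (1.0 - fx) + at(x1, y1) * fx;
            const double value = top * (1.0 - fy) + bottom * fy;

            dst.pixels[static_cast<std::size_t>(y) * dst.width + x] =
                static_cast<std::uint8_t>(std::lround(value));
        }
    }
    return dst;
}
```

With OpenCV, the same step would typically be a single `cv::resize` call using `cv::INTER_LINEAR` or `cv::INTER_CUBIC`.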

Any help would be appreciated, and I apologize if I am asking something very basic.

Edit

I forgot to mention that I am trying to do all the processing and recognition within 2-2.5 seconds on a Linux platform, and the text detection method mentioned in this answer takes a lot of time. Also, I would prefer not to use a command-line solution; I would rather use Leptonica or OpenCV.

Most of the images are uploaded here.

I have tried the following approaches to binarize the tickets, with no luck:

http://www.vincent-net.com/luc/papers/10wiley_morpho_DIAapps.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.193.6347&rep=rep1&type=pdf
http://iit.demokritos.gr/~bgat/PatRec2006.pdf
http://psych.stanford.edu/~jlm/pdfs/Sternberg67.pdf

The tickets have:

Somewhat poor lighting

Non-text areas

Low resolution

I tried feeding the image directly to the Tesseract API, and it gives me about 70% correct results in 1 second on average. However, I want to increase the accuracy while keeping the time factor in mind. So far I have tried:

Detecting the edges of the image

Blob analysis

Binarizing the ticket using adaptive thresholding

Then I tried feeding those binarized images to Tesseract. The accuracy dropped to below 50-60%, even though the binarized images look perfect.
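
To make the adaptive thresholding step concrete, here is a minimal sketch. It is not the code used in the question. It assumes an 8-bit grayscale buffer and a local mean threshold computed from an integral image, and the names `adaptiveMeanThreshold`, `radius`, and `offset` are hypothetical. It is similar in spirit to OpenCV's `cv::adaptiveThreshold` with `cv::ADAPTIVE_THRESH_MEAN_C`, although the border handling here (clipping the window) is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Binarize a row-major 8-bit grayscale image with a local mean threshold.
// A pixel becomes 255 if it is brighter than (local mean - offset), else 0.
// The window is (2 * radius + 1) pixels wide, clipped at the image border.
std::vector<std::uint8_t> adaptiveMeanThreshold(
        const std::vector<std::uint8_t>& gray,
        int width, int height, int radius, int offset) {
    assert(width > 0 && height > 0 && radius >= 0);
    assert(gray.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    // Integral image with an extra zero row and column, so that any
    // rectangle sum can be read with four lookups.
    const std::size_t stride = static_cast<std::size_t>(width) + 1;
    std::vector<long long> integral(
        stride * (static_cast<std::size_t>(height) + 1), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t here = static_cast<std::size_t>(y + 1) * stride + (x + 1);
            integral[here] = gray[static_cast<std::size_t>(y) * width + x]
                           + integral[here - stride]       // sum above
                           + integral[here - 1]            // sum to the left
                           - integral[here - stride - 1];  // counted twice
        }
    }

    std::vector<std::uint8_t> binary(gray.size());
    for (int y = 0; y < height; ++y) {
        const int y0 = std::max(0, y - radius);
        const int y1 = std::min(height - 1, y + radius);
        for (int x = 0; x < width; ++x) {
            const int x0 = std::max(0, x - radius);
            const int x1 = std::min(width - 1, x + radius);
            const long long area =
                static_cast<long long>(x1 - x0 + 1) * (y1 - y0 + 1);
            const long long sum =
                  integral[static_cast<std::size_t>(y1 + 1) * stride + (x1 + 1)]
                - integral[static_cast<std::size_t>(y0) * stride + (x1 + 1)]
                - integral[static_cast<std::size_t>(y1 + 1) * stride + x0]
                + integral[static_cast<std::size_t>(y0) * stride + x0];

            // Compare pixel * area with sum - offset * area, which is the
            // same test as pixel > mean - offset without a division.
            const std::size_t idx = static_cast<std::size_t>(y) * width + x;
            const long long scaled = static_cast<long long>(gray[idx]) * area;
            binary[idx] = (scaled > sum - offset * area) ? 255 : 0;
        }
    }
    return binary;
}
```

The integral image keeps the cost proportional to the number of pixels regardless of window size, which matters for the 2-2.5 second budget. The window radius and offset would need tuning per ticket type; the values are not given in the question.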
