Why does tesseract fail for this image?

Problem description

I've tried tesseract on this image and on some scanned images that have text inside rectangles, but each time it fails and outputs garbage text. It works correctly when there are no boxes around the text. What should I do with the image or with tesseract? Please help.

Recommended answer

As said before, you should get rid of the pink lines before doing OCR (they are still useful as character boundaries, though).
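One way to drop the pink lines is a simple per-pixel colour test. The following is only a minimal sketch: the image is assumed to be a list of rows of `(r, g, b)` tuples, and the function names and thresholds (`min_red`, `min_gap`, `min_blue`) are hypothetical values that would need tuning for a real scan.

```python
WHITE = (255, 255, 255)

def is_pinkish(pixel, min_red=180, min_gap=40, min_blue=100):
    """Heuristic (assumed thresholds): strong red, green clearly weaker
    than red, and blue at least as strong as green."""
    r, g, b = pixel
    return r >= min_red and (r - g) >= min_gap and b >= g and b >= min_blue

def remove_pink_lines(image):
    """Return a copy of the image with pinkish pixels painted white.
    `image` is a list of rows, each row a list of (r, g, b) tuples."""
    return [[WHITE if is_pinkish(p) else p for p in row] for row in image]
```

If the lines are also needed as character boundaries, the positions of the pinkish pixels could be recorded before they are painted over.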

Once you have extracted your glyphs and binarized them (converted them to a bit image), you can start to use tesseract on them. Keep in mind that tesseract uses a shape extraction approach and depends on dictionary support; you may get better results (and faster processing times) with invariant moments such as Hu moments.
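To make the binarization and invariant-moment idea concrete, here is a rough sketch in plain Python. It assumes a glyph is given as rows of grayscale values (0 to 255); the function names and the threshold of 128 are illustrative choices, and only the first two of Hu's seven invariants are computed.

```python
def binarize(gray, threshold=128):
    """Map grayscale rows (0-255) to 1 for ink (dark) and 0 for background."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def raw_moment(img, p, q):
    """Raw image moment M_pq = sum over pixels of x^p * y^q * value."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))

def hu_first_two(img):
    """First two Hu invariants of a binary glyph (rows of 0/1 values)."""
    m00 = raw_moment(img, 0, 0)
    if m00 == 0:
        raise ValueError("glyph has no ink pixels")
    # Centroid of the ink pixels.
    cx = raw_moment(img, 1, 0) / m00
    cy = raw_moment(img, 0, 1) / m00

    def eta(p, q):
        # Central moment, normalised by area to gain scale invariance.
        mu = sum(((x - cx) ** p) * ((y - cy) ** q) * v
                 for y, row in enumerate(img)
                 for x, v in enumerate(row))
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return h1, h2
```

A glyph could then be classified by comparing its invariants against those of stored templates, for example by nearest distance. Because the values do not change when the glyph is shifted or rotated, no dictionary is involved in that comparison.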

In case you are interested in a Java-based approach, here is our OCR library doing just this in pure Java (it can be ported to other languages):

http://sourceforge.net/projects/javaocr/
