Is there any way to improve tesseract OCR with small fonts?

Question

I'm trying to use tesseract-OCR via python-tesseract to read a low-resolution font that looks like this:

Unfortunately, that image returns

ZIJZHZI

I think the resolution is too low, and that is what causes the problem. I've tried magnifying the image and cropping it down to individual characters, but neither helps much. Is there anything else I should consider, preferably something that can be done with the Python Imaging Library? Or should I just give up and train tesseract?

For what it's worth, PIL has the following built-in filters:

BLUR, CONTOUR, DETAIL, EDGE_ENHANCE,
EDGE_ENHANCE_MORE, EMBOSS, FIND_EDGES,
SMOOTH, SMOOTH_MORE, and SHARPEN

Recommended answer

I magnified the image with ImageMagick's convert:

  convert -resize 400% in.bmp out.bmp

Then I read it with:

  tesseract out.bmp res

The result is correct:

  100
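
Since the question asks for something that works with the Python Imaging Library, the same 400% enlargement can be sketched in Python. This is a minimal sketch under several assumptions, not the answerer's method: it uses Pillow and the pytesseract wrapper (which may differ from the python-tesseract binding mentioned in the question), the helper names `upscale` and `read_small_text` are hypothetical, and bicubic resampling is a guess that will not exactly match ImageMagick's default resize filter.

```python
from PIL import Image


def upscale(img, factor=4):
    """Return a copy of img enlarged by an integer factor (4 is roughly 400%)."""
    width, height = img.size
    # Bicubic resampling is an assumption; other filters may suit a given font better.
    return img.resize((width * factor, height * factor), Image.BICUBIC)


def read_small_text(path, factor=4):
    """Upscale the image at path, then run Tesseract on the enlarged copy."""
    # Imported here so upscale() stays usable without pytesseract installed.
    import pytesseract

    with Image.open(path) as img:
        # Convert to grayscale before enlarging (an illustrative choice).
        enlarged = upscale(img.convert("L"), factor)
    return pytesseract.image_to_string(enlarged).strip()
```

Calling `read_small_text("in.bmp")` would mirror the two commands above, assuming the Tesseract binary is installed and on the PATH. Whether it returns the same text as the command-line run depends on the resampling filter and the Tesseract version, so the factor and filter are worth experimenting with.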
