Training Tesseract 3 to recognize numbers from real images of gas meters


Problem description

I'm trying to train Tesseract to recognize numbers from real images of gas meters.

The images I use for training were taken with a camera, so they have many problems: poor resolution, blur, poor lighting, and low contrast caused by overexposure, reflections, shadows, etc.

For training, I created one large image containing a series of digits cropped from the gas meter photos, and I manually edited the box file to create the .tr files. The result is that only digits from the clearer, sharper images are recognized, while digits from blurred images are not picked up by Tesseract.

Recommended answer

As far as I can tell, you need OpenCV to locate the box in which the numbers sit, but OpenCV is not good for OCR itself. After you locate the box, just crop that part, do some image processing on it, and then hand it over to Tesseract for OCR.
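The crop and clean-up steps can be sketched without any library, to show what the "image processing" stage has to achieve before Tesseract sees the digits. This is only a minimal sketch under assumptions: the box coordinates are taken as already known (finding them is the part left to OpenCV), the image is a plain grayscale grid of 0..255 values, and Otsu thresholding is one possible binarization choice, not something the answer prescribes. The helper names `crop`, `otsu_threshold`, and `binarize` are hypothetical. In OpenCV the comparable step would be `cv2.threshold` with the `cv2.THRESH_OTSU` flag.

```python
from typing import List

Gray = List[List[int]]  # grayscale image: rows of 0..255 intensities


def crop(img: Gray, x: int, y: int, w: int, h: int) -> Gray:
    """Return the w-by-h region whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in img[y:y + h]]


def otsu_threshold(img: Gray) -> int:
    """Pick the threshold that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # pixel count and intensity sum at or below t
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels this dark yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(img: Gray, t: int) -> Gray:
    """Map pixels brighter than t to white (255) and the rest to black (0)."""
    return [[255 if v > t else 0 for v in row] for row in img]


# Low-contrast toy image: dark strokes (70) on a gray background (130).
image = [[130] * 6 for _ in range(4)]
for row_index in (1, 2):
    image[row_index][2] = 70
    image[row_index][3] = 70

digit_box = crop(image, 1, 1, 4, 2)  # assumed, hand-picked box coordinates
clean = binarize(digit_box, otsu_threshold(digit_box))
```

The resulting black-on-white `clean` grid is the kind of input that would then be handed to Tesseract. Real meter photos will likely also need deblurring or contrast stretching, which this sketch does not attempt.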

I need help with OpenCV because I don't know how to program in OpenCV.

Here are a few real-world examples.

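  • The first image is the original (the number on a power meter).
  • The second image was lightly cleaned up in GIMP; OCR accuracy in Tesseract is around 50%.
  • The third image is fully cleaned up: 100% of it is recognized by OCR without any training!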
