Prepare complex image for OCR


Problem description



I want to recognize digits from a credit card. To make things worse, the source image is not guaranteed to be of high quality. The OCR is to be realized through a neural network, but that shouldn't be the topic here.

The current issue is the image preprocessing. As credit cards can have backgrounds and other complex graphics, the text is not as clear as with scanning a document. I made experiments with edge detection (Canny Edge, Sobel), but it wasn't that successful. Also calculating the difference between the greyscale image and a blurred one (as stated at Remove background color in image processing for OCR) did not lead to an OCRable result.
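The blur-difference idea mentioned above can be sketched in pure NumPy (a minimal stand-in for the linked answer's approach, not its exact code; `flatten_background` is a hypothetical helper): estimate the slowly varying background with a wide box blur and subtract it, so only local detail such as text strokes remains.

```python
import numpy as np

def flatten_background(img, win=31):
    """Estimate the slowly varying background with a win x win box blur
    (computed via an integral image) and subtract it from the input."""
    img = img.astype(np.float64)
    pad = win // 2
    padded = np.pad(img, pad, mode="edge")
    # integral image: any window sum becomes four lookups
    ii = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = img.shape
    blur = (ii[win:win+h, win:win+w] - ii[:h, win:win+w]
            - ii[win:win+h, :w] + ii[:h, :w]) / (win * win)
    diff = img - blur
    # rescale to 0..255 for display or further thresholding
    diff -= diff.min()
    if diff.max() > 0:
        diff *= 255.0 / diff.max()
    return diff.astype(np.uint8)
```

On a synthetic image with a brightness gradient plus a few bright stroke rows, the gradient is flattened while the strokes stand out; in practice the window size must comfortably exceed the stroke width.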

I think most approaches fail because the contrast between a specific digit and its background is not strong enough. There is probably a need to do a segmentation of the image into blocks and find the best preprocessing solution for each block?

Do you have any suggestions on how to convert the source to a readable binary image? Is edge detection the way to go, or should I stick with basic color thresholding?

Here is a sample of a greyscale-thresholding approach (where I am obviously not happy with the results):

Original image:

Greyscale image:

Thresholded image:

Thanks for any advice, Valentin

Solution

If it's at all possible, request that better lighting be used to capture the images. A low-angle light would illuminate the edges of the raised (or sunken) characters, thus greatly improving the image quality. If the image is meant to be analyzed by a machine, then the lighting should be optimized for machine readability.

That said, one algorithm you should look into is the Stroke Width Transform, which is used to extract characters from natural images.

Stroke Width Transform (SWT) implementation (Java, C#...)

A global threshold (for binarization or clipping edge strengths) probably won't cut it for this application, and instead you should look at localized thresholds. In your example images the "02" following the "31" is particularly weak, so searching for the strongest local edges in that region would be better than filtering all edges in the character string using a single threshold.
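A localized threshold can be sketched as follows (a minimal NumPy illustration, assuming bright characters on a darker local background; `local_mean_threshold` is a hypothetical helper, and OpenCV's `cv2.adaptiveThreshold` offers a production-ready equivalent): each pixel is compared against the mean of its own neighbourhood rather than one global cutoff, computed efficiently with an integral image.

```python
import numpy as np

def local_mean_threshold(img, win=15, offset=10):
    """Binarize by comparing each pixel to the mean of its win x win
    neighbourhood; keep pixels brighter than that mean by `offset`
    (flip the comparison for dark text on a bright background)."""
    img = img.astype(np.float64)
    pad = win // 2
    padded = np.pad(img, pad, mode="edge")
    # integral image: any window sum becomes four lookups
    ii = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = img.shape
    s = (ii[win:win+h, win:win+w] - ii[:h, win:win+w]
         - ii[win:win+h, :w] + ii[:h, :w])
    mean = s / (win * win)
    return (img > mean + offset).astype(np.uint8)
```

A weak digit like the "02" survives as long as it is locally brighter than its surroundings, even if it is darker than strong digits elsewhere in the string.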

If you can identify partial segments of characters, then you might use some directional morphology operations to help join segments. For example, if you have two nearly horizontal segments like the following, where 0 is the background and 1 is the foreground...

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 1 1 1 1 1 1 0 0 0
0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0

then you could perform a morphological "close" operation along the horizontal direction only to join those segments. The kernel could be something like

x x x x x
1 1 1 1 1
x x x x x
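The directional close described above can be sketched in NumPy (a minimal illustration; `close_horizontal` is a hypothetical helper, and OpenCV's `cv2.morphologyEx` with a `(k, 1)` rectangular structuring element is the usual way to do this): a dilation with a 1 x k horizontal line bridges gaps up to k-1 pixels wide, and the following erosion restores the original stroke extent.

```python
import numpy as np

def close_horizontal(mask, k=5):
    """Morphological close with a 1 x k horizontal line kernel,
    implemented as OR/AND over horizontal shifts of the mask."""
    r = k // 2

    def dilate(m):
        out = m.copy()
        for s in range(1, r + 1):
            out[:, s:] |= m[:, :-s]   # shift right
            out[:, :-s] |= m[:, s:]   # shift left
        return out

    def erode(m):
        out = m.copy()
        for s in range(1, r + 1):
            out[:, s:] &= m[:, :-s]
            out[:, :-s] &= m[:, s:]
        return out

    return erode(dilate(mask.astype(bool))).astype(np.uint8)
```

Applied to the two nearly horizontal segments in the diagram above, a k = 5 close fills the two-pixel gap between them without widening the joined stroke vertically, since the kernel has no vertical extent.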

There are more sophisticated methods to perform curve completion using Bezier fits or even Euler spirals (a.k.a. clothoids), but preprocessing to identify segments to be joined and postprocessing to eliminate poor joins can get very tricky.
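To give a flavour of the curve-completion idea, here is a toy Bezier bridge (a hypothetical sketch, not any of the published clothoid or curve-completion methods): given two stroke endpoints and their tangent directions, place cubic Bezier control points one third of the gap length along each tangent and sample the curve.

```python
import numpy as np

def bezier_bridge(p0, t0, p1, t1, n=20):
    """Join endpoints p0, p1 with a cubic Bezier whose end tangents
    follow the unit direction vectors t0, t1. Control points sit one
    third of the gap length along each tangent (a common heuristic)."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    d = np.linalg.norm(p1 - p0)
    c0 = p0 + np.asarray(t0, float) * d / 3.0
    c1 = p1 - np.asarray(t1, float) * d / 3.0
    t = np.linspace(0.0, 1.0, n)[:, None]
    # cubic Bezier: (1-t)^3 P0 + 3(1-t)^2 t C0 + 3(1-t) t^2 C1 + t^3 P1
    return ((1 - t)**3 * p0 + 3 * (1 - t)**2 * t * c0
            + 3 * (1 - t) * t**2 * c1 + t**3 * p1)
```

The hard parts the answer warns about, deciding which segment pairs to bridge and rejecting implausible joins, are exactly what this sketch leaves out.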
