Localization of numbers within a complex scene image


Question

First of all, I very much appreciate the help provided by the experts here at SO. The questions posed by many and answered by the experts have been of immense benefit to me. They helped me with a very crucial problem a few months back, when I was a student doing my thesis.

Right now I am working on a problem to detect (and then recognize) numbers in a complex scene image. You can check out these images here: http://imageshack.us/g/823/dsc1757w.jpg/. These are pictures of marathon runners with their numbers on the front of their shirts. I have to detect all the numbers that appear in the image and then recognize them. The recognition won't be difficult, as these appear to be OCR-friendly characters. The crucial thing is how to detect these numbers.

I had an idea to first color-filter the image for black. But when I tried it in Matlab, the results were not encouraging, because many regions in the image meet this criterion (the clothes, some shadows behind the runners, the shadows in the foliage, etc.). Either I need to separate the characters from these other regions, or I need some other good technique. There are papers available and I have gone through some of them, such as SWT, DWT, etc., but I have a feeling they won't be of much help. I was thinking some kind of training algorithm might be useful. There is another reason for this: in the future there might be other photos with possibly different fonts, etc., so I think a dedicated algorithmic approach might fail. Can anyone point me in the right direction?

I am not a novice in image processing, but not an expert either. So any and all help or suggestions in this regard will be greatly appreciated :) .

Thanks,
MD

Recommended answer

You know that your problem is not a simple one, but it seems very interesting! Although I don't have a solution for you, I will share my thoughts in the hope that you can make something out of them.

Let's take 2 of your photos as examples:

Photo-A: http://imageshack.us/photo/my-images/59/dsc0275a.jpg/ It shows a single person with a relatively "big" green label with numbers on his shirt.

Photo-B: http://imageshack.us/photo/my-images/546/dsc0243u.jpg/ It shows a lot of people with smaller red labels on their shirts. (The labels' height in pixels is about 1/5 of that of the label in Photo-A.)

Considering the above photos, I will try to write down some random thoughts which may help...

(a) Define your scale: There is no point in applying a search algorithm to find labels from 2x2 pixels up to the full image resolution. You must define the minimum/maximum limits for the width and height of a label. Those limits may depend on many different factors:

(1) One factor is the real size of the labels (determined by the distance of the people from the camera), which can be expressed as a percentage of the image width and height.

(2) Another factor is the actual reading accuracy of the OCR you are going to use. If the numbers' image height is smaller than Y1 pixels or bigger than Y2 pixels, the OCR will not be able to read them (it sounds strange but it's true: big images may seem very clear to the human eye, but an OCR may have problems reading them).
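The two factors in (a) can be combined into a single pair of pixel limits. The following is only a minimal sketch: the function name `label_height_limits`, the fraction parameters, and the example OCR limits (standing in for Y1 and Y2) are assumptions for illustration, not values taken from the photos or from any particular OCR engine.

```python
def label_height_limits(image_height, min_frac, max_frac, ocr_min_px, ocr_max_px):
    """Return (lowest, highest) label height in pixels, or None if no height fits.

    min_frac / max_frac: assumed label height as a fraction of the image height.
    ocr_min_px / ocr_max_px: assumed OCR readable range (Y1 and Y2 in the text).
    """
    lowest = max(round(image_height * min_frac), ocr_min_px)
    highest = min(round(image_height * max_frac), ocr_max_px)
    # If the two ranges do not overlap, no label size satisfies both factors.
    return (lowest, highest) if lowest <= highest else None


if __name__ == "__main__":
    # Hypothetical numbers: a 2000 px tall photo, labels between 2% and 25% of it.
    print(label_height_limits(2000, 0.02, 0.25, 20, 300))
```

A `None` result would suggest that the photo has to be rescaled before OCR, since no label size satisfies both constraints at the original resolution.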

(b) Find the area(s) of interest: In your case, this is equivalent to "Find the approximate position of the labels". We can define an athlete label roughly as "An (almost) rectangular area, which may be a bit inclined relative to the photo borders, and contains: a central area of black + color C1 [e.g. red or green] + a white (=neutral) area on top and/or bottom of it".

A possible algorithm to find the approximate position of a label is:

(1) Traverse the whole image left-to-right, top-to-bottom and examine a square area of MinHeight/2 x MinHeight/2

(2) Create the histogram of the square area (or posterize it, e.g. to 8 levels) and check whether it contains only black + another color C1 in certain percentages, e.g. Black: 40% +/- 10%, Color: 60% +/- 10%

(3) If (2) is true, try to expand the area to the right and to the bottom while the percentages stay within the specified limits

(4) Once the square is fully expanded, check whether the expanded area size is inside the min/max limits of width/height you specified in (a). If not, go to step 1

(5) Process the expanded area to read the numbers - see (c) below

(6) Go to step 1
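Steps (1) to (4) and (6) above could be sketched roughly as follows. This is a rough illustration in plain Python, not a tested detector: the image is assumed to be a list of rows of (R, G, B) tuples, the color tolerance `tol`, the window stride (one window size per move), and all function names are assumptions, and the 30-50% / 50-70% bounds are just the example percentages from step (2). Step (5) is left to the caller, which receives the accepted boxes.

```python
def classify(pixel, c1, tol=60):
    """Label a pixel 'black', 'c1' (close to the label color) or 'other'."""
    if max(pixel) < tol:
        return "black"
    if all(abs(p - c) <= tol for p, c in zip(pixel, c1)):
        return "c1"
    return "other"


def passes(img, x0, y0, x1, y1, c1):
    """Step (2): black 40% +/- 10% and C1 60% +/- 10% inside the box."""
    counts = {"black": 0, "c1": 0, "other": 0}
    for row in img[y0:y1]:
        for pixel in row[x0:x1]:
            counts[classify(pixel, c1)] += 1
    total = (x1 - x0) * (y1 - y0)
    black, colour = counts["black"] / total, counts["c1"] / total
    return 0.30 <= black <= 0.50 and 0.50 <= colour <= 0.70


def expand(img, x0, y0, size, c1):
    """Step (3): grow to the right, then to the bottom, while the test holds."""
    height, width = len(img), len(img[0])
    x1, y1 = x0 + size, y0 + size
    while x1 < width and passes(img, x0, y0, x1 + 1, y1, c1):
        x1 += 1
    while y1 < height and passes(img, x0, y0, x1, y1 + 1, c1):
        y1 += 1
    return x0, y0, x1, y1


def find_labels(img, min_height, width_limits, height_limits, c1):
    """Steps (1), (4), (6): scan the image and keep boxes within the (a) limits."""
    size = max(1, min_height // 2)
    height, width = len(img), len(img[0])
    visited, found = [], []
    for y in range(0, height - size + 1, size):
        for x in range(0, width - size + 1, size):
            # Skip windows that fall inside an area that was already expanded.
            if any(bx0 <= x < bx1 and by0 <= y < by1
                   for bx0, by0, bx1, by1 in visited):
                continue
            if not passes(img, x, y, x + size, y + size, c1):
                continue
            box = expand(img, x, y, size, c1)
            visited.append(box)
            box_w, box_h = box[2] - box[0], box[3] - box[1]
            if (width_limits[0] <= box_w <= width_limits[1]
                    and height_limits[0] <= box_h <= height_limits[1]):
                found.append(box)
    return found
```

Each returned box is an `(x0, y0, x1, y1)` tuple with exclusive right and bottom edges. On real photos the per-pixel loops would be slow, and the fixed thresholds would probably need tuning per event, since C1 differs between Photo-A (green) and Photo-B (red).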

(c) Process the areas of interest: Try the following steps:

(1) Convert each image area to grayscale by applying a color filter that burns color C1 to white.

(2) Equalize the grayscale image to make the black letters stand out

(3) If an inclination has been detected, perform a reverse rotation on the image area to make the letters as horizontal as possible.

(4) Feed the area to an OCR trained only for numbers
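Steps (1) and (2) of (c) might look like the sketch below; the rotation and OCR steps are not covered. It rests on one assumption that is not in the answer: that "burning C1 to white" can be approximated by keeping only the channel in which C1 is strongest, so that a red or green label background turns light while black digits stay dark. The function names are illustrative, and the equalization is the textbook histogram equalization for 8-bit values.

```python
def burn_to_gray(region, c1):
    """Step (1): grayscale via the channel where C1 is brightest (an assumption)."""
    channel = max(range(3), key=lambda i: c1[i])
    return [[pixel[channel] for pixel in row] for row in region]


def equalize(gray):
    """Step (2): histogram equalization of an 8-bit grayscale image."""
    flat = [value for row in gray for value in row]
    hist = [0] * 256
    for value in flat:
        hist[value] += 1
    # Cumulative distribution of the gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    span = len(flat) - cdf_min
    if span == 0:
        # Only one gray level is present, so there is nothing to stretch.
        return [row[:] for row in gray]
    lut = [max(0, round((c - cdf_min) / span * 255)) for c in cdf]
    return [[lut[value] for value in row] for row in gray]


if __name__ == "__main__":
    red = (200, 30, 30)  # hypothetical label color C1
    region = [[(10, 10, 10), red], [red, (255, 255, 255)]]
    print(equalize(burn_to_gray(region, red)))
```

If the label color is not dominated by a single channel, a weighted mix of channels would likely work better than picking just one.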

Good luck with your project!
