Detect text area in an image using python and opencv

Question

I want to detect the text area of images using Python 2.7 and OpenCV 2.4.9 and draw a rectangle around it, as shown in the example image below.

I am new to image processing, so any ideas on how to do this will be appreciated.

Answer

There are multiple ways to go about detecting text in an image.

I recommend looking at this question here, as it may answer your case as well. Although it is not in Python, the code can easily be translated from C++ to Python (just look at the API and convert the methods; it's not hard. I did it myself when I tried their code for my own separate problem). The solutions there may not work for your case, but I recommend trying them out.

If I were to go about this I would do the following process:

Prep your image: if all the images you want to edit are roughly like the one you provided, where the actual design consists of a range of gray colors and the text is always black, then I would first white out all content that is not black (or already white). Doing so leaves only the black text.

# must import if working with opencv in python
import numpy as np
import cv2

# keeps only pixels whose brightness (V channel) is in the range
# [lower_val, upper_val]; everything outside that range becomes black
def remove_gray(img, lower_val, upper_val):
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    lower_bound = np.array([0, 0, lower_val])
    upper_bound = np.array([255, 255, upper_val])
    mask = cv2.inRange(hsv, lower_bound, upper_bound)
    return cv2.bitwise_and(img, img, mask=mask)

Now that all you have left is the black text, the goal is to get those boxes. As stated before, there are different ways of going about this.

The typical way to find text areas: you can find text regions using the stroke width transform, as described in "Detecting Text in Natural Scenes with Stroke Width Transform" by Boris Epshtein, Eyal Ofek, and Yonatan Wexler. To be honest, if it is as fast and reliable as I believe, that method is more efficient than my code below. You can still use the code above to remove the blueprint design, though, and that may help the overall performance of the SWT algorithm.

Here is a C library that implements their algorithm, but it is stated to be very raw and its documentation to be incomplete. Obviously a wrapper will be needed to use this library with Python, and at the moment I do not see an official one offered.

The library I linked is CCV. It is a library meant to be used in your applications, not to recreate algorithms, so it is a tool to be used, which goes against the OP's wish to build this from first principles, as stated in the comments. Still, it is useful to know it exists if you don't want to code the algorithm yourself.

If you have metadata for each image, say in an XML file, that states how many rooms are labeled in each image, then you can read that XML file, get the number of labels in the image, and store it in a variable, say num_of_labels. Now take your image and put it through a while loop that erodes at a set rate you specify, finding external contours in the image on each iteration and stopping once you have the same number of external contours as num_of_labels. Then simply find each contour's bounding box and you are done.

# erodes image based on given kernel size (erosion = expands black areas)
def erode(img, kern_size=3):
    retval, img = cv2.threshold(img, 254.0, 255.0, cv2.THRESH_BINARY)  # threshold to deal with only black and white
    kern = np.ones((kern_size, kern_size), np.uint8)  # make a kernel for erosion based on given kernel size
    eroded = cv2.erode(img, kern, iterations=1)  # erode your image to blobbify black areas
    y, x = eroded.shape  # get shape of image to draw a 1px white border around it, to avoid problems with findContours
    cv2.rectangle(eroded, (0, 0), (x - 1, y - 1), 255, 1)  # drawing functions return None in opencv 2.4, so return the array itself
    return eroded

# erodes and thresholds the image, then finds its external contours
def prep(img, kern_size=3):
    img = erode(img, kern_size)
    retval, img = cv2.threshold(img, 200.0, 255.0, cv2.THRESH_BINARY_INV)  # invert colors for findContours
    res = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = res[0] if len(res) == 2 else res[1]  # findContours returns 2 values in opencv 2.4/4.x, 3 in 3.x
    return img, contours

# given img & number of desired blobs, returns the prepped image and contours of blobs
def blobbify(img, num_of_labels, kern_size=3, dilation_rate=10):
    prep_img, contours = prep(img.copy(), kern_size)  # erode img and check current contour count
    previous = (prep_img, contours)
    while len(contours) > num_of_labels:
        previous = (prep_img, contours)  # remember the last result in case the blobs merge too far
        kern_size += dilation_rate  # grow the kernel to merge the blobs further; keep kern_size odd
        prep_img, contours = prep(img.copy(), kern_size)  # erode img and check current contour count, again
    if len(contours) == num_of_labels:
        return (prep_img, contours)
    return previous  # overshot num_of_labels: fall back to the previous iteration's result

# finds bounding boxes of all contours
def bounding_box(contours):
    bBox = []
    for curve in contours:
        box = cv2.boundingRect(curve)
        bBox.append(box)
    return bBox

The resulting boxes from the above method will have space around the labels, and this may include part of the original design if the boxes are applied to the original image. To avoid this, create regions of interest from your newly found boxes and trim the white space. Then save each ROI's shape as your new box.
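That trimming step could be sketched as follows (trim_box is a hypothetical helper, not from the original answer; it assumes a grayscale image where text pixels are dark):

```python
import numpy as np

def trim_box(img, box, thresh=200):
    """Shrink an (x, y, w, h) box to the bounding box of its dark pixels."""
    x, y, w, h = box
    roi = img[y:y + h, x:x + w]
    ys, xs = np.where(roi < thresh)  # coordinates of dark (text-like) pixels
    if len(xs) == 0:
        return box  # nothing dark inside: leave the box unchanged
    return (int(x + xs.min()), int(y + ys.min()),
            int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1))

# example: a loose box around a small dark patch
img = np.full((40, 40), 255, np.uint8)
img[10:14, 12:20] = 0
print(trim_box(img, (5, 5, 30, 30)))  # (12, 10, 8, 4)
```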

Perhaps you have no way of knowing how many labels will be in the image. If that is the case, then I recommend playing around with erosion values until you find the one that best suits your case and produces the desired blobs.

Or you could try finding contours on the remaining content after removing the design, and combine the bounding boxes into one rectangle based on their distance from each other.
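That distance-based combining could be sketched like this (merge_close_boxes is a hypothetical helper, not from the original answer; it greedily unions (x, y, w, h) boxes whose edges come within max_gap pixels of each other):

```python
def merge_close_boxes(boxes, max_gap=10):
    """Greedily merge (x, y, w, h) boxes that are within max_gap pixels."""
    def close(a, b):
        # expanded-overlap test: do the boxes intersect if grown by max_gap?
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return (ax - max_gap <= bx + bw and bx - max_gap <= ax + aw and
                ay - max_gap <= by + bh and by - max_gap <= ay + ah)

    def union(a, b):
        # smallest box covering both a and b
        x, y = min(a[0], b[0]), min(a[1], b[1])
        return (x, y,
                max(a[0] + a[2], b[0] + b[2]) - x,
                max(a[1] + a[3], b[1] + b[3]) - y)

    boxes = list(boxes)
    merged = True
    while merged:  # keep sweeping until no pair can be merged
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if close(boxes[i], boxes[j]):
                    boxes[i] = union(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes

# two nearby boxes merge; the distant one stays separate
print(merge_close_boxes([(0, 0, 10, 10), (15, 0, 10, 10), (60, 60, 5, 5)]))
# [(0, 0, 25, 10), (60, 60, 5, 5)]
```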

After you have found your boxes, simply use them with respect to the original image and you are done.

As mentioned in the comments on your question, there already exists a means of scene text detection (not document text detection) in OpenCV 3. I understand you do not have the ability to switch versions, but for those with the same question who are not limited to an older OpenCV version, I decided to include this at the end. Documentation for the scene text detection can be found with a simple Google search.

The OpenCV module for text detection also comes with text recognition built on Tesseract, a free, open-source text recognition engine. The downfall of Tesseract, and therefore of OpenCV's scene text recognition module, is that it is not as refined as commercial applications and is time-consuming to use. That hurts its performance, but it is free, so it is the best we have without paying money if you want text recognition as well.

Links:

  • OpenCV documentation
  • Older documentation
  • The source code, for analysis and understanding

Honestly, I lack the experience and expertise in both OpenCV and image processing to provide a detailed way of implementing their text detection module, and the same goes for the SWT algorithm. I only got into this stuff in the past few months, but as I learn more I will edit this answer.
