How to remove overlapping contours and separate each character as an individual contour for character extraction?

Question

I am trying to implement character extraction from images in Python using MSER in OpenCV. This is my code so far:

import cv2
import numpy as np

# create MSER object
mser = cv2.MSER_create()
# convert image to grayscale (img is assumed to be a BGR image loaded earlier, e.g. with cv2.imread)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# detect the regions
regions,_ = mser.detectRegions(gray)
# find convex hulls of the regions
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions]
# initialize threshold area of the contours
ThresholdContourArea = 10000
# initialize empty list for the characters and their locations
char = []
loc = []
# get the character part of the image and its location if the contour area is below the threshold
for contour in hulls:
    if cv2.contourArea(contour) > ThresholdContourArea:
        continue
    # get the bounding rectangle around the contour
    x, y, w, h = cv2.boundingRect(contour)
    loc.append((x, y, w, h))
    det_char = gray[y:y + h, x:x + w]
    char.append(det_char)

But this method gives multiple contours for the same letter, and in some places multiple words are put into one contour. Here is an example. Original image:

After adding contours:

Here the first T has multiple contours around it, and the two r's are combined into one contour. How do I prevent that?

Recommended answer

Instead of using MSER, here's a simple approach using thresholding + contour filtering. We first remove the border, then apply Otsu's threshold to obtain a binary image. The idea is that each letter should be an individual contour. We find the contours and draw a bounding rectangle around each one.

Removed border -> binary image -> result

Note: In some cases the letters are connected. To separate the merged characters, we can first enlarge the image using imutils.resize(), then perform erosion or morphological opening. However, I was unable to obtain good results, since the text would disappear even with the smallest kernel.

Code

import cv2
import imutils

# Load image and resize it to a fixed width
image = cv2.imread('1.png')
image = imutils.resize(image, width=500)

# Remove border
kernel_vertical = cv2.getStructuringElement(cv2.MORPH_RECT, (1,50))
temp1 = 255 - cv2.morphologyEx(image, cv2.MORPH_CLOSE, kernel_vertical)
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (50,1))
temp2 = 255 - cv2.morphologyEx(image, cv2.MORPH_CLOSE, horizontal_kernel)
temp3 = cv2.add(temp1, temp2)
result = cv2.add(temp3, image)

# Convert to grayscale and Otsu's threshold
gray = cv2.cvtColor(result, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Find external contours and draw a bounding rectangle around each one
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    cv2.rectangle(result, (x, y), (x + w, y + h), (36,255,12), 2)

cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
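
The code above only draws the rectangles. To get the char and loc lists from the question, the boxes still have to be cropped. The following is a sketch under assumptions, not part of the original answer: drop_nested_boxes and crop_characters are hypothetical helpers, boxes are (x, y, w, h) tuples as returned by cv2.boundingRect, and the toy array stands in for the grayscale image. Discarding boxes that lie fully inside a larger box is one possible way to handle several boxes around the same letter.

```python
import numpy as np

def drop_nested_boxes(boxes):
    """Keep only (x, y, w, h) boxes that are not fully contained in a larger kept box."""
    # Remove exact duplicates, then visit boxes from largest to smallest area
    ordered = sorted(set(boxes), key=lambda b: b[2] * b[3], reverse=True)
    kept = []
    for x, y, w, h in ordered:
        inside = any(
            kx <= x and ky <= y and x + w <= kx + kw and y + h <= ky + kh
            for kx, ky, kw, kh in kept
        )
        if not inside:
            kept.append((x, y, w, h))
    return kept

def crop_characters(gray, boxes):
    """Return (crops, locations), ordered left to right by the x coordinate."""
    locations = sorted(drop_nested_boxes(boxes), key=lambda b: b[0])
    crops = [gray[y:y + h, x:x + w] for x, y, w, h in locations]
    return crops, locations

# Toy data: one duplicated box and one box nested inside the first letter's box
gray = np.zeros((20, 40), dtype=np.uint8)
boxes = [(22, 5, 6, 10), (2, 4, 8, 12), (3, 5, 4, 6), (2, 4, 8, 12)]
char, loc = crop_characters(gray, boxes)
print(loc)
```

With the answer's code, boxes could be collected as a list of cv2.boundingRect(c) results inside the loop. Note that this filter would also discard letters enclosed by an oversized box spanning a whole word, so an upper area limit like the one in the question is still worth applying first, and it does nothing for characters that were merged into a single contour.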
