Floor Plan Text Recognition & OCR


Problem description

The objective is to create bounding boxes around the text in US floor plan images using text detection methods (e.g. OpenCV), so that the boxes can then be fed into a text reader (e.g. an LSTM or Tesseract).

Several approaches based on cv2.findContours and cv2.boundingRect have been tried, but they have largely failed to generalise to different types of floor plans (the plans vary widely in appearance).

For example, applying grayscale conversion, adaptive thresholding, erosion and dilation (with various iteration counts) before calling cv2.findContours gives the result below. Note that Bedroom 2 and Kitchen are not picked up correctly.

Another example, where no regions are found at all:

Any thoughts on text recognition models or cleaning procedures that would improve the accuracy of the text recognition, preferably with code examples?

Recommended answer

This answer assumes that the images are similar to one another (in size, wall thickness, letter size...). If they are not, this is not a good approach, because you would have to change the thresholds for every image. That being said, I would convert the image to binary and search for contours. You can then apply criteria such as height and width to filter out the walls. Next, draw the remaining contours on a mask and dilate it; this merges letters that are close to each other into a single contour. Then create a bounding box for each merged contour. These boxes are your regions of interest (ROI), and you can run any OCR on them. Hope it helps a bit. Cheers!
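The filter-then-merge idea can also be illustrated on bounding boxes alone, without any image. The sketch below is not the answer's method (which merges letters by dilating a mask); it swaps in a simple greedy box merge to show the same effect. The function names `filter_small` and `merge_close`, the `(x, y, w, h)` tuple layout (as returned by cv2.boundingRect) and the `max_h` and `gap` values are assumptions chosen for illustration.

```python
def filter_small(boxes, max_h=20):
    """Keep boxes short enough to be letters; taller boxes are likely walls."""
    return [b for b in boxes if b[3] < max_h]


def merge_close(boxes, gap=7):
    """Greedily merge boxes that lie within `gap` pixels of each other.

    Single pass over boxes sorted by x, so it suits left-to-right text;
    it does not guarantee that every transitive neighbour is merged.
    """
    merged = []
    for x, y, w, h in sorted(boxes):
        for i, (mx, my, mw, mh) in enumerate(merged):
            near_x = x <= mx + mw + gap and mx <= x + w + gap
            near_y = y <= my + mh + gap and my <= y + h + gap
            if near_x and near_y:
                nx, ny = min(x, mx), min(y, my)
                merged[i] = (nx, ny,
                             max(x + w, mx + mw) - nx,
                             max(y + h, my + mh) - ny)
                break
        else:
            merged.append((x, y, w, h))
    return merged


# Three letter-sized boxes in a row, one isolated letter, and one tall "wall".
boxes = [(10, 10, 8, 12), (20, 10, 8, 12), (30, 10, 8, 12),
         (200, 100, 8, 12), (0, 50, 300, 200)]
words = merge_close(filter_small(boxes))
print(words)  # [(10, 10, 28, 12), (200, 100, 8, 12)]
```

As with the dilation kernel, `gap` plays the role of "how close is close enough", so it has the same sensitivity to image scale that the answer warns about.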

Example:

import cv2
import numpy as np

# Load the floor plan and binarize it (dark ink becomes white foreground).
img = cv2.imread('floor.png')
if img is None:
    raise FileNotFoundError('floor.png could not be read')
output = img.copy()  # draw on a copy so the ROI crops stay clean
mask = np.zeros(img.shape[:2], dtype=np.uint8)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, threshold = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV)
# [-2:] keeps (contours, hierarchy) on both OpenCV 3.x and 4.x.
contours, hierarchy = cv2.findContours(threshold, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2:]

ROI = []

# Keep only short contours (letters); tall ones are assumed to be walls.
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    if h < 20:
        cv2.drawContours(mask, [cnt], 0, 255, 1)

# Dilate so that neighbouring letters merge into one blob per word.
kernel = np.ones((7, 7), np.uint8)
dilation = cv2.dilate(mask, kernel, iterations=1)
contours_d, hierarchy = cv2.findContours(dilation, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)[-2:]

# Keep blobs wide enough to be words, box them and collect the crops.
for cnt in contours_d:
    x, y, w, h = cv2.boundingRect(cnt)
    if w > 35:
        cv2.rectangle(output, (x, y), (x + w, y + h), (0, 255, 0), 2)
        ROI.append(img[y:y + h, x:x + w])

cv2.imshow('img', output)
cv2.waitKey(0)
cv2.destroyAllWindows()

Result:
