How to recognize text with colored background images?


Problem description

I am new to OpenCV and Python as well as Tesseract. I am creating a script that will recognize text in an image. My code works perfectly on black text with a white background, or white text with a black background, but not on colored images. For example, white text on a blue background, such as a button. Does the font also affect this? In this case, I am trying to find the Reboot text (the button).

Here is the sample image:

I tried a bunch of code and methods for image preprocessing with OpenCV but failed to get a result: image binarization, noise reduction, grayscale, but no good.

Here is the sample code:

from PIL import Image
import pytesseract
import cv2
import numpy as np

# image = Image.open('image.png')
# image = image.convert('-1')
# image.save('new.png')

filename = 'image.png'
outputname = 'converted.png'

# grayscale -----------------------------------------------------
image = cv2.imread(filename)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imwrite(outputname,gray_image)

# binarize -----------------------------------------------------
im_gray = cv2.imread(outputname, cv2.IMREAD_GRAYSCALE)
(thresh, im_bw) = cv2.threshold(im_gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
cv2.imwrite(outputname, im_bw)

# remove noise -----------------------------------------------------
im = cv2.imread(outputname)
morph = im.copy()

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 1))
morph = cv2.morphologyEx(morph, cv2.MORPH_CLOSE, kernel)
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
image_channels = np.split(np.asarray(morph), 3, axis=2)

channel_height, channel_width, _ = image_channels[0].shape

# apply Otsu threshold to each channel
for i in range(0, 3):
    _, image_channels[i] = cv2.threshold(image_channels[i], 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY)
    image_channels[i] = np.reshape(image_channels[i], newshape=(channel_height, channel_width, 1))

# merge the channels
image_channels = np.concatenate((image_channels[0], image_channels[1], image_channels[2]), axis=2)

# save the denoised image
cv2.imwrite(outputname, image_channels)

# run OCR with Tesseract and save the TSV output -------------------
image = Image.open(outputname)
data_string = pytesseract.image_to_data(image, config='--oem 1')
with open('image.tsv', 'w', encoding='utf-8') as f:
    f.write(data_string)

By running the code, I get this image:

And the result of Tesseract with the TSV parameter:

level   page_num    block_num   par_num line_num    word_num    left    top width   height  conf    text
1   1   0   0   0   0   0   0   1024    768 -1  
2   1   1   0   0   0   2   13  1002    624 -1  
3   1   1   1   0   0   2   13  1002    624 -1  
4   1   1   1   1   0   172 13  832 22  -1  
5   1   1   1   1   1   172 13  127 22  84  CONFIGURATION
5   1   1   1   1   2   822 17  59  11  92  CENTOS
5   1   1   1   1   3   887 17  7   11  95  7
5   1   1   1   1   4   900 17  104 11  95  INSTALLATION
4   1   1   1   2   0   86  29  900 51  -1  
5   1   1   1   2   1   86  35  15  45  12  4
5   1   1   1   2   2   825 30  27  40  50  Bes
5   1   1   1   2   3   952 29  34  40  51  Hel
4   1   1   1   3   0   34  91  87  17  -1  
5   1   1   1   3   1   34  91  87  17  90  CentOS
4   1   1   1   4   0   2   116 9   8   -1  
5   1   1   1   4   1   2   116 9   8   0   ‘
4   1   1   1   5   0   184 573 57  14  -1  
5   1   1   1   5   1   184 573 57  14  90  Complete!
4   1   1   1   6   0   634 606 358 14  -1  
5   1   1   1   6   1   634 606 43  10  89  CentOS
5   1   1   1   6   2   683 609 7   7   96  is
5   1   1   1   6   3   696 609 24  7   96  now
5   1   1   1   6   4   725 606 67  14  96  successfully
5   1   1   1   6   5   797 606 45  10  96  installed
5   1   1   1   6   6   848 606 18  10  96  and
5   1   1   1   6   7   872 599 29  25  96  ready
5   1   1   1   6   8   906 599 15  25  95  for
5   1   1   1   6   9   928 609 20  11  96  you
5   1   1   1   6   10  953 608 12  8   96  to
5   1   1   1   6   11  971 606 21  10  95  use!
4   1   1   1   7   0   775 623 217 14  -1  
5   1   1   1   7   1   775 623 15  10  95  Go
5   1   1   1   7   2   796 623 31  10  96  ahead
5   1   1   1   7   3   833 623 18  10  96  and
5   1   1   1   7   4   857 623 38  10  96  reboot
5   1   1   1   7   5   900 625 12  8   96  to
5   1   1   1   7   6   918 625 25  8   95  start
5   1   1   1   7   7   949 626 28  11  96  using
5   1   1   1   7   8   983 623 9   10  93  it!

As you can see, the "Reboot" text is not showing. Maybe it is because of the font? Or the color?
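
As a quick sanity check, the TSV that the script writes can be parsed to list exactly which words Tesseract returned and with what confidence. A minimal sketch, assuming the output above was saved as image.tsv (as the script does):

import csv

# minimal sketch: parse the TSV written by the script above ('image.tsv')
# and list the word-level detections so we can see which words Tesseract returned
with open('image.tsv', newline='', encoding='utf-8') as f:
    rows = list(csv.DictReader(f, delimiter='\t'))

# level 5 rows are individual words; conf is -1 for non-word rows
words = [r for r in rows if r['level'] == '5' and float(r['conf']) >= 0]

for r in words:
    print(r['text'], r['conf'], r['left'], r['top'])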

Recommended answer

Here are two different approaches:

1. Traditional image processing and contour filtering

The main idea is to extract the ROI and then apply Tesseract OCR.

  • Convert the image to grayscale and apply Gaussian blur
  • Adaptive threshold
  • Find contours
  • Iterate through contours and filter using contour approximation and area
  • Extract the ROI

Once we obtain a binary image from adaptive thresholding, we find contours and filter using contour approximation with cv2.arcLength() and cv2.approxPolyDP(). If the contour has four points, we assume it is either a rectangle or a square. In addition, we apply a second filter on contour area to ensure that we isolate the correct ROI. Here's the extracted ROI:

import cv2

# grayscale, Gaussian blur, then adaptive threshold to get a binary image
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,9,3)

# find contours (handle both OpenCV 3.x and 4.x return values)
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

# keep contours that approximate to four points (rectangle/square) and are large enough
ROI_number = 0
for c in cnts:
    area = cv2.contourArea(c)
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.05 * peri, True)
    if len(approx) == 4 and area > 2200:
        x,y,w,h = cv2.boundingRect(approx)
        ROI = image[y:y+h, x:x+w]
        cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
        ROI_number += 1

Now we can throw this into Pytesseract. Note that Pytesseract requires the image text to be black on a white background, so we do a bit of preprocessing first. Here's the preprocessed image and the result from Pytesseract:

Reboot

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Otsu threshold the ROI, then invert so the text is black on a white background
image = cv2.imread('ROI.png',0)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

result = 255 - thresh

# OCR the preprocessed ROI
data = pytesseract.image_to_string(result, lang='eng', config='--psm 10')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()

Normally, you would also need to use morphological transformations to smooth the image, but in this case the text is good enough.
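
If the crop were noisier, a light morphological opening and closing pass could be added before inverting. A minimal sketch, assuming the extracted ROI was saved as ROI.png as above (the 2x2 kernel size is an arbitrary starting point):

import cv2

# minimal sketch: smooth a noisy ROI with morphological opening/closing before OCR
# assumes the extracted ROI was saved as 'ROI.png'; the 2x2 kernel is an arbitrary choice
image = cv2.imread('ROI.png', 0)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
# opening removes small white specks, closing fills small gaps in the strokes
smoothed = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)
smoothed = cv2.morphologyEx(smoothed, cv2.MORPH_CLOSE, kernel)

# invert so the text is black on white, as Pytesseract prefers
result = 255 - smoothed
cv2.imwrite('ROI_smoothed.png', result)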

2. Color thresholding

The second approach is to use color thresholding with lower and upper HSV thresholds to create a mask from which we can extract the ROI. See here for a complete example. Once the ROI is extracted, we follow the same preprocessing steps before throwing it into Pytesseract.
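
A minimal sketch of that idea, assuming the button is blue and that the HSV bounds below roughly cover it; both the bounds and the --psm 7 setting are assumptions that would need tuning for the actual image:

import cv2
import numpy as np
import pytesseract

# minimal sketch of the color-thresholding approach; the HSV bounds below
# are assumed values for a blue button and would need tuning for the real image
image = cv2.imread('1.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

lower = np.array([100, 100, 50], dtype=np.uint8)   # assumed lower HSV bound for blue
upper = np.array([130, 255, 255], dtype=np.uint8)  # assumed upper HSV bound for blue
mask = cv2.inRange(hsv, lower, upper)

# take the largest blue region and treat it as the button ROI
cnts = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
if cnts:
    c = max(cnts, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(c)
    ROI = image[y:y+h, x:x+w]

    # same preprocessing as before: Otsu threshold, then invert so the
    # text is black on a white background for Pytesseract
    gray = cv2.cvtColor(ROI, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    result = 255 - thresh

    print(pytesseract.image_to_string(result, config='--psm 7'))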

