如何识别带有彩色背景图像的文本? [英] How to recognize text with colored background images?
问题描述
我是opencv和python以及tesseract的新手.现在,我正在创建一个脚本,该脚本将识别图像中的文本.我的代码在黑色文本和白色背景或带有黑色背景的白色文本上可以完美地工作,但在彩色图像中却不能.例如,带有蓝色背景的白色文本,例如按钮.字体也会影响到它吗?在这种情况下,我会找到重新启动文本(按钮)
I am new to opencv and python as well as tesseract. Now, I am creating a script that will recognize text from an image. My code works perfectly on black text and white background or white text with black background but not in colored images. Example, white text with blue background such as a button. Is the font also affecting this? In this case, I am finding the Reboot text (the button)
这是示例图像
我在通过opencv进行图像预处理时尝试了一堆代码和方法,但未能获得结果.图像二值化,降噪,灰度但是不好.
I tried bunch of codes and methods on image preprocessing via opencv but failed to get the result. Image binarizing, noise reduction, grayscale but no good.
这是示例代码:
from PIL import Image
import pytesseract
import cv2
import numpy as np
# image = Image.open('image.png')
# image = image.convert('-1')
# image.save('new.png')
filename = 'image.png'
outputname = 'converted.png'
# grayscale -----------------------------------------------------
image = cv2.imread(filename)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imwrite(outputname,gray_image)
# binarize -----------------------------------------------------
im_gray = cv2.imread(outputname, cv2.IMREAD_GRAYSCALE)
(thresh, im_bw) = cv2.threshold(im_gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
cv2.imwrite(outputname, im_bw)
# remove noise -----------------------------------------------------
im = cv2.imread(outputname)
morph = im.copy()
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 1))
morph = cv2.morphologyEx(morph, cv2.MORPH_CLOSE, kernel)
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
image_channels = np.split(np.asarray(morph), 3, axis=2)
channel_height, channel_width, _ = image_channels[0].shape
# apply Otsu threshold to each channel
for i in range(0, 3):
_, image_channels[i] = cv2.threshold(image_channels[i], 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY)
image_channels[i] = np.reshape(image_channels[i], newshape=(channel_height, channel_width, 1))
# merge the channels
image_channels = np.concatenate((image_channels[0], image_channels[1], image_channels[2]), axis=2)
# save the denoised image
cv2.imwrite(outputname, image_channels)
image = Image.open(outputname)
data_string = pytesseract.image_to_data(image, config='--oem 1')
data_string = data_string.encode('utf-8')
open('image.tsv', 'wb').write(data_string)
通过运行代码,我得到以下图像:
By running the code, I get this image:
和带有TSV参数的tesseract的结果:
And the result of tesseract with TSV parameter:
level page_num block_num par_num line_num word_num left top width height conf text
1 1 0 0 0 0 0 0 1024 768 -1
2 1 1 0 0 0 2 13 1002 624 -1
3 1 1 1 0 0 2 13 1002 624 -1
4 1 1 1 1 0 172 13 832 22 -1
5 1 1 1 1 1 172 13 127 22 84 CONFIGURATION
5 1 1 1 1 2 822 17 59 11 92 CENTOS
5 1 1 1 1 3 887 17 7 11 95 7
5 1 1 1 1 4 900 17 104 11 95 INSTALLATION
4 1 1 1 2 0 86 29 900 51 -1
5 1 1 1 2 1 86 35 15 45 12 4
5 1 1 1 2 2 825 30 27 40 50 Bes
5 1 1 1 2 3 952 29 34 40 51 Hel
4 1 1 1 3 0 34 91 87 17 -1
5 1 1 1 3 1 34 91 87 17 90 CentOS
4 1 1 1 4 0 2 116 9 8 -1
5 1 1 1 4 1 2 116 9 8 0 ‘
4 1 1 1 5 0 184 573 57 14 -1
5 1 1 1 5 1 184 573 57 14 90 Complete!
4 1 1 1 6 0 634 606 358 14 -1
5 1 1 1 6 1 634 606 43 10 89 CentOS
5 1 1 1 6 2 683 609 7 7 96 is
5 1 1 1 6 3 696 609 24 7 96 now
5 1 1 1 6 4 725 606 67 14 96 successfully
5 1 1 1 6 5 797 606 45 10 96 installed
5 1 1 1 6 6 848 606 18 10 96 and
5 1 1 1 6 7 872 599 29 25 96 ready
5 1 1 1 6 8 906 599 15 25 95 for
5 1 1 1 6 9 928 609 20 11 96 you
5 1 1 1 6 10 953 608 12 8 96 to
5 1 1 1 6 11 971 606 21 10 95 use!
4 1 1 1 7 0 775 623 217 14 -1
5 1 1 1 7 1 775 623 15 10 95 Go
5 1 1 1 7 2 796 623 31 10 96 ahead
5 1 1 1 7 3 833 623 18 10 96 and
5 1 1 1 7 4 857 623 38 10 96 reboot
5 1 1 1 7 5 900 625 12 8 96 to
5 1 1 1 7 6 918 625 25 8 95 start
5 1 1 1 7 7 949 626 28 11 96 using
5 1 1 1 7 8 983 623 9 10 93 it!
如您所见,重新启动"文本未显示.也许是因为字体?还是颜色?
As you can see, the "Reboot" text is not showing. Maybe it is because of the font? Or the color?
推荐答案
以下是两种不同的方法:
Here are two different approaches:
1.传统图像处理和轮廓过滤
主要思想是提取ROI,然后应用Tesseract OCR.
The main idea is to extract the ROI then apply Tesseract OCR.
- 将图像转换为灰度和高斯模糊
- 自适应阈值
- 找到轮廓
- 遍历轮廓并使用轮廓逼近和面积过滤
- 提取投资回报率
一旦我们从自适应阈值获得二值图像,我们就会找到轮廓并使用轮廓逼近(cv2.arcLength()
和cv2.approxPolyDP()
)进行滤波.如果轮廓有四个点,我们假定它是矩形或正方形.此外,我们使用轮廓区域应用第二个过滤器,以确保隔离正确的ROI.这是提取的投资回报率
Once we obtain a binary image from adaptive thresholding, we find contours and filter using contour approximation with cv2.arcLength()
and cv2.approxPolyDP()
. If the contour has four points, we assume it is either a rectangle or square. In addition, we apply a second filter using contour area to ensure that we isolate the correct ROI. Here's the extracted ROI
import cv2
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,9,3)
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
ROI_number = 0
for c in cnts:
area = cv2.contourArea(c)
peri = cv2.arcLength(c, True)
approx = cv2.approxPolyDP(c, 0.05 * peri, True)
if len(approx) == 4 and area > 2200:
x,y,w,h = cv2.boundingRect(approx)
ROI = image[y:y+h, x:x+w]
cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
ROI_number += 1
现在,我们可以将其放入Pytesseract.注意Pytesseract要求图像文本为黑色,而背景为白色,因此我们首先要进行一些预处理.这是Pytesseract的预处理图像和结果
Now we can throw this into Pytesseract. Note Pytesseract requires that the image text be in black while the background in white so we do a bit of preprocessing first. Here's the preprocessed image and result from Pytesseract
重启
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = cv2.imread('ROI.png',0)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
result = 255 - thresh
data = pytesseract.image_to_string(result, lang='eng',config='--psm 10 ')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
通常,您还需要使用形态学转换来平滑图像,但在这种情况下,文本足够好
Normally, you would also need to use morphological transformations to smooth the image but for this case, the text is good enough
2.颜色阈值
第二种方法是使用带有较低和较高HSV阈值的颜色阈值来创建蒙版,我们可以在其中提取ROI.有关完整示例,请此处.一旦提取了ROI,我们将按照相同的步骤对图像进行预处理,然后再将其投入Pytesseract
The second approach is to use color thresholding with lower and upper HSV thresholds to create a mask where we can extract the ROI. Look here for a complete example. Once the ROI is extracted, we follow the same steps to preprocess the image before throwing it into Pytesseract
这篇关于如何识别带有彩色背景图像的文本?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!