使用OpenCV进行图像处理,去除图像中的背景文字和噪点 [英] Remove background text and noise from an image using image processing with OpenCV
问题描述
我有这些图片
我要为其删除背景中的文本.仅captcha characters
应该保留(即K6PwKA,YabVzu).任务是稍后使用tesseract识别这些字符.
For which I want to remove the text in the background. Only the captcha characters
should remain(i.e K6PwKA, YabVzu). The task is to identify these characters later using tesseract.
这是我尝试过的方法,但是准确性不高.
This is what I have tried, but it isn't giving much good accuracy.
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Users\HPO2KOR\AppData\Local\Tesseract-OCR\tesseract.exe"
img = cv2.imread("untitled.png")
gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray_filtered = cv2.inRange(gray_image, 0, 75)
cv2.imwrite("cleaned.png", gray_filtered)
我该如何改善呢?
注意: 我尝试了所有有关此问题的建议,但没有一个对我有用.
Note : I tried all the suggestion that I was getting for this question and none of them worked for me.
根据Elias的说法,我尝试通过使用Photoshop将验证码文本的颜色转换为灰度来发现其颜色,灰度级介于[100,105]之间.然后,我根据该范围对图像进行阈值处理.但是我得到的结果并没有从tesseract得到令人满意的结果.
EDIT : According to Elias, I tried finding the color of the captcha text using photoshop by converting it to grayscale which came out to be somewhere in between [100, 105]. I then threshold the image based on this range. But the result which I got did not give satisfactory result from tesseract.
gray_filtered = cv2.inRange(gray_image, 100, 105)
cv2.imwrite("cleaned.png", gray_filtered)
gray_inv = ~gray_filtered
cv2.imwrite("cleaned.png", gray_inv)
data = pytesseract.image_to_string(gray_inv, lang='eng')
输出:
'KEP wKA'
结果:
def get_text(img_name):
lower = (100, 100, 100)
upper = (104, 104, 104)
img = cv2.imread(img_name)
img_rgb_inrange = cv2.inRange(img, lower, upper)
neg_rgb_image = ~img_rgb_inrange
cv2.imwrite('neg_img_rgb_inrange.png', neg_rgb_image)
data = pytesseract.image_to_string(neg_rgb_image, lang='eng')
return data
给予:
,文字为
GXuMuUZ
有什么方法可以使它柔和一些
Is there any way to soften it a little
推荐答案
以下是两种可能的方法以及一种纠正文本失真的方法:
Here are two potential approaches and a method to correct distorted text:
方法1:形态学操作+轮廓过滤
Method #1: Morphological operations + contour filtering
Obtain binary image. Load image, grayscale, then Otsu's threshold.
删除文本轮廓.使用形态学操作以消除噪声.
过滤并消除小噪音. 查找轮廓,并使用轮廓区域以去除小颗粒.通过使用 cv2.drawContours
Filter and remove small noise. Find contours and filter using contour area to remove small particles. We effectively remove the noise by filling in the contour with cv2.drawContours
执行OCR.我们反转图像,然后稍加应用
高斯模糊 .然后,我们使用 Pytesseract 和--psm 6
配置选项进行OCR,将图像视为单个图像块.文本.查看 Tesseract提高质量的其他方法,以提高检测效率和
Perform OCR. We invert the image then apply a slight
Gaussian blur. We then OCR using Pytesseract with the --psm 6
configuration option to treat the image as a single block of text. Look at Tesseract improve quality for other methods to improve detection and Pytesseract configuration options for additional settings.
输入图像->
二进制->
变形打开
Input image ->
Binary ->
Morph opening
轮廓区域过滤->
反转->
应用模糊以获得结果
Contour area filtering ->
Invert ->
Apply blur to get result
OCR的结果
YabVzu
代码
import cv2
import pytesseract
import numpy as np
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, grayscale, Otsu's threshold
image = cv2.imread('2.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Morph open to remove noise
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
# Find contours and remove small noise
cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
area = cv2.contourArea(c)
if area < 50:
cv2.drawContours(opening, [c], -1, 0, -1)
# Invert and apply slight Gaussian blur
result = 255 - opening
result = cv2.GaussianBlur(result, (3,3), 0)
# Perform OCR
data = pytesseract.image_to_string(result, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.imshow('result', result)
cv2.waitKey()
方法2:颜色分割
观察到要提取的所需文本与图像中的噪声具有明显的对比度,因此我们可以使用颜色阈值分离文本.想法是先转换为HSV格式,然后转换为颜色阈值,以使用较低/较高的颜色范围获得蒙版.从开始,我们对Pytesseract使用相同的过程进行OCR.
With the observation that the desired text to extract has a distinguishable contrast from the noise in the image, we can use color thresholding to isolate the text. The idea is to convert to HSV format then color threshold to obtain a mask using a lower/upper color range. From were we use the same process to OCR with Pytesseract.
输入图像->
遮罩->
结果
代码
import cv2
import pytesseract
import numpy as np
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, convert to HSV, color threshold to get mask
image = cv2.imread('2.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0, 0, 0])
upper = np.array([100, 175, 110])
mask = cv2.inRange(hsv, lower, upper)
# Invert image and OCR
invert = 255 - mask
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)
cv2.imshow('mask', mask)
cv2.imshow('invert', invert)
cv2.waitKey()
更正扭曲的文本
当图像为水平时,OCR效果最佳.为了确保文本是用于OCR的理想格式,我们可以执行透视变换.消除所有噪声以隔离文本后,我们可以执行变形以将单个文本轮廓组合为单个轮廓.在这里,我们可以使用 cv2.minAreaRect
,然后执行 imutils.perspective.four_point_transform
.从清洁的面膜继续,得出以下结果:
OCR works best when the image is horizontal. To ensure that the text is in an ideal format for OCR, we can perform a perspective transform. After removing all the noise to isolate the text, we can perform a morph close to combine individual text contours into a single contour. From here we can find the rotated bounding box using cv2.minAreaRect
and then perform a four point perspective transform using imutils.perspective.four_point_transform
. Continuing from the cleaned mask, here's the results:
蒙版->
变形关闭->
检测到旋转的边界框->
结果
Mask ->
Morph close ->
Detected rotated bounding box ->
Result
与其他图像一起输出
更新的代码包括透视变换
Updated code to include perspective transform
import cv2
import pytesseract
import numpy as np
from imutils.perspective import four_point_transform
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, convert to HSV, color threshold to get mask
image = cv2.imread('1.png')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0, 0, 0])
upper = np.array([100, 175, 110])
mask = cv2.inRange(hsv, lower, upper)
# Morph close to connect individual text into a single contour
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=3)
# Find rotated bounding box then perspective transform
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
rect = cv2.minAreaRect(cnts[0])
box = cv2.boxPoints(rect)
box = np.int0(box)
cv2.drawContours(image,[box],0,(36,255,12),2)
warped = four_point_transform(255 - mask, box.reshape(4, 2))
# OCR
data = pytesseract.image_to_string(warped, lang='eng', config='--psm 6')
print(data)
cv2.imshow('mask', mask)
cv2.imshow('close', close)
cv2.imshow('warped', warped)
cv2.imshow('image', image)
cv2.waitKey()
注意:颜色阈值范围是使用此HSV阈值脚本确定的
Note: The color threshold range was determined using this HSV threshold script
import cv2
import numpy as np
def nothing(x):
pass
# Load image
image = cv2.imread('2.png')
# Create a window
cv2.namedWindow('image')
# Create trackbars for color change
# Hue is from 0-179 for Opencv
cv2.createTrackbar('HMin', 'image', 0, 179, nothing)
cv2.createTrackbar('SMin', 'image', 0, 255, nothing)
cv2.createTrackbar('VMin', 'image', 0, 255, nothing)
cv2.createTrackbar('HMax', 'image', 0, 179, nothing)
cv2.createTrackbar('SMax', 'image', 0, 255, nothing)
cv2.createTrackbar('VMax', 'image', 0, 255, nothing)
# Set default value for Max HSV trackbars
cv2.setTrackbarPos('HMax', 'image', 179)
cv2.setTrackbarPos('SMax', 'image', 255)
cv2.setTrackbarPos('VMax', 'image', 255)
# Initialize HSV min/max values
hMin = sMin = vMin = hMax = sMax = vMax = 0
phMin = psMin = pvMin = phMax = psMax = pvMax = 0
while(1):
# Get current positions of all trackbars
hMin = cv2.getTrackbarPos('HMin', 'image')
sMin = cv2.getTrackbarPos('SMin', 'image')
vMin = cv2.getTrackbarPos('VMin', 'image')
hMax = cv2.getTrackbarPos('HMax', 'image')
sMax = cv2.getTrackbarPos('SMax', 'image')
vMax = cv2.getTrackbarPos('VMax', 'image')
# Set minimum and maximum HSV values to display
lower = np.array([hMin, sMin, vMin])
upper = np.array([hMax, sMax, vMax])
# Convert to HSV format and color threshold
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, lower, upper)
result = cv2.bitwise_and(image, image, mask=mask)
# Print if there is a change in HSV value
if((phMin != hMin) | (psMin != sMin) | (pvMin != vMin) | (phMax != hMax) | (psMax != sMax) | (pvMax != vMax) ):
print("(hMin = %d , sMin = %d, vMin = %d), (hMax = %d , sMax = %d, vMax = %d)" % (hMin , sMin , vMin, hMax, sMax , vMax))
phMin = hMin
psMin = sMin
pvMin = vMin
phMax = hMax
psMax = sMax
pvMax = vMax
# Display result image
cv2.imshow('image', result)
if cv2.waitKey(10) & 0xFF == ord('q'):
break
cv2.destroyAllWindows()
这篇关于使用OpenCV进行图像处理,去除图像中的背景文字和噪点的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!