Python OpenCV: remove noise from a captcha
Question
I need to solve captchas automatically in order to grab public data from sites.
I use Python and OpenCV, and I'm a newbie at image processing. After some searching, I came up with the following approach. Since the text in the captcha uses a group of related colours, I apply a mask in HSV colour space, then convert the image to grayscale and apply an adaptive threshold (ADAPTIVE_THRESH_MEAN_C) to remove noise from the image.
But this is not enough to remove the noise and allow automatic text recognition with OCR (Tesseract). See the images below.
Is there something I can improve in my solution, or is there a better way?
Original image:
Processed image:
Code:
import cv2
import numpy as np
img = cv2.imread("1.jpeg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (36, 0, 0), (70, 255,255)) #green
# mask = cv2.inRange(hsv, (0, 0, 0), (10, 255, 255))
# mask = cv2.inRange(hsv, (125, 0, 0), (135, 255,255))
img = cv2.bitwise_and(img, img, mask=mask)
img[np.where((img == [0,0,0]).all(axis = 2))] = [255,255,255]
img = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 15, 2)
cv2.imwrite("out.png", img)
Recommended answer
I think you can get good results by applying a smoothing method first and then finding the edges of the image. Here is the code:
import cv2
img = cv2.imread("input.jpg")
# smooth the image with a 5x5 median filter to suppress speckle noise
img = cv2.medianBlur(img, 5)
# detect edges with Canny (hysteresis thresholds 100 and 200)
edges = cv2.Canny(img, 100, 200)
cv2.imwrite('output.png', edges)