Local Contrast Enhancement for Digit Recognition with cv2 / pytesseract


Problem description


I want to use pytesseract to read digits from images. The images look as follows:


The digits are dotted and, in order to be able to use pytesseract, I need black connected digits on a white background. To do so, I thought about using erode and dilate as preprocessing techniques. As you can see, the images are similar, yet quite different in certain aspects. For example, the dots in the first image are darker than the background, while the dots in the second are whiter. That means, in the first image I can use erode to get black connected lines, and in the second image I can use dilate to get white connected lines and then invert the colors. This leads to the following results:


Using an appropriate threshold, the first image can easily be read with pytesseract. The second image, however, is trickier. The problem is that, for example, parts of the "4" are darker than the background around the "3". So a simple threshold is not going to work. I need something like a local threshold or local contrast enhancement. Does anybody have an idea here?


Otsu, mean threshold and Gaussian threshold lead to the following results:

Answer


Your images are pretty low res, but you can try a method called gain division. The idea is that you try to build a model of the background and then weight each input pixel by that model. The output gain should be relatively constant across most of the image.


After gain division is performed, you can try to improve the image by applying an area filter and morphology. I only tried your first image, because it is the "least worst".


These are the steps to get the gain-divided image:

  1. Apply a soft median blur filter to get rid of high frequency noise.
  2. Get the model of the background via local maximum. Apply a very strong close operation, with a big structuring element (I’m using a rectangular kernel of size 15).
  3. Perform gain adjustment by dividing 255 between each local maximum pixel. Weight this value with each input image pixel.
  4. You should get a nice image where the background illumination is pretty much normalized, threshold this image to get a binary mask of the characters.
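Step 3 above, in toy numbers: gain division scales each pixel by 255 divided by the local background estimate, which flattens uneven illumination. A minimal 1-D sketch with made-up values:

```python
import numpy as np

# A toy "scanline": same ink/background pattern, but the right half
# is more dimly lit than the left.
scanline   = np.array([ 60, 220,  40, 150,  30, 140], dtype=np.float64)
background = np.array([230, 230, 225, 160, 155, 150], dtype=np.float64)  # local maxima

# Gain division: scale each pixel by 255 / background, so background
# pixels map near 255 regardless of the local illumination.
normalized = np.clip(255.0 * scanline / background, 0, 255).astype(np.uint8)
# Background samples (indices 1, 3, 5) all land near 255 now, while
# the ink samples stay dark - a global threshold becomes feasible.
```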


Now, you can improve the quality of the image with the following additional steps:


  1. Threshold via Otsu, but add a little bit of bias. (This, unfortunately, is a manual step depending on the input).


  2. Apply an area filter to filter out the smaller blobs of noise.

Let's check out the code:

import numpy as np
import cv2

# image path
path = "C:/opencvImages/"
fileName = "iA904.png"

# Reading an image in default mode:
inputImage = cv2.imread(path+fileName)

# Remove small noise via median:
filterSize = 5
imageMedian = cv2.medianBlur(inputImage, filterSize)

# Get local maximum:
kernelSize = 15
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
localMax = cv2.morphologyEx(imageMedian, cv2.MORPH_CLOSE, maxKernel, None, None, 1, cv2.BORDER_REFLECT101)

# Perform gain division
gainDivision = np.where(localMax == 0, 0, (inputImage/localMax))

# Clip the values to [0,255]
gainDivision = np.clip((255 * gainDivision), 0, 255)

# Convert the mat type from float to uint8:
gainDivision = gainDivision.astype("uint8") 

# Convert BGR to grayscale:
grayscaleImage = cv2.cvtColor(gainDivision, cv2.COLOR_BGR2GRAY)

This is what gain division gets you:


Note that the lighting is more balanced. Now, let's apply a little bit of contrast enhancement:

# Contrast Enhancement:
grayscaleImage = np.uint8(cv2.normalize(grayscaleImage, grayscaleImage, 0, 255, cv2.NORM_MINMAX))


You get this, which creates a little bit more contrast between the foreground and the background:


Now, let's try to threshold this image to get a nice, binary mask. As I suggested, try Otsu's thresholding but add (or subtract) a little bit of bias to the result. This step, as mentioned, is dependent on the quality of your input:

# Threshold via Otsu + bias adjustment:
threshValue, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)

threshValue = 0.9 * threshValue
_, binaryImage = cv2.threshold(grayscaleImage, threshValue, 255, cv2.THRESH_BINARY)


You end up with this binary mask:


Invert this and filter out the small blobs. I set an area threshold value of 10 pixels:

# Invert image:
binaryImage = 255 - binaryImage

# Perform an area filter on the binary blobs:
componentsNumber, labeledImage, componentStats, componentCentroids = \
cv2.connectedComponentsWithStats(binaryImage, connectivity=4)

# Set the minimum pixels for the area filter:
minArea = 10

# Get the indices/labels of the remaining components based on the area stat
# (skip the background component at index 0)
remainingComponentLabels = [i for i in range(1, componentsNumber) if componentStats[i][4] >= minArea]

# Filter the labeled pixels based on the remaining labels,
# assign pixel intensity to 255 (uint8) for the remaining pixels
filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels), 255, 0).astype("uint8")


And this is the final binary mask:


If you plan on sending this image to an OCR, you might want to apply some morphology first. Maybe a closing to try and join the dots that make up the characters. Also be sure to train your OCR classifier with a font that is close to what you are actually trying to recognize. This is the (inverted) mask after a size 3 rectangular closing operation with 3 iterations:


To get the last image, process the filtered output as follows:

# Set kernel (structuring element) size:
kernelSize = 3

# Set operation iterations:
opIterations = 3

# Get the structuring element:
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))

# Perform closing:
closingImage = cv2.morphologyEx(filteredImage, cv2.MORPH_CLOSE, maxKernel, None, None, opIterations, cv2.BORDER_REFLECT101)

# Invert image to obtain black numbers on white background:
closingImage = 255 - closingImage

