Removing background noisy lines from a CAPTCHA image using Python PIL


Problem description

I have a processed CAPTCHA image (enlarged) that looks like this:

As you can see, the font size of the "TEXT" is a bit larger than the width of the noisy lines.
So I need an algorithm or code to remove the noisy lines from this image.

Using the Python PIL library and the chopping algorithm shown below, I didn't get an output image that OCR engines could read easily.

Here's the Python code that I tried:

import PIL.Image
import sys

# python chop.py [chop-factor] [in-file] [out-file]

chop = int(sys.argv[1])
image = PIL.Image.open(sys.argv[2]).convert('1')
width, height = image.size
data = image.load()

# Iterate through the rows.
for y in range(height):
    # Use a while loop: reassigning the variable of a for loop over range()
    # has no effect on the next iteration, so runs could not be skipped.
    x = 0
    while x < width:

        # Make sure we're on a dark pixel.
        if data[x, y] > 128:
            x += 1
            continue

        # Keep a total of non-white contiguous pixels.
        total = 0

        # Check a sequence ranging from x to image.width.
        for c in range(x, width):

            # If the pixel is dark, add it to the total.
            if data[c, y] < 128:
                total += 1

            # If the pixel is light, stop the sequence.
            else:
                break

        # If the total is at most the chop, replace everything with white.
        if total <= chop:
            for c in range(total):
                data[x + c, y] = 255

        # Skip the run we just examined.
        x += total


# Iterate through the columns.
for x in range(width):
    # Same while-loop pattern as above so that whole runs can be skipped.
    y = 0
    while y < height:

        # Make sure we're on a dark pixel.
        if data[x, y] > 128:
            y += 1
            continue

        # Keep a total of non-white contiguous pixels.
        total = 0

        # Check a sequence ranging from y to image.height.
        for c in range(y, height):
            # If the pixel is dark, add it to the total.
            if data[x, c] < 128:
                total += 1

            # If the pixel is light, stop the sequence.
            else:
                break

        # If the total is at most the chop, replace everything with white.
        if total <= chop:
            for c in range(total):
                data[x, y + c] = 255

        # Skip the run we just examined.
        y += total

image.save(sys.argv[3])

So, basically, I would like to know a better algorithm or code to get rid of the noise and make the image readable by an OCR engine (Tesseract or pytesser).

Recommended answer

To quickly get rid of most of the lines, you can turn every black pixel that has two or fewer adjacent black pixels white. That should take care of the stray lines. Then, once you are left with a number of "blocks", you can remove the smaller ones.

This assumes that the sample image has been enlarged and that the lines are only one pixel wide in the original.
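The two steps described above could be sketched roughly as follows. This is only an illustration under several assumptions that the answer does not state: "adjacent" is taken to mean the 8 surrounding pixels, a pixel counts as black when its grayscale value is below 128, and the names `remove_thin_pixels`, `remove_small_blocks`, `clean_image` and the default `min_size=20` are hypothetical and would need tuning for a real image.

```python
from collections import deque

# Assumption: "adjacent" means the 8 surrounding pixels (8-connectivity).
NEIGHBORS = [(-1, -1), (0, -1), (1, -1),
             (-1, 0),           (1, 0),
             (-1, 1),  (0, 1),  (1, 1)]


def remove_thin_pixels(dark, max_neighbors=2):
    """Keep only dark pixels that have more than max_neighbors dark neighbours.

    `dark` is a set of (x, y) coordinates. Neighbours are counted against the
    original set, so removing one pixel does not cascade to the next.
    """
    return {
        (x, y) for (x, y) in dark
        if sum((x + dx, y + dy) in dark for dx, dy in NEIGHBORS) > max_neighbors
    }


def remove_small_blocks(dark, min_size):
    """Drop connected blocks of dark pixels that contain fewer than min_size pixels."""
    kept, seen = set(), set()
    for start in dark:
        if start in seen:
            continue
        # Breadth-first flood fill to collect one connected block.
        block, queue = {start}, deque([start])
        while queue:
            x, y = queue.popleft()
            for dx, dy in NEIGHBORS:
                nxt = (x + dx, y + dy)
                if nxt in dark and nxt not in block:
                    block.add(nxt)
                    queue.append(nxt)
        seen |= block
        if len(block) >= min_size:
            kept |= block
    return kept


def clean_image(in_path, out_path, max_neighbors=2, min_size=20):
    """Apply both steps to an image file with PIL (min_size is a guess to tune)."""
    from PIL import Image  # imported here so the helpers above run without PIL

    image = Image.open(in_path).convert('L')
    width, height = image.size
    data = image.load()
    dark = {(x, y) for y in range(height) for x in range(width)
            if data[x, y] < 128}

    dark = remove_thin_pixels(dark, max_neighbors)
    dark = remove_small_blocks(dark, min_size)

    # Draw the surviving dark pixels onto a white canvas.
    out = Image.new('L', (width, height), 255)
    out_data = out.load()
    for x, y in dark:
        out_data[x, y] = 0
    out.save(out_path)
```

Calling `clean_image('captcha.png', 'cleaned.png')` (both file names are placeholders) would write the filtered image. If the lines are more than one pixel wide in the image being processed, the neighbour threshold would have to be raised or the image scaled down first, in line with the assumption stated above.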
