去除图像的颜色 [英] Remove color from an image

查看:72
本文介绍了去除图像的颜色的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想从下面的图像中删除颜色,由于这种颜色,我无法从图像中清楚地提取文本.

I want to remove the color from the below image, due to this color I am unable to extract the text clearly from the image.

我正在使用下面的代码,但是我没有得到清晰的文字,

I am using the below code, but I am not getting the clear text,

import numpy as np
from PIL import Image

im = Image.open('my_file.tif')
im = im.convert('RGBA')
data = np.array(im)
# just use the rgb values for comparison
rgb = data[:,:,:3]
color = [246, 213, 139]   # Original value
black = [0,0,0, 255]
white = [255,255,255,255]
mask = np.all(rgb == color, axis = -1)
# change all pixels that match color to white
data[mask] = white

# change all pixels that don't match color to black
##data[np.logical_not(mask)] = black
new_im = Image.fromarray(data)
new_im.save('new_file.tif')

def black_and_white(input_image_path,
                output_image_path):
color_image = Image.open(input_image_path)
bw = color_image.convert('L')
bw.save(output_image_path)

请帮助我...

图片2:

推荐答案

我假设您要提取报价.为此,您可以执行一系列过滤操作以删除非文本轮廓.获得处理结果后,可以使用Pytesseract等OCR工具提取文本.

I'm assuming you want to extract the quote. To do this, you can do a series of filtering operations to remove non-text contours. Once you have the processed result you can use an OCR tool such as Pytesseract for text extraction.

OCR的结果

On behalf of the hundreds of ACLU activists who
called on Governor Walker to veto House Bill
156, we are disappointed that he did not put
students or the Constitution first today."
—Joshua A. Decker
Executive Director

代码

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image and threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# Connect text with a horizontal shaped kernel
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (10,3))
dilate = cv2.dilate(thresh, kernel, iterations=3)

# Remove non-text contours using aspect ratio filtering
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    aspect = w/h
    if aspect < 3:
        cv2.drawContours(thresh, [c], -1, (0,0,0), -1)

# Invert image and OCR
result = 255 - thresh
data = pytesseract.image_to_string(result, lang='eng',config='--psm 6')
print(data)

cv2.imshow('result', result)
cv2.waitKey()

这篇关于去除图像的颜色的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆