How to remove all portrait pictures from a document


Problem description

I am running OCR on a document image. I want to detect all pictures and remove them from the image while retaining the tables. Once the pictures are removed, I will run OCR on the result. I tried finding contours and selecting the larger areas, but unfortunately this also detects the tables. I also need to know how to remove the detected objects while keeping the rest of the data in the document image. I am using OpenCV and Python.

Here is my code:

import os
import cv2
import numpy as np
from PIL import Image
import pytesseract

img = cv2.imread('block2.jpg' , 0)
mask = np.ones(img.shape[:2], dtype="uint8") * 255


ret,thresh1 = cv2.threshold(img,127,255,0)
contours, sd = cv2.findContours(thresh1,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)

areacontainer = []

for cnt in contours:
    area = cv2.contourArea(cnt)
    areacontainer.append(area)

avgArea = sum(areacontainer)/len(areacontainer)


for c in contours:# average area heuristics
    if cv2.contourArea(c)>6*avgArea:
        cv2.drawContours(mask, [c], -1, 0, -1)

binary = cv2.bitwise_and(img, img, mask=mask) # subtracting
cv2.imwrite("bin.jpg" , binary)
cv2.imwrite("mask.jpg" , mask) 

Recommended answer

Here is one approach:

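  • Convert the image to grayscale and apply a Gaussian blur
  • Perform Canny edge detection
  • Perform morphological operations to smooth the image
  • Find contours and filter them using minimum/maximum area thresholds
  • Remove the portrait images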
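
To see why a minimum/maximum area filter (the fourth step) can keep tables while the average-area heuristic from the question does not, here is a minimal sketch in plain Python. The region labels and pixel areas are hypothetical, chosen only for illustration, and the sketch assumes tables are larger than portraits, which may not hold for every document.

```python
# Hypothetical contour areas in pixels: many small word blobs,
# one portrait, and one table. None of these come from a real scan.
regions = [("word", 400)] * 200 + [("portrait", 24000), ("table", 90000)]


def above_average(regions, factor=6):
    """Question's heuristic: keep regions larger than factor * mean area."""
    avg = sum(area for _, area in regions) / len(regions)
    return [label for label, area in regions if area > factor * avg]


def within_band(regions, min_area, max_area):
    """Answer's heuristic: keep regions whose area lies strictly between the bounds."""
    return [label for label, area in regions if min_area < area < max_area]


# The mean is dominated by the small word blobs, so both large regions pass.
print(above_average(regions))              # ['portrait', 'table']
# An upper bound excludes the larger table.
print(within_band(regions, 15000, 35000))  # ['portrait']
```

The upper bound is what separates the two cases here; a lower bound alone would behave like the average-area test.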

Here are the detected portraits, highlighted in green.

Now that we have the bounding-box ROIs, we can effectively remove the pictures by filling them in with white. Here is the result, along with the code that produces it:

import cv2

# Load the image, convert to grayscale, blur, and detect edges
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
canny = cv2.Canny(blur, 120, 255)

# Close small gaps in the edge map so each picture forms one connected outline
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
close = cv2.morphologyEx(canny, cv2.MORPH_CLOSE, kernel, iterations=2)

# Find external contours; findContours returns 2 or 3 values depending on the OpenCV version
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

# Fill the bounding box of each contour in the expected area range with white
for c in cnts:
    area = cv2.contourArea(c)
    if 15000 < area < 35000:
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), (255,255,255), -1)

cv2.imshow('image', image)
cv2.waitKey()
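
The bounds 15000 and 35000 in the code above are pixel areas, so they are presumably tuned to the resolution of that particular image. Since area grows with the square of the per-side scale, a small helper can rescale them for a resized scan. This is a sketch only; `scale_area_bounds` is a hypothetical name and not part of OpenCV or the original answer.

```python
def scale_area_bounds(min_area, max_area, scale):
    """Rescale pixel-area bounds for an image resized by `scale` along each side.

    Area scales with the square of the per-side scale factor.
    """
    factor = scale * scale
    return round(min_area * factor), round(max_area * factor)


# Same page scanned at twice the resolution per side: areas are 4x larger.
print(scale_area_bounds(15000, 35000, 2.0))  # (60000, 140000)
# Same page downsampled to half size per side: areas are 4x smaller.
print(scale_area_bounds(15000, 35000, 0.5))  # (3750, 8750)
```

The returned pair can be substituted for the literal bounds in the contour loop; the bounds still need to be checked against the portrait sizes in each document set.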
