如何从文档中删除所有肖像图片 [英] How to remove all portrait pictures from a document
问题描述
我正在OCRing文档图像.我想检测所有图片并从文档图像中删除.我想在文档图像中保留表格.一旦检测到图片,我将删除并想要进行OCR.我试图找到试图检测所有较大区域的轮廓.不幸的是,它也检测到表.还有如何删除在文档映像中保留其他数据的对象.我正在使用opencv和python
I am working on OCRing a document image. I want to detect all pictures and remove from the document image. I want to retain tables in the document image. Once I detect pictures I will remove and then want to OCR. I tried to find contour tried to detect all the bigger areas. unfortunately it detects tables also. Also how to remove the objects keeping other data in the doc image. I am using opencv and python
这是我的代码
import os
from PIL import Image
import pytesseract
img = cv2.imread('block2.jpg' , 0)
mask = np.ones(img.shape[:2], dtype="uint8") * 255
ret,thresh1 = cv2.threshold(img,127,255,0)
contours, sd = cv2.findContours(thresh1,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
areacontainer = []
for cnt in contours:
area = cv2.contourArea(cnt)
areacontainer.append(area)
avgArea = sum(areacontainer)/len(areacontainer)
[enter code here][1]
for c in contours:# average area heuristics
if cv2.contourArea(c)>6*avgArea:
cv2.drawContours(mask, [c], -1, 0, -1)
binary = cv2.bitwise_and(img, img, mask=mask) # subtracting
cv2.imwrite("bin.jpg" , binary)
cv2.imwrite("mask.jpg" , mask)
推荐答案
这里是一种方法:
- 将图像转换为灰度和高斯模糊
- 执行Canny边缘检测
- 执行形态学运算以平滑图像
- 使用最小/最大阈值区域查找轮廓并进行过滤
- 删除人像图像
这里检测到的肖像以绿色突出显示
Here's the detected portraits highlighted in green
现在有了边界框ROI,我们可以通过用白色填充图片来有效地删除它们.这是结果
Now that we have the bounding box ROIs, we can effectively remove the pictures by filling them in with white. Here's the result
import cv2
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
canny = cv2.Canny(blur, 120, 255, 1)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
close = cv2.morphologyEx(canny, cv2.MORPH_CLOSE, kernel, iterations=2)
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
area = cv2.contourArea(c)
if area > 15000 and area < 35000:
x,y,w,h = cv2.boundingRect(c)
cv2.rectangle(image, (x, y), (x + w, y + h), (255,255,255), -1)
cv2.imshow('image', image)
cv2.waitKey()
这篇关于如何从文档中删除所有肖像图片的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!