Detect horizontal blank lines in a .pdf form image with OpenCV


Question

I have .pdf files that have been converted to .jpg images for this project. My goal is to identify the blanks (e.g. ____________) that you would generally find in a .pdf form, which indicate a space for the user to sign or fill out some kind of information. I have been using edge detection with the cv2.Canny() and cv2.HoughLinesP() functions.

This works fairly well, but there are quite a few false positives that seem to come from nowhere. When I look at the 'edges' file, it shows a lot of noise around the other words. I'm not sure where this noise comes from.

Should I continue to tweak the parameters, or is there a better method to find the location of these blanks?

Recommended answer

Assuming that you're trying to find horizontal lines on a .pdf form, here's a simple approach:

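  • Convert the image to grayscale and apply Otsu's threshold to get a binary image
  • Construct a special kernel that detects only horizontal lines
  • Perform morphological transformations
  • Find contours and draw them on the image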

Using this example image

Convert to grayscale and apply Otsu's threshold to obtain a binary image

gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

Then we create a kernel with cv2.getStructuringElement() and perform morphological transformations to isolate the horizontal lines

horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15,1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)

From here we could use cv2.HoughLinesP() to detect the lines, but since we have already preprocessed the image and isolated the horizontal lines, we can simply find contours and draw the result

cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

for c in cnts:
    cv2.drawContours(image, [c], -1, (36,255,12), 3)

Full code

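import cv2

# Load the page image; cv2.imread() returns None if the file cannot be read
image = cv2.imread('2.png')
if image is None:
    raise FileNotFoundError("could not read '2.png'")

# Grayscale, then inverted Otsu threshold so ink becomes white on black
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Keep only horizontal strokes by opening with a wide, 1-pixel-high kernel
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 1))
detected_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)

# Find contours of the remaining lines (handles both OpenCV 3.x and 4.x return values)
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

# Draw each detected line on the original image
for c in cnts:
    cv2.drawContours(image, [c], -1, (36, 255, 12), 3)

# Show the intermediate and final results; press any key to close the windows
cv2.imshow('thresh', thresh)
cv2.imshow('detected_lines', detected_lines)
cv2.imshow('image', image)
cv2.waitKey()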
