Text Extraction from Notebook

Problem Description

I am trying to extract handwritten text from images. I use Python with OpenCV functions such as findContours. It was all going pretty well when I used images like this one:

That worked because the background was plain. But then I tested it with this image:

Because of the notebook's ruled lines in the background, I cannot extract only the text. Although the text is red, I convert every image to grayscale, and sometimes threshold it, so everything turns black just like the notebook lines and the colour of the text no longer matters. So my question is: could anyone give me advice or a possible solution for dealing with this kind of background in order to extract the text? I really don't want to use the sliding window method. Thank you in advance.

Recommended Answer

I decided to try again with the HoughLinesP function in OpenCV, and this time it gave a much more promising result. Here is a snippet of the code I used to remove most of the lines:

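import cv2
import numpy

# thresh.png is the inverted, thresholded version of the original picture
img = cv2.imread('thresh.png')
edges = cv2.Canny(img, 50, 150, apertureSize=3)
minLineLength = 0
maxLineGap = 5
# Pass these two by keyword: positionally, the fifth argument of
# cv2.HoughLinesP is the optional `lines` output, not minLineLength.
lines = cv2.HoughLinesP(edges, 1, numpy.pi / 180, 100,
                        minLineLength=minLineLength, maxLineGap=maxLineGap)

# HoughLinesP returns None when it finds no segments
if lines is not None:
    for line in lines:
        for x1, y1, x2, y2 in line:
            # Paint each detected segment black, 2 pixels thick
            cv2.line(img, (x1, y1), (x2, y2), (0, 0, 0), 2)

cv2.imwrite('houghlines3.jpg', img)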

Additional info: thresh.png stores the thresholded version of the original picture. The code finds the straight lines in the image and paints them black. Black is the background colour here because my threshold is inverted: what is close to white becomes black and vice versa. That is how it clears the lines.

PS: Hope I helped somebody! Cheers!
