Word segmentation using OpenCV

Question

I am working on some scanned text images and need to highlight every word in each image. I understand the problem to be equivalent to finding sub-images that are surrounded by extra whitespace.

OCR cannot be used; I just need to outline each word with a border. Can someone suggest how this might be done using OpenCV?

I have tried reading about thresholding and segmentation. I am just looking for someone to point me to some relevant material.

Recommended answer

I assume your image contains multiple lines of text. In that case, the first thing to do is detect those lines.

To do that, first binarize the image using Otsu's method or adaptive thresholding.

Then you can use what is called a "horizontal histogram". It is like an ordinary histogram, but it shows where there are lines of text and where there are blank spaces. Divide the image at the blank rows and you get each line.

[Figure: a horizontal histogram]
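One way to sketch the line-splitting idea is to count the ink pixels in every row and cut wherever the count drops to zero. The function name `split_into_lines` and the synthetic input are assumptions made for illustration; a noisy scan would likely need a small non-zero threshold instead of an exact zero test.

```python
import numpy as np

def split_into_lines(binary):
    """Return (top, bottom) row ranges of text lines; text pixels must be non-zero."""
    # Horizontal histogram: number of ink pixels in each row.
    row_profile = np.count_nonzero(binary, axis=1)
    has_ink = row_profile > 0

    lines = []
    start = None
    for y, ink in enumerate(has_ink):
        if ink and start is None:
            start = y                      # a text line begins
        elif not ink and start is not None:
            lines.append((start, y))       # a blank row ends the line (end exclusive)
            start = None
    if start is not None:                  # text runs to the bottom edge
        lines.append((start, len(has_ink)))
    return lines

# Synthetic binarized page: two white bars stand in for two lines of text.
binary = np.zeros((60, 200), dtype=np.uint8)
binary[10:20, 20:180] = 255
binary[35:45, 20:120] = 255

lines = split_into_lines(binary)
print(lines)  # [(10, 20), (35, 45)]
```

Each returned range can be used to slice the page, for example `binary[top:bottom, :]`, to get one line image at a time.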

Now, for each line, find the horizontal histogram. Before that, try some dilation and erosion so that the letters are grouped together. Then you can find the connected components in each line to get each word, and draw the boundaries.

[Figure: both horizontal and vertical histograms]

This Stack Overflow question might help: How to convert an image into character segments?
