Identify text areas on a Talmud page


Problem description

I have a Talmud page like this one, and I want to find the text areas with OpenCV so that each text ends up in its own region, like this:

In the attached image, each area is marked in a different color and each text has a number. What matters is identifying the area that belongs to each text and distinguishing it from the areas of the other texts; the numerical order does not matter.

Doing this by eye is easy, thanks to the white stripes that run between the texts, but I could not manage it with OpenCV.

In the following code I try to catch all the letters and turn them into black rectangles, then enlarge each rectangle until it meets its neighbors. The whole area of a text then becomes black, with a clear white stripe left between the texts.

I do not know how to proceed, or whether this is a good approach.

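// BlobCounter/Blob appear to come from AForge.NET and Mat/Cv2/Scalar/Window from OpenCvSharp.
// ToCvRect() is assumed to be the asker's own Rectangle-to-Rect conversion helper (not shown).
public List<Rectangle> getRects(Mat grayImg)
{
    // Find every connected blob (roughly one per letter).
    BlobCounter blobCounter = new BlobCounter();
    blobCounter.ObjectsOrder = ObjectsOrder.None;
    blobCounter.ProcessImage(grayImg);
    IEnumerable<Blob> blobs = blobCounter.GetObjectsInformation();

    // Paint each letter's bounding box as a filled black rectangle.
    var blackBlobs = grayImg.Clone();
    foreach (var b in blobs)
        blackBlobs.Rectangle(b.Rectangle.ToCvRect(), Scalar.Black, -1);

    // Use the median letter width as the number of erosion passes.
    var widths = blobs.Select(x => x.Rectangle.Width).ToList();
    widths.Sort();
    var median = widths[widths.Count / 2];

    // Eroding a light-background image grows the black rectangles.
    Mat erodet = new Mat();
    Cv2.Erode(blackBlobs, erodet, null, iterations: median);

    using (Window win = new Window())
    {
        win.ShowImage(erodet);
        win.WaitKey();
    }

    // Placeholder: returns the per-letter rectangles; grouping them into text regions is the open question.
    return blobs.Select(blob => blob.Rectangle).ToList();
}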

Thanks in advance, any help would be appreciated.

Additional clarification:

As the previous image shows, the text areas are not rectangular, but each area can be described as a collection of rectangles of different sizes stacked one on top of the other.

Note that two rectangles belonging to the same text are never placed side by side, only one above the other.

What I am trying to obtain is this collection of rectangles, together with the text each rectangle belongs to.

An answer can be in any programming language, especially C++, Python, or C#.

Solution

I believe this task can be done mostly with morphological operations.
It is easier to show the concept in MATLAB, but OpenCV has equivalent operations.

We start with a rough estimation of the size of the gap between the different sections of the page. Looking at your example, the gap is about 1% of the page's height.

img = im2single(rgb2gray(imread('https://i.stack.imgur.com/LoV5x.jpg')));  % read the image into 1ch gray scale image in range [0, 1]
gap = ceil(size(img,1) * 0.01);  % gap estimation

First, we would like to use image dilation to create a mask where all words in the same section are connected to each other:

d1 = imdilate(img < 0.5, ones(gap));

The result:

(If it weren't for the annoying words from the next page that the printer adds at the bottom of each section, we would be done...)

The dilation leaves some large gaps unfilled; we can use flood fill to complete them:

f = imfill(d1, 'holes');

Now we have full masks for the text regions:

Using erosion to cut between the different sections:

e = imerode(f, ones(1, 5*gap));  % erosion only horizontally

This gives the correct partition, although the regions are now too thin:

Dilating back

d2 = imdilate(e, ones(1, 5*gap));

gives this binary mask:

You can now simply look at the connected components of this binary mask:

I hope this will count as a "Daf Yomi" for me...


Update:
The next step, going from segments to rectangular polygons, requires some geometrical operations. I'll outline the approach here and leave the implementation details to you.
Eventually, we want a bounding polygon for each segment, with the basic polygon being the rectangular bounding box of the segment. You'll have to implement this "polygon" class. A crucial method of this class is "polygon subtraction": poly_result = poly_a - poly_b creates a new polygon poly_result, which is poly_a minus the intersection between poly_a and poly_b.

Here's the algorithm:

  1. For each segment, compute its bounding box, the area of the bounding box, and the actual number of pixels in the segment.
    Initialize the polygon of each segment to its bounding box.
  2. Sort the segments by the ratio between the number of pixels and the bounding box area, in descending order.
  3. For each segment, in that descending order:
    subtract all previous polygons from this segment's polygon.
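One way to realize the three steps above without a full polygon class is to represent each polygon as a list of disjoint axis-aligned rectangles, which also matches the "stack of rectangles" output the question asks for. The sketch below is only one possible design: the names `subtract_rect` and `segment_polygons`, the `(x0, y0, x1, y1)` half-open rectangle convention, and the `(label, bbox, pixel_count)` input tuples are all assumptions.

```python
def subtract_rect(a, b):
    """Return rectangles covering a minus b; rects are (x0, y0, x1, y1), half-open."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Intersection of a and b.
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    if ix0 >= ix1 or iy0 >= iy1:
        return [a]  # no overlap, a is unchanged
    pieces = []
    if ay0 < iy0:
        pieces.append((ax0, ay0, ax1, iy0))  # full-width strip above
    if iy1 < ay1:
        pieces.append((ax0, iy1, ax1, ay1))  # full-width strip below
    if ax0 < ix0:
        pieces.append((ax0, iy0, ix0, iy1))  # left of the intersection
    if ix1 < ax1:
        pieces.append((ix1, iy0, ax1, iy1))  # right of the intersection
    return pieces


def segment_polygons(segments):
    """segments: [(label, bbox, pixel_count)]. Returns {label: [rect, ...]}."""
    def fill_ratio(segment):
        _, (x0, y0, x1, y1), pixels = segment
        return pixels / ((x1 - x0) * (y1 - y0))

    # Step 2: the most "rectangular" segments come first.
    ordered = sorted(segments, key=fill_ratio, reverse=True)
    finished = []  # polygons of the segments already processed
    result = {}
    for label, bbox, _ in ordered:
        pieces = [bbox]  # step 1: start from the bounding box
        # Step 3: subtract every previously finished polygon.
        for polygon in finished:
            for cut in polygon:
                pieces = [p for piece in pieces for p in subtract_rect(piece, cut)]
        result[label] = pieces
        finished.append(pieces)
    return result
```

For example, a 10x10 square segment sitting in the corner of an L-shaped segment whose bounding box is 20x20 leaves the L-shaped segment with two rectangles, one full-width strip below the square and one block beside it.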

You should get something like this:


And for the second image:
