Vertical projection and horizontal projection

Question

I'm trying to implement the following OCR algorithm from this paper:

https://arxiv.org/ftp/arxiv/papers/1707/1707.00800.pdf

I'm confused about that part.

I constructed the vertical profile of an image:

env = np.sum(img, axis=1)

This is what I got.
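
For reference: `np.sum(img, axis=1)` returns one value per row, while cutting a word into characters from left to right needs one value per column (`axis=0`). A small sketch on a hypothetical toy array, assuming the image has already been binarized so that ink pixels are 1 and background is 0 (a raw scan with dark text on a white background would first need something like `img < 128`, where 128 is only an illustrative threshold):

```python
import numpy as np

# Toy binarized image (hypothetical): 1 = ink pixel, 0 = background.
# Shape is (rows, columns), i.e. (height, width).
img = np.array([
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
])

# axis=0 collapses the rows: one value per column (x position).
# This is the profile you walk along to cut a word into characters.
per_column = np.sum(img, axis=0)

# axis=1 collapses the columns: one value per row (y position).
# This is what np.sum(img, axis=1) in the question computes.
per_row = np.sum(img, axis=1)

print(per_column.tolist())  # [1, 3, 2, 0]
print(per_row.tolist())     # [1, 2, 3]
```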

I'm looking for a clear explanation of the algorithm, perhaps with pseudocode.

Recommended answer

From my understanding, this algorithm is designed to separate individual Arabic letters, which in writing are connected by a horizontal line (I have exactly zero knowledge of Arabic script).

So the algorithm assumes that the given image is horizontally aligned (otherwise it won't work), and it looks for areas where the upper bounds of the black pixels lie at a similar height.
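
If "upper bound of the black pixels" is read literally, a column-wise upper contour (the height of the topmost ink pixel in each column) can be computed instead of the ink sum. This reading, and the helper name `upper_profile`, are assumptions for illustration, not taken from the paper. A minimal NumPy sketch, assuming a binarized image with ink = 1:

```python
import numpy as np

def upper_profile(binary):
    """Height of the topmost ink pixel in each column, counted from the bottom.

    binary: 2-D array (rows x columns) where nonzero means ink.
    An ink pixel in the bottom row has height 1; empty columns get 0.
    """
    ink = np.asarray(binary) != 0
    n_rows = ink.shape[0]
    # argmax on booleans returns the index of the first True, i.e. the
    # topmost ink pixel (row 0 is the top of the image).
    top_row = np.argmax(ink, axis=0)
    return np.where(ink.any(axis=0), n_rows - top_row, 0)

# Hypothetical word: a horizontal connecting stroke with one tall letter.
word = np.array([
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
])
print(upper_profile(word).tolist())  # [2, 2, 4, 2, 0]
```

Along the connecting stroke every column has the same height (2 here); that repeated value is what the rest of the answer looks for.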

After you have constructed the vertical profile of an image, you just need to find the most common height within the word (this is the second most common height in the whole image, presumably because the empty background around the word is the most common). Then you just separate the image into the areas at that specific height and the rest.
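
For a clean profile with exact integer heights, this step can be sketched with `np.unique` (finding the exact second most common value instead of using a histogram) and `np.diff` (finding where the profile enters or leaves that height). It assumes, like the answer, that the background is the most common value; `split_at_baseline` and the sample profile are hypothetical:

```python
import numpy as np

def split_at_baseline(profile):
    """Find the baseline value and split the columns into runs.

    Returns (baseline, runs), where runs is a list of
    (start, end, is_baseline) tuples covering [start, end).
    """
    profile = np.asarray(profile)
    values, counts = np.unique(profile, return_counts=True)
    order = np.argsort(counts)[::-1]    # most common value first
    baseline = values[order[1]].item()  # second most common value
    on_line = profile == baseline
    # A run boundary is wherever on_line switches between True and False.
    cuts = np.flatnonzero(np.diff(on_line.astype(int))) + 1
    bounds = [0, *cuts.tolist(), len(profile)]
    return baseline, [
        (s, e, bool(on_line[s])) for s, e in zip(bounds[:-1], bounds[1:])
    ]

# Hypothetical clean profile: background 0, baseline height 2.
profile = [0, 0, 0, 0, 5, 6, 2, 2, 2, 4, 2, 2, 0, 0, 0]
baseline, runs = split_at_baseline(profile)
print(baseline)  # 2
print(runs)
# [(0, 6, False), (6, 9, True), (9, 10, False), (10, 12, True), (12, 15, False)]
```

The non-baseline runs still contain the background columns at the edges of the word, and exact equality only works on a clean image; a noisy one needs the discretization discussed in this answer.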

Using your image:

The red line is the second most common height, which you need to find (this can be done with a histogram).

The green lines represent the separations between individual characters (so here you will get 4 characters).
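
Once the separator columns (the green lines) are known, each character is the slice of the image between two adjacent separators. A small sketch, assuming the separators are interior x positions sorted from left to right; `split_by_separators` and the sample sizes are hypothetical:

```python
import numpy as np

def split_by_separators(img, separators):
    """Cut img (rows x columns) at the given separator columns.

    separators: sorted x positions of the cuts; n separators give n + 1 pieces.
    """
    edges = [0, *separators, img.shape[1]]
    return [img[:, a:b] for a, b in zip(edges[:-1], edges[1:])]

# Hypothetical 5 x 20 word image with three separator lines.
word = np.zeros((5, 20), dtype=np.uint8)
chars = split_by_separators(word, [4, 9, 15])
print(len(chars))                # 4
print([c.shape for c in chars])  # [(5, 4), (5, 5), (5, 6), (5, 5)]
```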

By the way, your image is much noisier and more distorted than the one used in the paper, so you should probably discretize your height values into ranges (for example with a histogram).
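
Besides a histogram, another simple way to discretize, named here only as an alternative, is to snap every value to a fixed step so that small fluctuations collapse onto the same value. The helper `quantize` and the step size are hypothetical; the step would have to be tuned to your image:

```python
import numpy as np

def quantize(profile, step):
    """Snap each profile value to the nearest multiple of `step`."""
    return (np.round(np.asarray(profile) / step) * step).astype(int)

# Hypothetical noisy profile: the baseline wobbles between 9 and 12.
noisy = [0, 0, 9, 10, 11, 30, 10, 12, 0]
snapped = quantize(noisy, step=5)
print(snapped.tolist())  # [0, 0, 10, 10, 10, 30, 10, 10, 0]
```

Values close to a step boundary (for example 12 and 13 with step 5) can still end up on different values, which is the same issue a histogram has at its bin edges.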

Pseudo-code (or rather, untested code):

import numpy as np

# Inputs: y is the vertical projection of the word (e.g. env from the question),
# n_bins is the number of histogram bins you choose.

# Discretize the y values to n_bins (a noisier image means you can use fewer bins):
counts, edges = np.histogram(y, bins=n_bins)

# Find the bin with the second largest number of values:
baseline_bin = np.argsort(counts)[-2]

# Get the limit values of the bin:
y_low, y_high = edges[baseline_bin], edges[baseline_bin + 1]

def on_baseline(v):
    return y_low <= v <= y_high

# Go over the vertical projection values and separate them into characters:
zero = y[0]  # Assuming the first projected value is outside of the word
char_list = []
i = 0
while i < len(y):
    if y[i] != zero and not on_baseline(y[i]):
        start = i  # start of char

        # Find the end of the current char: the first column that is back on
        # the baseline or in the background (or the end of the image):
        end = i
        while end < len(y) and y[end] != zero and not on_baseline(y[end]):
            end += 1
        char_list.append([start, end])  # add [start, end) to char list

        # Continue from there; baseline and background columns are skipped
        # by the else branch until the next char starts:
        i = end
    else:
        i += 1
