Is there an efficient algorithm for segmentation of handwritten text?


Question


I want to automatically divide an image of ancient handwritten text by lines (and by words in future).

The first obvious part is preprocessing the image...

I'm just using a simple digitization (based on pixel brightness). After that I store the data in a two-dimensional array.
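A minimal sketch of that digitization step (assuming a System.Drawing Bitmap and a fixed brightness threshold; the method name and threshold value are illustrative, not my exact code):

//simple brightness-based binarization into a 2D boolean array (true = dark/ink pixel)
//GetPixel is slow but keeps the sketch short
bool[,] Binarize(Bitmap original, int threshold)
{
    var bit = new bool[original.Width, original.Height];

    for (int y = 0; y < original.Height; y++)
    {
        for (int x = 0; x < original.Width; x++)
        {
            Color c = original.GetPixel(x, y);
            int brightness = (c.R + c.G + c.B) / 3;   //average of the RGB channels
            bit[x, y] = brightness < threshold;       //e.g. threshold = 128
        }
    }
    return bit;
}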

The next obvious part is analyzing the binary array.

  1. My first algorithm was pretty simple - if there are more black pixels in a row of the array than the root mean square of the maximum and minimum values, then that row is part of a line.

    After forming the list of lines, I cut off lines whose height is less than the average. In the end it turned into a kind of linear regression, trying to minimize the difference between blank rows and text rows. (That was my assumption.)

  2. My second attempt - I tried to use a GA with several fitness functions. The chromosome contains 3 values - xo, x1, x2, with xo ∈ [-1; 0], x1 ∈ [0; 0.5], x2 ∈ [0; 0.5].

The function that determines whether a row belongs to a line is (xo + α1·x1 + α2·x2) > 0, where α1 is the scaled sum of black pixels in the row and α2 is the median of the gaps between the extreme black pixels in the row (α1, α2 ∈ [0, 1]). Other functions that I tried are (x1 < α1 OR x2 > α2) and (1/xo + [α1·x1] / [α2·x2]) > 0; the last function is the most efficient. The fitness function is 1 / (HeightRange + SpacesRange),

where each range is the difference between the maximum and minimum value. It represents the homogeneity of the text, and the global optimum of this function is the smoothest way to divide the image into lines.
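To make the notation concrete, here is a simplified sketch of how those per-row features and the classifying test fit together (the helper name and the exact scaling are illustrative, not the code I actually use):

//evaluate the classifying function (xo + a1*x1 + a2*x2) > 0 for row y
//a1 = scaled count of black pixels, a2 = scaled median gap between black pixels
bool RowBelongsToLine(bool[,] bit, int y, double xo, double x1, double x2)
{
    int width = bit.GetLength(0);
    int blackCount = 0;
    var gaps = new List<int>();
    int last = -1;

    for (int x = 0; x < width; x++)
    {
        if (!bit[x, y]) continue;
        blackCount++;
        if (last >= 0 && x - last > 1)
            gaps.Add(x - last);                      //gap to the previous black pixel
        last = x;
    }

    double a1 = (double)blackCount / width;          //scaled into [0,1]
    double a2 = 0;
    if (gaps.Count > 0)
    {
        gaps.Sort();
        a2 = (double)gaps[gaps.Count / 2] / width;   //scaled median gap
    }

    return xo + a1 * x1 + a2 * x2 > 0;
}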

I am using C# with my self-coded GA (classical, with 2-point crossover, Gray-coded chromosomes, a maximum population of 40, and a mutation rate of 0.05).

Now I have run out of ideas for how to divide this image into lines with ~100% accuracy.

What is an efficient algorithm to do this?


UPDATE: Original image Original BMP (1.3 MB)


UPDATE2: Improved results on this text to 100%

How I did it:

  • fixed a minor bug in the range count
  • changed the fitness function to 1/((distancesRange+1)*(heightsRange+1))
  • minimized the classifying function to (1/xo + x2/range) > 0, so the point count in a row no longer affects classification (i.e. optimized the input data and made the fitness-function optimization more explicit); both updated functions are sketched below
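A sketch of those two updated functions, using the names from the list above (illustrative, not my exact implementation):

//updated fitness: 1 / ((distancesRange + 1) * (heightsRange + 1))
//'heights' are the heights of the detected lines, 'distances' the gaps between them (uses System.Linq)
double Fitness(List<int> heights, List<int> distances)
{
    double heightsRange = heights.Count > 0 ? heights.Max() - heights.Min() : 0;
    double distancesRange = distances.Count > 0 ? distances.Max() - distances.Min() : 0;
    return 1.0 / ((distancesRange + 1) * (heightsRange + 1));
}

//updated classifying function: (1/xo + x2/range) > 0
//'range' is the per-row value from Ranges() below, which replaces zeros with the maximum
bool ClassifyRow(double xo, double x2, double range)
{
    return 1.0 / xo + x2 / range > 0;
}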

Problem:

The GA surprisingly failed to recognize this line. I looked at the debug data of the 'find ranges' function and found that there is too much noise in the 'unrecognized' place. The function code is below:

public double[] Ranges()
{
    var ranges = new double[_original.Height];

    for (int y = 0; y < _original.Height; y++)
    {
        ranges[y] = 0;
        var dx = new List<int>();
        int last = 0;
        int x = 0;

        //find the first black pixel in the row
        while (last == 0 && x < _original.Width)
        {
            if (_bit[x, y])
                last = x;
            x++;
        }

        //no black pixels in this row at all
        if (last == 0)
        {
            ranges[y] = 0;
            continue;
        }

        //collect the gaps between consecutive black pixels
        for (x = last; x < _original.Width; x++)
        {
            if (!_bit[x, y]) continue;

            if (last != x - 1)
            {
                dx.Add((x - last) + 1);
            }
            last = x;
        }

        if (dx.Count > 2)
        {
            dx.Sort();
            ranges[y] = dx[dx.Count / 2];   //median gap
            //ranges[y] = dx.Average();
        }
        else
            ranges[y] = 0;
    }

    //hack: rows with no measurable gap get the maximum range
    var maximum = ranges.Max();
    for (int i = 0; i < ranges.Length; i++)
    {
        if (Math.Abs(ranges[i] - 0) < 0.9)
            ranges[i] = maximum;
    }
    return ranges;
}

I'm using some hacks in this code. The main reason is that I want to minimize the range between the nearest black pixels, but if there are no pixels, the value becomes '0' and it becomes impossible to solve the problem by finding optima. The second reason is that this code changes too frequently. I'd like to rewrite it completely, but I have no idea how to do it.

Q:

  1. Is there a more efficient fitness function?
  2. How can I find a more versatile determination function?

Solution

Although I'm not sure how to translate the following algorithm into GA (and I'm not sure why you need to use GA for this problem), and I could be off base in proposing it, here goes.

The simple technique I would propose is to count the number of black pixels per row. (Actually it's the dark pixel density per row.) This requires very few operations, and with a few additional calculations it's not difficult to find peaks in the pixel-sum histogram.
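At its simplest, that per-row count is just the following (a sketch assuming a binarized bool[,] array where true means dark):

//raw histogram: number of dark pixels in each row of the binarized image
int[] RowCounts(bool[,] dark)
{
    int w = dark.GetLength(0);
    int h = dark.GetLength(1);
    var counts = new int[h];

    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            if (dark[x, y]) counts[y]++;

    return counts;
}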

A raw histogram will look something like this, where the profile along the left side shows the number of dark pixels in a row. For visibility, the actual count is normalized to stretch out to x = 200.

After some additional, simple processing is added (described below), we can generate a histogram like this that can be clipped at some threshold value. What remains are peaks indicating the center of lines of text.

From there it's a simple matter to find the lines: just clip (threshold) the histogram at some value such as 1/2 or 2/3 the maximum, and optionally check that the width of the peak at your clipping threshold is some minimum value w.
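A sketch of that clipping step, returning the top and bottom row of each surviving peak (the clip fraction and minimum width are parameters to tune; this mirrors the description rather than any exact code):

//clip the histogram at a fraction of its maximum and return the row bands that remain
//each band (start, end) marks the center region of one line of text (uses System.Linq)
List<Tuple<int, int>> FindLineBands(float[] histogram, float fraction, int minWidth)
{
    var bands = new List<Tuple<int, int>>();
    float clip = histogram.Max() * fraction;                 //e.g. fraction = 0.5f or 0.67f
    int start = -1;

    for (int y = 0; y < histogram.Length; y++)
    {
        bool above = histogram[y] >= clip;
        if (above && start < 0) start = y;                   //a peak begins
        if ((!above || y == histogram.Length - 1) && start >= 0)
        {
            int end = above ? y : y - 1;                     //a peak ends
            if (end - start + 1 >= minWidth)                 //optional width check
                bands.Add(Tuple.Create(start, end));
            start = -1;
        }
    }
    return bands;
}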

One implementation of the full (yet still simple!) algorithm to find the nicer histogram is as follows:

  1. Binarize the image using a "moving average" threshold or similar local thresholding technique in case a standard Otsu threshold operating on pixels near edges isn't satisfactory. Or, if you have a nice black-on-white image, just use 128 as your binarization threshold (a sketch of one such local threshold follows this list).
  2. Create an array to store your histogram. This array's length will be the height of the image.
  3. For each pixel (x,y) in the binarized image, find the number of dark pixels above and below (x,y) within some radius R. That is, count the number of dark pixels from (x, y - R) to (x, y + R), inclusive.
  4. If the number of dark pixels within a vertical radius R is equal to or greater than R--that is, at least half the pixels are dark--then pixel (x,y) has sufficient vertical dark neighbors. Increment your bin count for row y.
  5. As you march along each row, track the leftmost and rightmost x-values for pixels with sufficient neighbors. As long as the width (right - left + 1) exceeds some minimum value, divide the total count of dark pixels by this width. This normalizes the count to ensure the short lines like the very last line of text are included.
  6. (Optional) Smooth the resulting histogram. I just used the mean over 3 rows.
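For step 1, a minimal sketch of a moving-average style local threshold (the window radius and the darkness factor are illustrative values; an integral image would make this much faster):

//local thresholding: a pixel is dark if it is noticeably darker than the mean
//brightness of the (2k+1) x (2k+1) window around it
bool[,] LocalThreshold(byte[,] gray, int k, double factor)
{
    int w = gray.GetLength(0);
    int h = gray.GetLength(1);
    var dark = new bool[w, h];

    for (int y = 0; y < h; y++)
    {
        for (int x = 0; x < w; x++)
        {
            long sum = 0;
            int n = 0;
            for (int j = Math.Max(0, y - k); j <= Math.Min(h - 1, y + k); j++)
                for (int i = Math.Max(0, x - k); i <= Math.Min(w - 1, x + k); i++)
                {
                    sum += gray[i, j];
                    n++;
                }

            double mean = (double)sum / n;
            dark[x, y] = gray[x, y] < mean * factor;   //e.g. factor = 0.85
        }
    }
    return dark;
}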

The "vertical count" (step 3) eliminates horizontal strokes that happen to be located above or below the center line of text. A more sophisticated algorithm would just check directly above and below (x,y), but also to the upper left, upper right, lower left, and lower right.

With my rather crude implementation in C# I was able to process the image in less than 75 milliseconds. In C++, and with some basic optimization, I've little doubt the time could be cut down considerably.

This histogram method assumes the text is horizontal. Since the algorithm is reasonably fast, you may have enough time to calculate pixel count histograms at increments of every 5 degrees from the horizontal. The scan orientation with the greatest peak/valley differences would indicate the rotation.
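A sketch of that skew check: project the dark pixels onto row bins for each trial angle and keep the angle whose projection produces the sharpest peak (the angle range, step size, and scoring are illustrative choices):

//estimate text rotation by projecting dark pixels at several trial angles
double EstimateSkewDegrees(bool[,] dark)
{
    int w = dark.GetLength(0);
    int h = dark.GetLength(1);
    double bestAngle = 0, bestScore = double.MinValue;

    for (double deg = -15; deg <= 15; deg += 5)              //e.g. +/-15 degrees in 5-degree steps
    {
        double rad = deg * Math.PI / 180.0;
        var bins = new int[2 * (w + h)];                     //large enough for any projected offset

        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (dark[x, y])
                {
                    //signed distance of (x,y) from a line through the origin at angle 'deg'
                    int b = (int)(y * Math.Cos(rad) - x * Math.Sin(rad)) + w;   //+w keeps the index positive
                    bins[b]++;
                }

        //sharper alignment concentrates dark pixels into fewer bins, so the highest
        //peak serves as a simple proxy for the peak/valley difference
        int score = bins.Max();
        if (score > bestScore) { bestScore = score; bestAngle = deg; }
    }
    return bestAngle;
}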

I'm not familiar with GA terminology, but if what I've suggested is of some value I'm sure you can translate it into GA terms. In any case, I was interested in this problem anyway, so I might as well share.

EDIT: maybe for use with a GA, it's better to think in terms of "distance since previous dark pixel in X" (or along angle theta) and "distance since previous dark pixel in Y" (or along angle [theta - pi/2]). You might also check the distance from each white pixel to a dark pixel in all radial directions (to find loops).

byte[,] arr = get2DArrayFromBitmap();   //source array from originalBitmap
int w = arr.GetLength(0);               //width of 2D array
int h = arr.GetLength(1);               //height of 2D array

//we can use a second 2D array of dark pixels that belong to vertical strokes
byte[,] bytes = new byte[w, h];         //dark pixels in vertical strokes


//initial morph
int r = 4;        //radius to check for dark pixels
int count = 0;    //number of dark pixels within radius

//fill the bytes[,] array only with pixels belonging to vertical strokes
for (int x = 0; x < w; x++)
{
    //for the first r rows, just set pixels to white
    for (int y = 0; y < r; y++)
    {
        bytes[x, y] = 255;
    }

    //assume pixels of value < 128 are dark pixels in text
    for (int y = r; y < h - r - 1; y++)
    {
        count = 0;

        //count the dark pixels above and below (x,y)
        //total range of check is 2r, from -r to +r
        for (int j = -r; j <= r; j++)
        {
            if (arr[x, y + j] < 128) count++;
        }

        //if half the pixels are dark, [x,y] is part of vertical stroke
        bytes[x, y] = count >= r ? (byte)0 : (byte)255;
    }

    //for the last r rows, just set pixels to white
    for (int y = h - r - 1; y < h; y++)
    {
        bytes[x, y] = 255;
    }
}

//count the number of valid dark pixels in each row
float max = 0;

float[] bins = new float[h];    //normalized "dark pixel strength" for all h rows
int left, right, width;         //leftmost and rightmost dark pixels in row
bool dark = false;              //tracking variable

for (int y = 0; y < h; y++)
{
    //initialize values at beginning of loop iteration
    left = 0;
    right = 0;
    width = 100;

    for (int x = 0; x < w; x++)
    {
        //use value of 128 as threshold between light and dark
        dark = bytes[x, y] < 128;  

        //increment bin if pixel is dark
        bins[y] += dark ? 1 : 0;    

        //update leftmost and rightmost dark pixels
        if (dark)
        {
            if (left == 0) left = x;    
            if (x > right) right = x;   
        }
    }

    width = right - left + 1;

    //for bins with few pixels, treat them as empty
    if (bins[y] < 10) bins[y] = 0;      

    //normalize value according to width
    //divide bin count by width (leftmost to rightmost)
    bins[y] /= width;

    //calculate the maximum bin value so that bins can be scaled when drawn
    if (bins[y] > max) max = bins[y];   
}

//calculate the smoothed value of each bin i by averaging bins i-1, i, and i+1
float[] smooth = new float[bins.Length];

smooth[0] = bins[0];
smooth[smooth.Length - 1] = bins[bins.Length - 1];

for (int i = 1; i < bins.Length - 1; i++)
{
    smooth[i] = (bins[i - 1] + bins[i] + bins[i + 1])/3;
}

//create a new bitmap based on the original bitmap, then draw bins on top
Bitmap bmp = new Bitmap(originalBitmap);

using (Graphics gr = Graphics.FromImage(bmp))
{
    for (int y = 0; y < bins.Length; y++)
    {
        //scale each bin so that it is drawn 200 pixels wide from the left edge
        float value = 200 * (float)smooth[y] / max;
        gr.DrawLine(Pens.Red, new PointF(0, y), new PointF(value, y)); 
    }
}

pictureBox1.Image = bmp;
