Saving Segmentation Results Automatically - MATLAB Arabic OCR

Question

The full segmentation code:

% Preprocessing + Segmentation
% Original segmentation code by Soumyadeep Sinha, with several modifications by Ana
% Saves each segmented character as a separate file
function [s] = seg (a)

myFolder = 'D:\1. Thesis FINISH!!!\Simulasi I\Segmented Images';
% a = imread ('adv1.png');

% Binarization %
level = graythresh (a);
b = im2bw (a, level);

% Complement %
c = imcomplement (b);

% Morphological Operation - Erosion (with a 1x1 square element, imerode leaves the image unchanged) %
se = strel ('square', 1);
% se = strel('rectangle', [1 2]);
r = imerode(c, se); 

i=padarray(r,[0 10]);
% i=padarray(c,[0 10]);

% Morphological Operation - Erosion (disabled) %
% se = strel('rectangle', [1 2]);
% se = strel ('square', 1);
% i = imerode(r, se); 

% Vertical projection (sum of each column)
verticalProjection = sum(i, 1);
set(gcf, 'Name', 'Segmentation Trial', 'NumberTitle', 'Off') 
subplot(2, 2, 1);imshow(i); 
subplot(2,2,3);
plot(verticalProjection, 'b-');
grid on;
t = verticalProjection;
t(t==0) = inf;
mayukh = min(t);   % smallest nonzero column sum
% 0 where there is background, 1 where there are letters
letterLocations = verticalProjection > mayukh; 
% Find Rising and falling edges
d = diff(letterLocations);
startingColumns = find(d>0);
endingColumns = find(d<0);

% Extract each region
y=1;
for k = 1 : length(startingColumns)
  % Get sub image of just one character...
  subImage = i(:, startingColumns(k):endingColumns(k)); 
   % im = subImage;
   s = subImage;
   % figure, imshow (s);

   % Normalization %
   [p] =  normalization (s); 

%  se = strel ('square', 1);
%  se = strel('rectangle', [2 1]);
%  im = imdilate(p, se); 

   % Morphological Operation - Thinning %
   im = bwmorph(p,'thin',Inf);

% Save %
[L,num] = bwlabel(im);
for z= 1 : num
    bw= ismember( L, z);
    % Construct filename for this particular image.
    baseFileName = sprintf('data.%d.png', y);
    y=y+1;
    % Prepend the folder to make the full file name.
    fullFileName = fullfile(myFolder, baseFileName);
    % Do the write to disk.
    imwrite(bw, fullFileName);
    subplot(2,2,4);
    pause(1);
    imshow(bw);
end
% y=y+1;
end;
s = (im);

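  • I have loaded the images into the MATLAB workspace to perform character segmentation on word images, for example data (1).png, data (2).png, and so on.
  • The segmentation produces several output images per word image, one for each segmented character. Word images contain different numbers of characters, so the number of outputs varies. For example, the output for data (1).png should be data (1)_1.png, data (1)_2.png, data (1)_3.png, and the output for data (2).png should be data (2)_1.png, data (2)_2.png, and so on.
  • Word image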

    Until now I have done this manually, but the data set will get bigger, and running the segmentation on one image at a time wastes a lot of time. Is there a simpler, more efficient way to do this, so that I get the result for every segmented character (in sequence)?

        % Save %
        [L,num] = bwlabel(im);
        for z = 1 : num
            bw = ismember(L, z);
            % Construct filename for this particular image.
            % Change basefilename for each word images %
            baseFileName = sprintf('data (1).%d.png', y);
            y = y + 1;
            % Prepend the folder to make the full file name.
            fullFileName = fullfile(myFolder, baseFileName);
            % Do the write to disk.
            imwrite(bw, fullFileName);
            subplot(2,2,4);
            pause(1);
            imshow(bw);
        end
    

    This code gives a good result, but only for one word image: the output of the next word image overwrites the previous output. So far, for every word image, I run the segmentation separately and edit this part to get the right file names, changing sprintf('data (1).%d.png', y) to sprintf('data (2).%d.png', y), and so on.

    % Change basefilename for each word images %
        baseFileName = sprintf('data (1).%d.png', y);
        y=y+1;
    

    That is the result I am hoping for, and I would like to get it automatically.

    Any help would be greatly appreciated.

    Recommended answer

    Since Arabic is my native language, I will help you with this. Let me start by making one thing clear: some Arabic letters contain non-connected regions. Because of this, basic image processing techniques (such as labeling connected components, as your code does) will not be enough on their own. A few years ago I designed a system that takes advantage of this idea: regions belonging to the same letter lie either above or below that letter. The steps:

    1. Convert the image to binary.
    2. Complement the image, so that the text is white and the background is black.
    3. Divide the image into text lines. This can be done by projecting onto the vertical axis (summing each row); valleys in the projection correspond to the spaces between lines.
    4. Process each line separately: detect the regions, and segment all regions that lie above or below each other together.
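Steps 3 and 4 can be sketched roughly as follows. This is only a minimal illustration, not the system described above: it runs on a small synthetic image, it assumes that text lines are separated by at least one completely empty row (and that no empty row occurs inside a line), and it groups regions by runs of occupied columns, which is one simple way to keep regions that sit above or below each other together. The names `img`, `runStarts`, `runStops` and `groups` are made up for this example.

```matlab
% Synthetic binary image (text = true, background = false), standing in for
% the binarized, complemented page from steps 1 and 2.
img = false(12, 12);
img(2:3, 2:5)  = true;   % line 1: body of a letter
img(6, 3)      = true;   % line 1: dot below that body (not connected to it)
img(2:6, 9:10) = true;   % line 1: a tall letter
img(9:11, 4:7) = true;   % line 2: a single letter

% Helpers: start and stop indices of each run of true values in a row vector.
runStarts = @(v) find(diff([false, v, false]) == 1);
runStops  = @(v) find(diff([false, v, false]) == -1) - 1;

% Step 3: project onto the vertical axis; empty rows are the valleys between lines.
rowHasInk  = any(img, 2).';
lineStarts = runStarts(rowHasInk);
lineStops  = runStops(rowHasInk);

% Step 4: inside each line, every run of occupied columns becomes one group,
% so regions stacked above or below each other land in the same sub-image.
groups = {};
for n = 1 : numel(lineStarts)
    lineImg   = img(lineStarts(n):lineStops(n), :);
    colHasInk = any(lineImg, 1);
    cStarts   = runStarts(colHasInk);
    cStops    = runStops(colHasInk);
    for k = 1 : numel(cStarts)
        groups{end+1} = lineImg(:, cStarts(k):cStops(k)); %#ok<AGROW>
    end
end

fprintf('%d lines, %d character groups\n', numel(lineStarts), numel(groups));
```

Here the dot and the letter body above it share columns, so they are cut out as one sub-image instead of being saved as two separate files. The groups come out left to right; since Arabic reads right to left, you may want to reverse the order within each line before numbering the files.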

    If the text is handwritten, the problem becomes more complex. In that case you need a machine learning solution to verify the segmented regions.
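As for the file names asked about in the question, one possible approach is to loop over the word images with `dir` and build each output name from the name of the input file, so that nothing has to be edited by hand. The sketch below is an assumption-laden illustration: the folder names are placeholders, the two tiny word images are synthetic so that the code runs on its own, and the split on empty columns is only a stand-in for the real segmentation.

```matlab
% Hypothetical folders; replace with your own input and output locations.
root      = tempname;
inFolder  = fullfile(root, 'word_images');
outFolder = fullfile(root, 'segmented_images');
mkdir(inFolder);
mkdir(outFolder);

% Two tiny synthetic word images so the sketch runs on its own.
w = false(5, 9);  w(2:4, 2:3) = true;  w(2:4, 6:8) = true;   % two characters
imwrite(w, fullfile(inFolder, 'data (1).png'));
w = false(5, 9);  w(2:4, 4:6) = true;                         % one character
imwrite(w, fullfile(inFolder, 'data (2).png'));

files   = dir(fullfile(inFolder, '*.png'));
written = {};
for f = 1 : numel(files)
    [~, baseName] = fileparts(files(f).name);         % e.g. 'data (1)'
    word = imread(fullfile(inFolder, files(f).name)) > 0;

    % Stand-in for the real segmentation: split the word on empty columns.
    d         = diff([false, any(word, 1), false]);
    startCols = find(d == 1);
    endCols   = find(d == -1) - 1;

    for k = 1 : numel(startCols)                       % counter restarts per word
        charImg = word(:, startCols(k):endCols(k));
        outName = sprintf('%s_%d.png', baseName, k);   % e.g. 'data (1)_1.png'
        imwrite(charImg, fullfile(outFolder, outName));
        written{end+1} = outName; %#ok<AGROW>
    end
end
disp(written);
```

In the `seg` function from the question, the counter `y` is already reset to 1 on every call, so the equivalent change would be to pass the base name in as an extra argument (a hypothetical signature such as `seg(a, baseName)`) and use it in the `sprintf` call instead of the hard-coded 'data (1)'. Because each output name is derived from its own input file, the order in which `dir` lists the files does not matter.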
