Saving Segmentation Result Automatically - Matlab Arabic OCR

Question

The complete segmentation code:

% Preprocessing + Segmentation
 % // Original segmentation code by Soumyadeep Sinha, with several modifications by Ana //
 % Saves each segmented character as a separate file
    function [s] = seg (a)

myFolder = 'D:\1. Thesis FINISH!!!\Simulasi I\Segmented Images';
% a = imread ('adv1.png');

% Binarization %
level = graythresh (a);
b = im2bw (a, level);

% Complement %
c = imcomplement (b);

% Morphological Operation - Erosion (a 1x1 square element leaves the image unchanged) %
se = strel ('square', 1);
% se = strel('rectangle', [1 2]);
r = imerode(c, se); 

i=padarray(r,[0 10]);
% i=padarray(c,[0 10]);

% Morphological Operation - Erosion (disabled alternative) %
% se = strel('rectangle', [1 2]);
% se = strel ('square', 1);
% i = imerode(r, se); 

%VP
verticalProjection = sum(i, 1);
set(gcf, 'Name', 'Segmentation Trial', 'NumberTitle', 'Off') 
subplot(2, 2, 1);imshow(i); 
subplot(2,2,3);
plot(verticalProjection, 'b-');
grid on;
t = verticalProjection;
t(t==0) = inf;
% Smallest non-zero column sum; used as the threshold below
mayukh = min(t);
% 0 where there is background (or only a minimal stroke), 1 where there are letters
letterLocations = verticalProjection > mayukh; 
% Find Rising and falling edges
d = diff(letterLocations);
startingColumns = find(d>0);
endingColumns = find(d<0);

% Extract each region
y=1;
for k = 1 : length(startingColumns)
  % Get sub image of just one character...
  subImage = i(:, startingColumns(k):endingColumns(k)); 
   % im = subImage;
   s = subImage;
   % figure, imshow (s);

   % Normalization %
   [p] =  normalization (s); 

%  se = strel ('square', 1);
%  se = strel('rectangle', [2 1]);
%  im = imdilate(p, se); 

   % Morphological Operation - Thinning %
   im = bwmorph(p,'thin',Inf);

% Save %
[L,num] = bwlabel(im);
for z= 1 : num
    bw= ismember( L, z);
    % Construct filename for this particular image.
    baseFileName = sprintf('data.%d.png', y);
    y=y+1;
    % Prepend the folder to make the full file name.
    fullFileName = fullfile(myFolder, baseFileName);
    % Do the write to disk.
    imwrite(bw, fullFileName);
    subplot(2,2,4);
    pause(1);
    imshow(bw);
end
% y=y+1;
end;
s = (im);

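  • I have loaded the images into the MATLAB workspace in order to run character segmentation on word images, for example data(1).png, data(2).png, and so on.
  • The segmentation process produces several output images, one per segmented character. Word images contain different numbers of characters, so the number of outputs varies. For example, the segmentation output for data(1).png becomes data(1)_1.png, data(1)_2.png, data(1)_3.png, and data(2).png becomes data(2)_1.png, data(2)_2.png, and so on.
  • Word image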

    Until now I have done this manually, but the data set will grow, and running the segmentation on one image at a time wastes a lot of time. Is there a simpler and more efficient way to get the result for every segmented character (in sequence)?

        % Save %
        [L,num] = bwlabel(im);
        for z= 1 : num
        bw= ismember( L, z);
        % Construct filename for this particular image.
        % Change basefilename for each word images %
        baseFileName = sprintf('data (1).%d.png', y);
        y=y+1;
        % Prepend the folder to make the full file name.
        fullFileName = fullfile(myFolder, baseFileName);
        % Do the write to disk.
        imwrite(bw, fullFileName);
        subplot(2,2,4);
        pause(1);
        imshow(bw);
        end
    

    This code gives a good result, but only for one word image: the output of the next image overwrites the output of the previous one. So, for every word image, I currently run the segmentation separately and edit this part by hand to get the right file names, changing sprintf('data (1).%d.png', y) to sprintf('data (2).%d.png', y), and so on.

    % Change basefilename for each word images %
        baseFileName = sprintf('data (1).%d.png', y);
        y=y+1;
    

    This is the result I am hoping for, and I would like to get it automatically.

    Any help would be greatly appreciated.

    Recommended answer

    Since Arabic is my native language, I can help with this. Let me start by making one thing clear: some Arabic letters consist of non-connected regions. Because of this, image processing techniques alone will not be enough. A few years ago I designed a system built on the following idea: regions belonging to the same letter lie either above or below the letter. The steps:

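    1. Convert the image to binary.
    2. Complement the image, so that the text is white and the background is black.
    3. Split the image into text lines. This can be done by performing a projection onto the vertical axis; the valleys correspond to the spaces between lines.
    4. Process each line separately with region detection. All regions that lie above or below one another are segmented together.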
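The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system described in the answer: the image `bw` is a small synthetic matrix that is already binarized and complemented, all variable names are made up for the example, and "regions above or below each other" is approximated by grouping every pixel that falls in the same run of non-empty columns within a line, instead of labeling regions explicitly.

```matlab
% Synthetic binary "page": true = text, false = background
% (stands in for an image after steps 1 and 2).
bw = false(9, 12);
bw(2, 3)     = true;   % dot above the first letter body (line 1)
bw(4:5, 2:4) = true;   % first letter body (line 1)
bw(3:5, 7:9) = true;   % second letter body (line 1)
bw(8:9, 5:8) = true;   % single letter (line 2)

% Step 3: project onto the vertical axis (one sum per row) and
% cut the page where the projection drops to zero.
rowHasText = sum(bw, 2)' > 0;
rowEdges   = diff([0, rowHasText, 0]);
lineStart  = find(rowEdges > 0);        % first row of each text line
lineEnd    = find(rowEdges < 0) - 1;    % last row of each text line

% Step 4: inside each line, group everything that shares a run of
% non-empty columns, so a dot stays with the body below or above it.
segments = {};
for k = 1:numel(lineStart)
    lineImg    = bw(lineStart(k):lineEnd(k), :);
    colHasText = sum(lineImg, 1) > 0;
    colEdges   = diff([0, colHasText, 0]);
    segStart   = find(colEdges > 0);
    segEnd     = find(colEdges < 0) - 1;
    for j = 1:numel(segStart)
        segments{end + 1} = lineImg(:, segStart(j):segEnd(j));
    end
end

fprintf('Found %d lines and %d segments\n', numel(lineStart), numel(segments));
```

Note that the save loop in the question writes one file per connected component returned by `bwlabel`, which would put a dot and its letter body into separate files; grouping by column overlap, as here, keeps them together. Segments are collected left to right within each line, so the order may need to be reversed for right-to-left reading order.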

    If the text is handwritten, the problem becomes more complex. In that case you need a machine learning solution to verify the segmented regions.
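As for saving the results automatically, one possible approach is to loop over the word images with `dir` and derive each output name from the name of the input file, so that nothing has to be edited by hand and nothing is overwritten. The sketch below is an assumption-laden illustration: the folders are temporary placeholders, the two word images are synthetic, and the split on empty columns merely stands in for the real segmentation code from the question.

```matlab
% Placeholder folders (hypothetical; substitute the real dataset paths).
inFolder  = tempname;  mkdir(inFolder);
outFolder = tempname;  mkdir(outFolder);

% Two tiny synthetic word images standing in for the real word images.
w1 = false(5, 9);   w1(2:4, [2 3 6 7]) = true;         % two separated blobs
w2 = false(5, 12);  w2(2:4, [2 3 6 7 10 11]) = true;   % three separated blobs
imwrite(w1, fullfile(inFolder, 'data (1).png'));
imwrite(w2, fullfile(inFolder, 'data (2).png'));

files   = dir(fullfile(inFolder, '*.png'));
written = {};
for n = 1:numel(files)
    % Base name of the current word image without extension, e.g. 'data (1)'.
    [~, baseName] = fileparts(files(n).name);
    img = imread(fullfile(inFolder, files(n).name)) > 0;

    % Stand-in for the real segmentation: split on empty columns.
    colHasText = sum(img, 1) > 0;
    edges      = diff([0, colHasText, 0]);
    segStart   = find(edges > 0);
    segEnd     = find(edges < 0) - 1;

    % The counter y restarts for every word image, and the input name is
    % part of the output name, e.g. 'data (1)_1.png', 'data (1)_2.png'.
    for y = 1:numel(segStart)
        charImg = img(:, segStart(y):segEnd(y));
        outName = sprintf('%s_%d.png', baseName, y);
        imwrite(charImg, fullfile(outFolder, outName));
        written{end + 1} = outName;
    end
end

disp(written');
```

To apply the same idea to the function in the question, `seg` could take the base name as a second input and use `sprintf('%s_%d.png', baseName, y)` in place of the hard-coded `'data (1).%d.png'`. The `pause(1)` and `imshow` calls in the save loop are only useful for visual inspection and will slow down a batch run considerably.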
