Saving Segmentation Result Automatically - Matlab Arabic OCR
Question
Full segmentation code:
% Preprocessing + Segmentation
% Original segmentation code by Soumyadeep Sinha, with several modifications by Ana
% Saves each segmented character as a separate file
function [s] = seg (a)
myFolder = 'D:\1. Thesis FINISH!!!\Simulasi I\Segmented Images';
% a = imread ('adv1.png');

% Binarization %
level = graythresh (a);
b = im2bw (a, level);

% Complement %
c = imcomplement (b);

% Morphological Operation - Erosion %
% (a 1x1 square structuring element leaves the image unchanged)
se = strel ('square', 1);
% se = strel('rectangle', [1 2]);
r = imerode(c, se);
i = padarray(r, [0 10]);
% i=padarray(c,[0 10]);

% Morphological Operation - Dilation %
% se = strel('rectangle', [1 2]);
% se = strel ('square', 1);
% i = imerode(r, se);

% Vertical projection: sum of each column
verticalProjection = sum(i, 1);
set(gcf, 'Name', 'Segmentation Trial', 'NumberTitle', 'Off')
subplot(2, 2, 1); imshow(i);
subplot(2, 2, 3);
plot(verticalProjection, 'b-');
grid on;

% Smallest non-zero column sum
t = verticalProjection;
t(t==0) = inf;
mayukh = min(t);

% 1 where there are letters, 0 elsewhere; columns whose sum does not
% exceed the smallest non-zero sum are treated as background
letterLocations = verticalProjection > mayukh;

% Find rising and falling edges
d = diff(letterLocations);
startingColumns = find(d>0);
endingColumns = find(d<0);

% Extract each region
y = 1;   % running index used in the output file names
for k = 1 : length(startingColumns)
    % Get sub image of just one character...
    subImage = i(:, startingColumns(k):endingColumns(k));
    % im = subImage;
    s = subImage;
    % figure, imshow (s);

    % Normalization (normalization is a user-defined function not shown here) %
    [p] = normalization (s);
    % se = strel ('square', 1);
    % se = strel('rectangle', [2 1]);
    % im = imdilate(p, se);

    % Morphological Operation - Thinning %
    im = bwmorph(p, 'thin', Inf);

    % Save each connected component as its own file %
    [L, num] = bwlabel(im);
    for z = 1 : num
        bw = ismember(L, z);
        % Construct filename for this particular image.
        baseFileName = sprintf('data.%d.png', y);
        y = y + 1;
        % Prepend the folder to make the full file name.
        fullFileName = fullfile(myFolder, baseFileName);
        % Do the write to disk.
        imwrite(bw, fullFileName);
        subplot(2, 2, 4);
        pause(1);
        imshow(bw);
    end
    % y=y+1;
end
s = im;   % return the last thinned sub-image
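- I load the word images into the MATLAB workspace to run character segmentation on them, for example data (1).png, data (2).png, and so on.
- The segmentation produces several output images per word image, one for each segmented character. Each word image contains a different number of characters, so the number of outputs varies. For example, the segmentation output for data (1).png should become data (1)_1.png, data (1)_2.png, data (1)_3.png, and for data (2).png it should become data (2)_1.png, data (2)_2.png, and so on.
Word image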
Until now I have done this manually, but the data set is going to grow, and running the segmentation on the images one at a time wastes a lot of time. Is there a simpler and more efficient way to do this, so that I get the result for every segmented character in sequence?
% Save %
[L,num] = bwlabel(im);
for z= 1 : num
bw= ismember( L, z);
% Construct filename for this particular image.
% Change basefilename for each word images %
baseFileName = sprintf('data (1).%d.png', y);
y=y+1;
% Prepend the folder to make the full file name.
fullFileName = fullfile(myFolder, baseFileName);
% Do the write to disk.
imwrite(bw, fullFileName);
subplot(2,2,4);
pause(1);
imshow(bw);
end
This code gives a good result, but only for one word image: the output of the next image overwrites the previous output. So for every word image I currently run the segmentation separately and edit this part to get the right file names, changing sprintf('data (1).%d.png', y) to sprintf('data (2).%d.png', y), and so on.
% Change basefilename for each word images %
baseFileName = sprintf('data (1).%d.png', y);
y=y+1;
This is the result I am hoping for, and I would like to get it automatically.
Any help would be greatly appreciated.
Recommended answer
Since Arabic is my native language, I will help you with this. Let me start by making one thing clear: some Arabic letters contain non-connected regions. Because of this, image processing techniques alone will not be enough. A few years ago I designed a system that takes advantage of this idea: regions that belong to the same letter lie either above or below that letter. The steps:
- Convert the image to binary.
- Complement the image, so that the text is white and the background is black.
- Divide the image into multiple rows (text lines). This can be done by projecting the image onto the vertical axis; valleys in the projection correspond to the spaces between lines.
- Process each line separately with region detection: all regions that lie above or below each other are segmented together.
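The steps above can be sketched in MATLAB roughly as follows. This is only an illustration, not the answerer's actual system: the function name segmentArabicLines is hypothetical, the input is assumed to show dark text on a light background, and "above or below each other" is interpreted here as "the column ranges of the regions overlap".

```matlab
function groups = segmentArabicLines(img)
% Hypothetical helper sketching the four steps; not the answerer's system.
% img:    grayscale or RGB image, assumed dark text on a light background.
% groups: cell array; groups{r}{g} is a logical image (the size of text
%         line r) holding one group of vertically stacked regions.

if size(img, 3) == 3
    img = rgb2gray(img);
end
% Steps 1-2: binarize, then complement so that text pixels are 1
bw = ~im2bw(img, graythresh(img));

% Step 3: project onto the vertical axis; empty rows separate text lines
rowProfile = sum(bw, 2)';
isText = rowProfile > 0;
d = diff([0, isText, 0]);
lineStart = find(d == 1);
lineEnd = find(d == -1) - 1;

groups = cell(1, numel(lineStart));
for r = 1:numel(lineStart)
    lineImg = bw(lineStart(r):lineEnd(r), :);

    % Step 4: label connected regions and record each one's column range
    [L, num] = bwlabel(lineImg);
    colSpan = zeros(num, 2);
    for z = 1:num
        cols = find(any(L == z, 1));
        colSpan(z, :) = [cols(1), cols(end)];
    end

    % Merge regions whose column ranges overlap (sweep from left to right)
    [~, order] = sort(colSpan(:, 1));
    groupId = zeros(num, 1);
    g = 0;
    curEnd = -inf;
    for idx = order'
        if colSpan(idx, 1) > curEnd
            g = g + 1;                          % no overlap: new group
            curEnd = colSpan(idx, 2);
        else
            curEnd = max(curEnd, colSpan(idx, 2));
        end
        groupId(idx) = g;
    end

    groups{r} = cell(1, g);
    for k = 1:g
        groups{r}{k} = ismember(L, find(groupId == k));
    end
end
end
```

The groups come out ordered left to right; since Arabic is read right to left, reverse each groups{r} if reading order matters. Note that in cursive text a connected word or sub-word is a single region, so this grouping attaches dots and marks to the stroke beneath or above them but does not by itself split connected letters.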
If the text is handwritten, the problem becomes more complex. In that case you need a machine learning solution to verify the segmented regions.
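As for the file-naming part of the question, one possible way to avoid editing the sprintf call by hand is to loop over the input folder and build each output name from the name of the word image being processed. The sketch below is an assumption-laden illustration: segmentFolder is a hypothetical driver, and segFcn stands for an adapted version of seg that returns the character images of one word as a cell array instead of writing them to disk itself.

```matlab
function written = segmentFolder(inFolder, outFolder, segFcn)
% Hypothetical driver, not part of the original post.
% inFolder:  folder holding the word images (*.png)
% outFolder: folder that receives the character images
% segFcn:    function handle; takes one image and returns a cell array
%            of character images in order (an assumed adaptation of seg)
% written:   cell array with the full names of all files written

if ~exist(outFolder, 'dir')
    mkdir(outFolder);
end

list = dir(fullfile(inFolder, '*.png'));
written = {};
for n = 1:numel(list)
    % Base name without extension, e.g. 'data (1)'
    [~, baseName] = fileparts(list(n).name);
    a = imread(fullfile(inFolder, list(n).name));

    chars = segFcn(a);
    for k = 1:numel(chars)
        % e.g. 'data (1)_1.png', 'data (1)_2.png', ...
        outName = fullfile(outFolder, sprintf('%s_%d.png', baseName, k));
        imwrite(chars{k}, outName);
        written{end + 1} = outName; %#ok<AGROW>
    end
end
end
```

Because the character counter k restarts for every word image and the prefix comes from the input file name, the outputs of one image no longer overwrite those of another.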