Matlab: How to read in numbers with a comma as decimal separator?
Problem description
I have a whole lot (hundreds of thousands) of rather large (>0.5MB) files, where data are numerical, but with a comma as decimal separator.
It's impractical for me to use an external tool like sed "s/,/./g".
When the separator is a dot, I just use textscan(fid, '%f%f%f'), but I see no option to change the decimal separator.
How can I read such a file in an efficient manner?
Sample line from a file:
5,040000 18,040000 -0,030000
Note: There is a similar question for R, but I use Matlab.
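Solution
With a test script, I found that reading the comma-separated file with txt2mat is slower than reading the equivalent dot-separated file by a factor of less than 1.5. My code would look like: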
tmco = {'NumHeaderLines', 1 , ...
'NumColumns' , 5 , ...
'ConvString' , '%f' , ...
'InfoLevel' , 0 , ...
'ReadMode' , 'block', ...
'ReplaceChar' , {',.'} } ;
A = txt2mat(filename, tmco{:});
Note the different 'ReplaceChar' value and 'ReadMode' 'block'.
I get the following results for a ~5MB file on my (not too new) machine:
- txt2mat test comma avg. time: 0.63231
- txt2mat test dot avg. time: 0.45715
- textscan test dot avg. time: 0.4787
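For readers who do not have txt2mat available, the same replace-then-parse idea can be sketched with built-in functions only. This is not the answer's method and makes no claim about how txt2mat works internally; it assumes the file fits in memory and that commas appear only as decimal separators. The temporary file name and the second data line are made up for the demonstration.

```matlab
% Write a small demo file with decimal commas (hypothetical sample data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '5,040000 18,040000 -0,030000\n');
fprintf(fid, '1,500000 2,250000 3,750000\n');
fclose(fid);

% Read the whole file as one character vector, swap commas for dots,
% then let textscan parse the text directly instead of a file handle.
txt = fileread(fname);
txt = strrep(txt, ',', '.');
C = textscan(txt, '%f%f%f', 'CollectOutput', true);
M = C{1};   % numeric matrix, one row per line of the file

delete(fname);   % clean up the demo file
disp(M)
```

Because the replacement runs on the in-memory text, no external tool such as sed is needed, at the cost of one extra pass over the file contents.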
The full code of my test script:
%% generate sample files
fdot = 'C:\temp\cDot.txt';
fcom = 'C:\temp\cCom.txt';
c = 5; % # columns
r = 100000; % # rows
test = round(1e8*rand(r,c))/1e6;
tdot = sprintf([repmat('%f ', 1,c), '\r\n'], test.'); % '
tdot = ['a header line', char([13,10]), tdot];
tcom = strrep(tdot,'.',',');
% write dot file
fid = fopen(fdot,'w');
fprintf(fid, '%s', tdot);
fclose(fid);
% write comma file
fid = fopen(fcom,'w');
fprintf(fid, '%s', tcom);
fclose(fid);
disp('-----')
%% read back sample files with txt2mat and textscan
% txt2mat-options with comma decimal sep.
tmco = {'NumHeaderLines', 1 , ...
'NumColumns' , 5 , ...
'ConvString' , '%f' , ...
'InfoLevel' , 0 , ...
'ReadMode' , 'block', ...
'ReplaceChar' , {',.'} } ;
% txt2mat-options with dot decimal sep.
tmdo = {'NumHeaderLines', 1 , ...
'NumColumns' , 5 , ...
'ConvString' , '%f' , ...
'InfoLevel' , 0 , ...
'ReadMode' , 'block'} ;
% textscan-options
tsco = {'HeaderLines' , 1 , ...
'CollectOutput' , true } ;
A = txt2mat(fcom, tmco{:});
B = txt2mat(fdot, tmdo{:});
fid = fopen(fdot);
C = textscan(fid, repmat('%f',1,c) , tsco{:} );
fclose(fid);
C = C{1};
disp(['txt2mat test comma (1=Ok): ' num2str(isequal(A,test)) ])
disp(['txt2mat test dot (1=Ok): ' num2str(isequal(B,test)) ])
disp(['textscan test dot (1=Ok): ' num2str(isequal(C,test)) ])
disp('-----')
%% speed test
numTest = 20;
% A) txt2mat with comma
tic
for k = 1:numTest
A = txt2mat(fcom, tmco{:});
clear A
end
ttmc = toc;
disp(['txt2mat test comma avg. time: ' num2str(ttmc/numTest) ])
% B) txt2mat with dot
tic
for k = 1:numTest
B = txt2mat(fdot, tmdo{:});
clear B
end
ttmd = toc;
disp(['txt2mat test dot avg. time: ' num2str(ttmd/numTest) ])
% C) textscan with dot
tic
for k = 1:numTest
fid = fopen(fdot);
C = textscan(fid, repmat('%f',1,c) , tsco{:} );
fclose(fid);
C = C{1};
clear C
end
ttsc = toc;
disp(['textscan test dot avg. time: ' num2str(ttsc/numTest) ])
disp('-----')
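The tic/toc loops above average over a fixed number of runs. As an alternative sketch, timeit (available in newer MATLAB releases) can handle the repetition itself. The example below is an assumption-laden illustration rather than part of the original test: it works on in-memory text instead of files, uses only built-in functions instead of txt2mat, and the variable names and data sizes are chosen arbitrarily.

```matlab
% Build in-memory test text (sizes are arbitrary, for illustration only).
nCols = 3;
nRows = 20000;
vals = round(1e8*rand(nRows, nCols))/1e6;
fmtOut = [repmat('%f ', 1, nCols), '\n'];
txtDot = sprintf(fmtOut, vals.');        % dot as decimal separator
txtCom = strrep(txtDot, '.', ',');       % same text with decimal commas

% Function handles for the two cases: parse dot text directly, or
% replace commas with dots first and then parse.
fmtIn = repmat('%f', 1, nCols);
readDot = @() textscan(txtDot, fmtIn, 'CollectOutput', true);
readCom = @() textscan(strrep(txtCom, ',', '.'), fmtIn, 'CollectOutput', true);

% timeit calls each handle several times and returns a typical run time.
tDot = timeit(readDot);
tCom = timeit(readCom);
fprintf('dot: %.4f s, comma: %.4f s, ratio: %.2f\n', tDot, tCom, tCom/tDot);
```

The printed ratio depends on the machine and MATLAB version, so it is not directly comparable to the file-based timings reported above.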