Matlab: How to read in numbers with a comma as decimal separator?

Question

I have a whole lot (hundreds of thousands) of rather large (>0.5MB) files, where data are numerical, but with a comma as decimal separator. It's impractical for me to use an external tool like sed "s/,/./g". When the separator is a dot, I just use textscan(fid, '%f%f%f'), but I see no option to change the decimal separator. How can I read such a file in an efficient manner?

Sample line from the file:

5,040000    18,040000   -0,030000

Note: There is a similar question for R, but I use Matlab.

Recommended answer

With a test script I found that reading the comma-decimal file with txt2mat is slower than reading the dot-decimal file by a factor of less than 1.5. My code would look like:

tmco = {'NumHeaderLines', 1      , ...
        'NumColumns'    , 5      , ...
        'ConvString'    , '%f'   , ...
        'InfoLevel'     , 0      , ...
        'ReadMode'      , 'block', ...
        'ReplaceChar'   , {',.'} } ;

A = txt2mat(filename, tmco{:});

Note the different 'ReplaceChar' value and 'ReadMode' 'block'.
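
For readers who do not have txt2mat available, the same replace-then-parse idea can be sketched with built-in functions only. This is a minimal sketch, not the answer's method: it assumes whitespace-separated columns, assumes commas occur only as decimal marks, and uses a hypothetical file name (sample_comma.txt) that it writes itself so the example is self-contained.

```matlab
% Sketch: read a comma-decimal file with built-ins (fileread, strrep, textscan).
% Assumptions: columns are separated by whitespace and every comma is a
% decimal mark. The file name below is hypothetical.
filename = 'sample_comma.txt';

% Write a small sample file so the example runs on its own
fid = fopen(filename, 'w');
fprintf(fid, '5,040000    18,040000   -0,030000\n');
fprintf(fid, '1,500000    2,250000    3,125000\n');
fclose(fid);

txt = fileread(filename);        % whole file as one char vector
txt = strrep(txt, ',', '.');     % swap decimal commas for dots
C   = textscan(txt, '%f%f%f', 'CollectOutput', true);
A   = C{1};                      % numeric matrix, one row per line

delete(filename);                % remove the sample file again
```

Because the whole file is held in memory as text, this approach suits files of the size mentioned in the question (around 0.5MB); whether it is faster or slower than txt2mat has not been measured here.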

I get the following results (average time per read, in seconds) for a ~5MB file on my (not too new) machine:

  • txt2mat test comma avg. time: 0.63231
  • txt2mat test dot avg. time: 0.45715
  • textscan test dot avg. time: 0.4787
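
The "factor of less than 1.5" can be checked directly from the three timings listed above. The variable names in this short sketch are illustrative only; the numbers are the ones reported in the list.

```matlab
% Timings reported above (average seconds per read)
tTxt2matComma = 0.63231;
tTxt2matDot   = 0.45715;
tTextscanDot  = 0.4787;

% Slowdown of the comma case relative to each dot-based read
factorVsTextscan   = tTxt2matComma / tTextscanDot;
factorVsTxt2matDot = tTxt2matComma / tTxt2matDot;

fprintf('comma vs. textscan (dot): %.2f\n', factorVsTextscan);
fprintf('comma vs. txt2mat  (dot): %.2f\n', factorVsTxt2matDot);
```

Both ratios come out at roughly 1.3 to 1.4, which is consistent with the stated bound.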

The full code of my test script:

%% generate sample files

fdot = 'C:\temp\cDot.txt';
fcom = 'C:\temp\cCom.txt';

c = 5;       % # columns
r = 100000;  % # rows
test = round(1e8*rand(r,c))/1e6;
tdot = sprintf([repmat('%f ', 1,c), '\r\n'], test.');
tdot = ['a header line', char([13,10]), tdot];

tcom = strrep(tdot,'.',',');

% write dot file
fid = fopen(fdot,'w');
fprintf(fid, '%s', tdot);
fclose(fid);
% write comma file
fid = fopen(fcom,'w');
fprintf(fid, '%s', tcom);
fclose(fid);

disp('-----')

%% read back sample files with txt2mat and textscan

% txt2mat-options with comma decimal sep.
tmco = {'NumHeaderLines', 1      , ...
        'NumColumns'    , 5      , ...
        'ConvString'    , '%f'   , ...
        'InfoLevel'     , 0      , ...
        'ReadMode'      , 'block', ...
        'ReplaceChar'   , {',.'} } ;

% txt2mat-options with dot decimal sep.
tmdo = {'NumHeaderLines', 1      , ...
        'NumColumns'    , 5      , ...
        'ConvString'    , '%f'   , ...
        'InfoLevel'     , 0      , ...
        'ReadMode'      , 'block'} ;

% textscan-options
tsco = {'HeaderLines'   , 1      , ...
        'CollectOutput' , true   } ;


A = txt2mat(fcom, tmco{:});
B = txt2mat(fdot, tmdo{:});

fid = fopen(fdot);
C = textscan(fid, repmat('%f',1,c) , tsco{:} );
fclose(fid);
C = C{1};

disp(['txt2mat  test comma (1=Ok): ' num2str(isequal(A,test)) ])
disp(['txt2mat  test dot   (1=Ok): ' num2str(isequal(B,test)) ])
disp(['textscan test dot   (1=Ok): ' num2str(isequal(C,test)) ])
disp('-----')

%% speed test

numTest = 20;

% A) txt2mat with comma
tic
for k = 1:numTest
    A = txt2mat(fcom, tmco{:});
    clear A
end
ttmc = toc;
disp(['txt2mat  test comma avg. time: ' num2str(ttmc/numTest) ])

% B) txt2mat with dot
tic
for k = 1:numTest
    B = txt2mat(fdot, tmdo{:});
    clear B
end
ttmd = toc;
disp(['txt2mat  test dot   avg. time: ' num2str(ttmd/numTest) ])

% C) textscan with dot
tic
for k = 1:numTest
    fid = fopen(fdot);
    C = textscan(fid, repmat('%f',1,c) , tsco{:} );
    fclose(fid);
    C = C{1};
    clear C
end
ttsc = toc;
disp(['textscan test dot   avg. time: ' num2str(ttsc/numTest) ])
disp('-----')
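
As an alternative to the tic/toc loops above, a comparable measurement can be sketched with the built-in timeit function. This is a hedged sketch under stated assumptions: it does not use txt2mat, it works on in-memory text instead of files, and it compares plain textscan on dot-decimal text against strrep followed by textscan on comma-decimal text. The sizes and variable names are illustrative.

```matlab
% Sketch: time dot parsing vs. comma-replace-then-parse on in-memory text.
% Assumption: built-ins only (no txt2mat), no file I/O involved.
c = 3;        % number of columns
r = 1000;     % number of rows
vals = round(1e8*rand(r,c))/1e6;

% Build dot-decimal text, then derive the comma-decimal variant
tdot = sprintf([repmat('%f ', 1, c), '\n'], vals.');
tcom = strrep(tdot, '.', ',');

fmt = repmat('%f', 1, c);
parseDot = @() textscan(tdot, fmt, 'CollectOutput', true);
parseCom = @() textscan(strrep(tcom, ',', '.'), fmt, 'CollectOutput', true);

% Sanity check: the comma path should reproduce the original values
Ccom = parseCom();
maxErr = max(abs(Ccom{1}(:) - vals(:)));

% timeit runs each handle several times and returns a typical time in seconds
tDot = timeit(parseDot);
tCom = timeit(parseCom);
fprintf('dot: %g s, comma: %g s, factor: %.2f\n', tDot, tCom, tCom/tDot);
```

The resulting factor depends on the machine and MATLAB version, so no expected value is given here.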
