Matlab: How to read in numbers with a comma as decimal separator?


Problem description

I have a whole lot (hundreds of thousands) of rather large (>0.5MB) files, where the data are numerical, but with a comma as decimal separator. It's impractical for me to use an external tool like sed "s/,/./g". When the separator is a dot, I just use textscan(fid, '%f%f%f'), but I see no option to change the decimal separator. How can I read such a file in an efficient manner?

Sample line from a file:

    5,040000    18,040000   -0,030000

Note: There is a similar question for R, but I use Matlab.

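Solution

With a test script I've found that reading a comma-decimal file with txt2mat is slower than reading the same data with dot decimals by a factor of less than 1.5. My code would look like: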

    tmco = {'NumHeaderLines', 1      , ...
            'NumColumns'    , 5      , ...
            'ConvString'    , '%f'   , ...
            'InfoLevel'     , 0      , ...
            'ReadMode'      , 'block', ...
            'ReplaceChar'   , {',.'} } ;
    
    A = txt2mat(filename, tmco{:});
    

Note the 'ReplaceChar' value {',.'}, which asks txt2mat to replace each comma with a dot, and the 'ReadMode' value 'block'.

I get the following results (average time per read, in seconds) for a ~5MB file on my (not too new) machine:

• txt2mat test comma avg. time: 0.63231
• txt2mat test dot avg. time: 0.45715
• textscan test dot avg. time: 0.4787
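
As a quick check of the "factor of less than 1.5" statement, the ratios can be worked out from the three averages listed above. The variable names here are only illustrative; the numbers are copied from the list.

```matlab
% Average read times in seconds, copied from the results above.
tTxt2matComma = 0.63231;   % txt2mat, comma as decimal separator
tTxt2matDot   = 0.45715;   % txt2mat, dot as decimal separator
tTextscanDot  = 0.4787;    % textscan, dot as decimal separator

% Slowdown of the comma read relative to each dot read.
ratioVsTxt2mat  = tTxt2matComma / tTxt2matDot;
ratioVsTextscan = tTxt2matComma / tTextscanDot;

fprintf('comma vs. txt2mat dot:  %.2f\n', ratioVsTxt2mat);   % prints 1.38
fprintf('comma vs. textscan dot: %.2f\n', ratioVsTextscan);  % prints 1.32
```

Both ratios stay below 1.5 for this one file on this one machine; other file sizes or hardware may give different numbers.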

The full code of my test script:

    %% generate sample files
    
    fdot = 'C:\temp\cDot.txt';
    fcom = 'C:\temp\cCom.txt';
    
    c = 5;       % # columns
    r = 100000;  % # rows
    test = round(1e8*rand(r,c))/1e6;
    tdot = sprintf([repmat('%f ', 1,c), '\r\n'], test.'); % '
    tdot = ['a header line', char([13,10]), tdot];
    
    tcom = strrep(tdot,'.',',');
    
    % write dot file
    fid = fopen(fdot,'w');
    fprintf(fid, '%s', tdot);
    fclose(fid);
    % write comma file
    fid = fopen(fcom,'w');
    fprintf(fid, '%s', tcom);
    fclose(fid);
    
    disp('-----')
    
    %% read back sample files with txt2mat and textscan
    
    % txt2mat-options with comma decimal sep.
    tmco = {'NumHeaderLines', 1      , ...
            'NumColumns'    , 5      , ...
            'ConvString'    , '%f'   , ...
            'InfoLevel'     , 0      , ...
            'ReadMode'      , 'block', ...
            'ReplaceChar'   , {',.'} } ;
    
    % txt2mat-options with dot decimal sep.
    tmdo = {'NumHeaderLines', 1      , ...
            'NumColumns'    , 5      , ...
            'ConvString'    , '%f'   , ...
            'InfoLevel'     , 0      , ...
            'ReadMode'      , 'block'} ;
    
    % textscan-options
    tsco = {'HeaderLines'   , 1      , ...
            'CollectOutput' , true   } ;
    
    
    A = txt2mat(fcom, tmco{:});
    B = txt2mat(fdot, tmdo{:});
    
    fid = fopen(fdot);
    C = textscan(fid, repmat('%f',1,c) , tsco{:} );
    fclose(fid);
    C = C{1};
    
    disp(['txt2mat  test comma (1=Ok): ' num2str(isequal(A,test)) ])
    disp(['txt2mat  test dot   (1=Ok): ' num2str(isequal(B,test)) ])
    disp(['textscan test dot   (1=Ok): ' num2str(isequal(C,test)) ])
    disp('-----')
    
    %% speed test
    
    numTest = 20;
    
    % A) txt2mat with comma
    tic
    for k = 1:numTest
        A = txt2mat(fcom, tmco{:});
        clear A
    end
    ttmc = toc;
    disp(['txt2mat  test comma avg. time: ' num2str(ttmc/numTest) ])
    
    % B) txt2mat with dot
    tic
    for k = 1:numTest
        B = txt2mat(fdot, tmdo{:});
        clear B
    end
    ttmd = toc;
    disp(['txt2mat  test dot   avg. time: ' num2str(ttmd/numTest) ])
    
    % C) textscan with dot
    tic
    for k = 1:numTest
        fid = fopen(fdot);
        C = textscan(fid, repmat('%f',1,c) , tsco{:} );
        fclose(fid);
        C = C{1};
        clear C
    end
    ttsc = toc;
    disp(['textscan test dot   avg. time: ' num2str(ttsc/numTest) ])
    disp('-----')
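
If txt2mat is not available, one possible alternative is to do the comma-to-dot replacement in memory with built-in functions and pass the resulting text to textscan. The following is only a sketch: it is not part of the benchmark above and has not been timed, it assumes three whitespace-delimited columns as in the question, and it assumes commas occur only as decimal separators. The file name and the second data row are made up for illustration.

```matlab
% Write a small hypothetical sample file (name and second row are invented).
filename = fullfile(tempdir, 'commaSample.txt');
fid = fopen(filename, 'w');
fprintf(fid, '5,040000    18,040000   -0,030000\r\n');
fprintf(fid, '1,500000    2,250000    3,125000\r\n');
fclose(fid);

% Read the whole file into memory as one character vector.
txt = fileread(filename);

% Replace every comma with a dot. This is only safe when commas are
% never used as column delimiters or thousands separators.
txt = strrep(txt, ',', '.');

% Parse the in-memory text; textscan accepts a character vector as input.
C = textscan(txt, '%f%f%f', 'CollectOutput', true);
A = C{1};   % numeric matrix, one row per line of the file

% Clean up the temporary sample file.
delete(filename);
```

For files with a header line, the 'HeaderLines' option used in the test script above should apply here as well. Whether this is faster or slower than txt2mat for files of this size would need to be measured with the same kind of timing loop.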
    
