Read a whole text file into a MATLAB variable at once


Problem description

I would like to read a (fairly big) log file into a MATLAB string cell in one step. I have used the usual:

s={};
fid = fopen('test.txt');
tline = fgetl(fid);
while ischar(tline)
   s=[s;tline];
   tline = fgetl(fid);
end

but this is just slow. I have found that

fid = fopen('test.txt');
x=fread(fid,'*char');

is way faster, but I get an n-by-1 char matrix, x. I could try to convert x to a string cell, but then I get into char encoding hell; the line delimiter seems to be \n\r, or 10 and 56 in ASCII (I've looked at the end of the first line), but those two characters often don't follow each other and even show up solo sometimes.

Is there an easy fast way to read an ASCII file into a string cell in one step, or convert x to a string cell?

Reading via fgetl:

Code                           Calls        Total Time      % Time
tline = lower(fgetl(fid));     903113       14.907 s        61.2%

Reading via fread:

>> tic; for i=1:length(files), fid = fopen(files(i).name); x = fread(fid,'*char*1'); fclose(fid); end; toc

Elapsed time is 0.208614 seconds.

I have tested preallocation, and it does not help :(

files=dir('.');
tic
for i=1:length(files),   
    if files(i).isdir || isempty(strfind(files(i).name,'.log')), continue; end
    %# preassign s to some large cell array
    sizS = 50000;
    s=cell(sizS,1);

    lineCt = 1;
    fid = fopen(files(i).name);
    tline = fgetl(fid);
    while ischar(tline)
       s{lineCt} = tline;
       lineCt = lineCt + 1;
       %# grow s if necessary
       if lineCt > sizS
           s = [s;cell(sizS,1)];
           sizS = sizS + sizS;
       end
       tline = fgetl(fid);
    end
    %# remove empty entries in s
    s(lineCt:end) = [];
end
toc

Elapsed time is 12.741492 seconds.

Roughly 10 times faster than the original:

s = textscan(fid, '%s', 'Delimiter', '\n', 'whitespace', '', 'bufsize', files(i).bytes);

I had to set 'whitespace' to '' in order to keep the leading spaces (which I need for parsing), and 'bufsize' to the size of the file (the default 4000 threw a buffer overflow error).
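
A minimal sketch of the whitespace behaviour described above, not the original benchmark code: the temporary file and its two lines are made up for illustration, and 'bufsize' is left out on the assumption that the sample is small enough not to need it.

```matlab
% Hypothetical sample file with one indented line (illustration only).
fname = [tempname '.log'];
fid = fopen(fname, 'w');
fprintf(fid, '   indented entry\nplain entry\n');
fclose(fid);

% Read every line into a cell array; an empty 'Whitespace' setting
% keeps the leading spaces instead of skipping them.
fid = fopen(fname, 'r');
c = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');
fclose(fid);
s = c{1};   % textscan wraps the result in an outer cell

delete(fname);
```

Here s{1} should still start with its three spaces, which is what matters if the indentation is needed for parsing later on.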

Solution

The main reason your first example is slow is that s grows in every iteration. This means recreating a new array, copying the old lines, and adding the new one, which adds unnecessary overhead.

To speed things up, you can preallocate s:

%# preassign s to some large cell array
s=cell(10000,1);
sizS = 10000;
lineCt = 1;
fid = fopen('test.txt');
tline = fgetl(fid);
while ischar(tline)
   s{lineCt} = tline;
   lineCt = lineCt + 1;
   %# grow s if necessary
   if lineCt > sizS
       s = [s;cell(10000,1)];
       sizS = sizS + 10000;
   end
   tline = fgetl(fid);
end
fclose(fid);
%# remove empty entries in s
s(lineCt:end) = [];


Here's a little example of what preallocation can do for you:

>> tic,for i=1:100000,c{i}=i;end,toc
Elapsed time is 10.513190 seconds.

>> d = cell(100000,1);
>> tic,for i=1:100000,d{i}=i;end,toc
Elapsed time is 0.046177 seconds.
>> 


EDIT

As an alternative to fgetl, you could use TEXTSCAN

fid = fopen('test.txt');
s = textscan(fid,'%s','Delimiter','\n');
s = s{1};
fclose(fid);

This reads the lines of test.txt as strings into the cell array s in one go.
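
For the other half of the question (converting the fread result x into a string cell), one option is to split on the line terminators with regexp. This is a sketch under assumptions rather than a drop-in replacement for the code above: the sample file is hypothetical, and the pattern assumes lines end in CRLF, a lone LF, or a lone CR.

```matlab
% Hypothetical sample file with mixed line endings (illustration only).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '  first line\r\nsecond line\nthird line\r\n');
fclose(fid);

% Read the whole file in one call; transpose the n-by-1 result to a row.
fid = fopen(fname, 'r');
x = fread(fid, '*char')';
fclose(fid);

% Split on CRLF first, then on a lone LF or CR. Leading spaces survive.
s = regexp(x, '\r\n|\n|\r', 'split')';

% A terminator at the very end of the file leaves one empty trailing cell.
if ~isempty(s) && isempty(s{end})
    s(end) = [];
end

delete(fname);
```

Trying the two-character CRLF alternative before the single characters keeps a Windows-style line ending from being counted as two separate breaks.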
