Read a whole text file into a MATLAB variable at once
I would like to read a (fairly big) log file into a MATLAB string cell in one step. I have used the usual:
s = {};
fid = fopen('test.txt');
tline = fgetl(fid);
while ischar(tline)
    s = [s; tline];
    tline = fgetl(fid);
end
but this is just slow. I have found that
fid = fopen('test.txt');
x=fread(fid,'*char');
is way faster, but I get an nx1 char matrix, x. I could try to convert x to a string cell, but then I get into char encoding hell; the line delimiter seems to be \n\r, or 10 and 56 in ASCII (I've looked at the end of the first line), but those two characters often don't follow each other and even show up solo sometimes.
Is there an easy, fast way to read an ASCII file into a string cell in one step, or to convert x to a string cell?
Reading via fgetl:
Code                          Calls     Total Time   % Time
tline = lower(fgetl(fid));    903113    14.907 s     61.2%
Reading via fread:
>> tic; for i=1:length(files), fid = fopen(files(i).name); x = fread(fid,'*char*1'); fclose(fid); end; toc
Elapsed time is 0.208614 seconds.
I have tested preallocation, and it does not help :(
files = dir('.');
tic
for i = 1:length(files)
    if files(i).isdir || isempty(strfind(files(i).name,'.log')), continue; end
    %# preassign s to some large cell array
    sizS = 50000;
    s = cell(sizS,1);
    lineCt = 1;
    fid = fopen(files(i).name);
    tline = fgetl(fid);
    while ischar(tline)
        s{lineCt} = tline;
        lineCt = lineCt + 1;
        %# grow s if necessary (doubles the allocated size each time)
        if lineCt > sizS
            s = [s; cell(sizS,1)];
            sizS = sizS + sizS;
        end
        tline = fgetl(fid);
    end
    %# close the file so handles do not pile up across the loop
    fclose(fid);
    %# remove empty entries in s
    s(lineCt:end) = [];
end
toc
Elapsed time is 12.741492 seconds.
Roughly 10 times faster than the original:
s = textscan(fid, '%s', 'Delimiter', '\n', 'whitespace', '', 'bufsize', files(i).bytes);
I had to set 'whitespace' to '' in order to keep the leading spaces (which I need for parsing), and 'bufsize' to the size of the file (the default of 4000 threw a buffer overflow error).
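To illustrate the effect of the empty 'whitespace' setting described above, here is a minimal sketch. The temporary file and its two lines are made up for the demonstration and are not from the original logs. 'bufsize' is left out on the assumption that the MATLAB release in use does not need it; on a release that reports a buffer overflow it can be passed as in the call above.

```matlab
% Hypothetical demo file standing in for one of the .log files.
fname = [tempname '.log'];
fid = fopen(fname, 'w');
fprintf(fid, '   indented entry\nplain entry\n');
fclose(fid);

fid = fopen(fname, 'r');
% 'Whitespace','' keeps textscan from skipping leading blanks in each field;
% 'Delimiter','\n' makes every line one field.
c = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');
fclose(fid);
s = c{1};    % textscan returns a 1-by-1 cell that holds the cell array of lines

delete(fname);
```

After this runs, s{1} should still start with the three leading spaces, which is what the parsing step relies on.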
The main reason your first example is slow is that s grows in every iteration. This means creating a new array, copying the old lines, and adding the new one on every pass, which adds unnecessary overhead.
To speed things up, you can preassign s:
%# preassign s to some large cell array
s = cell(10000,1);
sizS = 10000;
lineCt = 1;
fid = fopen('test.txt');
tline = fgetl(fid);
while ischar(tline)
    s{lineCt} = tline;
    lineCt = lineCt + 1;
    %# grow s if necessary
    if lineCt > sizS
        s = [s; cell(10000,1)];
        sizS = sizS + 10000;
    end
    tline = fgetl(fid);
end
%# remove empty entries in s
s(lineCt:end) = [];
Here's a little example of what preallocation can do for you:
>> tic,for i=1:100000,c{i}=i;end,toc
Elapsed time is 10.513190 seconds.
>> d = cell(100000,1);
>> tic,for i=1:100000,d{i}=i;end,toc
Elapsed time is 0.046177 seconds.
>>
EDIT
As an alternative to fgetl, you could use TEXTSCAN:
fid = fopen('test.txt');
s = textscan(fid,'%s','Delimiter','\n');
s = s{1};
This reads the lines of test.txt as strings into the cell array s in one go.
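The question also asks how to turn the char matrix x returned by fread into a string cell. One possible approach, offered as a sketch and not as part of the original answer, is to split the text with regexp on any of the common line terminators. The temporary file and its mixed line endings are invented for the demonstration.

```matlab
% Hypothetical demo file with mixed line endings (CRLF and lone LF).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '  first line\r\nsecond line\nthird line\n');
fclose(fid);

fid = fopen(fname, 'r');
x = fread(fid, '*char')';    % transpose the n-by-1 result into a 1-by-n row
fclose(fid);

% Split on CRLF, lone LF, or lone CR. CRLF is listed first so that a
% carriage return followed by a line feed counts as a single break.
s = regexp(x, '\r\n|\n|\r', 'split')';

% A trailing line terminator leaves one empty entry at the end; drop it.
if ~isempty(s) && isempty(s{end})
    s(end) = [];
end

delete(fname);
```

Because the split happens on the raw characters, leading spaces in each line are kept. fileread('test.txt') would give the same text as a row vector in one call, if reading the whole file as text is acceptable.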