MatLab memory allocation when max size is unknown
I am trying to speed up a script that I have written in MATLAB that dynamically allocates memory to a matrix (basically, it reads a line of data from a file and writes it into a matrix, then reads another line and allocates more memory for a larger matrix to store the next line). The reason I did this instead of preallocating memory using zeros() or something similar is that I don't know the exact size the matrix needs to be to hold all of the data. I also don't know the maximum size of the matrix, so I can't just preallocate a maximum size and then discard the memory I didn't use. This was fine for small amounts of data, but now I need to scale my script up to read many millions of data points, and this implementation of dynamic allocation is much too slow.
So here is my attempt to speed up the script: I tried to allocate memory in large blocks using the zeros function, and once a block is filled I allocate another large block. Here is some sample code:
data = [];
count = 0;
for ii = 1:num_filelines
    if mod(count, 1000) == 0
        data = [data; zeros(1000)]; % after 1000 lines are read, allocate another 1000 lines
    end
    data(ii, :) = line_read(file); % line_read reads a line of data from 'file'
end
Unfortunately this doesn't work. When I run it, I get the error "Error using vertcat Dimensions of matrices being concatenated are not consistent."
So here is my question: Is this method of allocating memory in large blocks actually any faster than incremental dynamic allocation, and also why does the above code not run? Thanks for the help.
If you know the number of lines and can guess a large enough upper bound on the number of columns, I recommend using a sparse matrix.
% create a sparse matrix
mat = sparse(numRows,numCols);
A sparse matrix does not store the zero elements; it stores only the non-zero values together with their indices. This can save a lot of space. Sparse matrices are used and accessed the same way as any other matrix. This approach only makes sense if you really need the data in matrix form from the beginning.
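As a rough illustration of the sparse approach, here is a minimal sketch. The sizes are hypothetical placeholders (they are not from the question), and the assigned row stands in for a line read from the file.

```matlab
% Hypothetical sizes: 4 lines, with a generous guess of 50 columns.
numRows = 4;
numCols = 50;

% All-zero sparse matrix; the zeros themselves are not stored.
mat = sparse(numRows, numCols);

% Write a short row of data into line 2, indexed like a normal matrix.
mat(2, 1:3) = [7 8 9];

% Only the three non-zero entries are actually stored.
numStored = nnz(mat);
```

One caveat I am fairly sure of: inserting into a sparse matrix one row at a time can be slow for large files, so it is worth timing this against the cell array method on your own data.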
If not, you can store everything in a cell array. Preallocate a cell array with as many elements as there are lines in your file.
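data = cell(numLines,1);   % column cell, so cell2mat stacks the rows vertically
for i = 1:numLines
    % get matrix from line (lineData is the parsed row for line i)
    data{i} = lineData;
end
data = cell2mat(data);     % requires every row to have the same number of columns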
This method puts everything into a cell array, whose elements can each hold data of any size, and then converts the result to a regular matrix.
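Here is a small self-contained sketch of the cell array method. The variable parsedLines is a made-up stand-in for whatever your line parser returns, and the cell array is allocated as a column so that cell2mat stacks one row per line; this assumes every line has the same number of values.

```matlab
% Hypothetical parsed lines; in practice these come from reading the file.
parsedLines = {[1 2 3]; [4 5 6]; [7 8 9]};
numLines = numel(parsedLines);

% Preallocate one cell per line (column orientation).
data = cell(numLines, 1);
for i = 1:numLines
    data{i} = parsedLines{i};   % store the row for line i
end

% Stack the rows into an ordinary numLines-by-3 matrix.
data = cell2mat(data);
```

If the lines can have different lengths, cell2mat will fail on a column cell, and you would need to pad the rows first or keep the data in the cell array.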
Addition
If you use the sparse matrix method, your matrix will likely be larger than necessary once you are done. You can easily trim it down and then convert it to a regular matrix.
lastCol = find(any(mat ~= 0,1),1,'last'); % last column that holds any non-zero value
mat(:,lastCol+1:end) = [];                % drop the unused columns after it
mat = full(mat); % use this only if you really need the full matrix
This removes the unused trailing columns and then converts the result to a full matrix, which stores the zero elements explicitly. I would not recommend converting to a full matrix, since it requires far more memory, but if you truly need it, use it.
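To make the trimming step concrete, here is a small worked sketch with invented data. It keeps everything up to the last column that contains a non-zero value, which assumes the real data never ends in columns that are zero in every row.

```matlab
% Oversized sparse matrix: 3 lines, 10 guessed columns.
mat = sparse(3, 10);
mat(1, 1:2) = [1 2];
mat(2, 1:4) = [3 4 5 6];
mat(3, 1)   = 7;

% Index of the last column containing any non-zero value.
lastCol = find(any(mat ~= 0, 1), 1, 'last');

% Delete every column after it.
mat(:, lastCol+1:end) = [];

% Convert to a full matrix only if it is really needed.
matFull = full(mat);
```

Here the longest row has four values, so the result is 3-by-4 and shorter rows are padded with zeros.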
UPDATE
To get the number of lines in a file easily, use MATLAB's Perl interpreter. Create a file called countlines.pl and paste in the two lines below:
while (<>) {};
print $.,"\n";
Then you can run this script on your file as follows
numLines = str2double(perl('countlines.pl','data.csv'));
Problem solved.
From MATLAB forum thread here
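If Perl is not an option, a plain MATLAB loop can count the lines as well. This is a different technique from the Perl script above, shown only as a sketch; it writes its own small temporary file so that it runs on its own, and in practice you would open your data file instead.

```matlab
% Create a small temporary file with three lines (example data only).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '1,2,3\n4,5\n6\n');
fclose(fid);

% Count lines: fgetl returns a char array per line and -1 at end of file.
fid = fopen(fname, 'r');
numLines = 0;
while ischar(fgetl(fid))
    numLines = numLines + 1;
end
fclose(fid);

% Remove the temporary file.
delete(fname);
```

This reads the file once just to count, so for very large files it adds a full extra pass over the data.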
Remember that it is always best to preallocate everything beforehand, because with shai's method you are still reallocating large blocks many times, especially if it is a large file.
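If you still want to grow the matrix in blocks, here is a minimal sketch of how that could look. It assumes every line has the same, known number of columns (numCols), and the generated row is a stand-in for the question's line_read(file). Note that zeros(1000) returns a 1000-by-1000 matrix, so it can only be concatenated under data that is also 1000 columns wide, which is likely related to the error in the question; zeros(chunk, numCols) gives a block of the right width. The question's loop also never increments count.

```matlab
numCols = 3;             % assumed fixed number of values per line
chunk = 1000;            % number of rows added per reallocation
numLinesInFile = 2500;   % hypothetical; only used to drive the demo loop

data = zeros(chunk, numCols);   % first block
count = 0;                      % number of rows filled so far

for ii = 1:numLinesInFile
    row = ii * ones(1, numCols);   % stand-in for line_read(file)

    % When the current storage is full, append another block of rows.
    if count == size(data, 1)
        data = [data; zeros(chunk, numCols)];
    end

    count = count + 1;
    data(count, :) = row;
end

% Discard the unused rows of the last block.
data = data(1:count, :);
```

This reallocates once per 1000 lines instead of once per line, which is still more copying than a single preallocation when the line count is known.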