MatLab memory allocation when max size is unknown


Problem description



I am trying to speed up a script that I have written in Matlab that dynamically allocates memory to a matrix (basically reads a line of data from a file and writes it into a matrix, then reads another line and allocates more memory for a larger matrix to store the next line). The reason I did this instead of preallocating memory using zeroes() or something was that I don't know the exact size the matrix needs to be to hold all of the data. I also don't know the maximum size of the matrix, so I can't just preallocate a max size and then get rid of memory that I didn't use. This was fine for small amounts of data, but now I need to scale my script up to read many millions of data points and this implementation of dynamic allocation is just much too slow.

So here is my attempt to speed up the script: I tried to allocate memory in large blocks using the zeroes function, then once the block is filled I allocate another large block. Here is some sample code:

data = [];   
count = 0;

for ii = 1:num_filelines    
   if mod(count, 1000) == 0  
       data = [data; zeroes(1000)];  %after 1000 lines are read, allocate another 1000 line
   end  
   data(ii, :) = line_read(file);  %line_read reads a line of data from 'file'
end

Unfortunately this doesn't work, when I run it I get an error saying "Error using vertcat Dimensions of matrices being concatenated are not consistent."

So here is my question: Is this method of allocating memory in large blocks actually any faster than incremental dynamic allocation, and also why does the above code not run? Thanks for the help.

Solution

What I recommend doing, if you know the number of lines and can guess a large enough number of acceptable columns, is to use a sparse matrix.

% create a sparse matrix
mat = sparse(numRows,numCols);

A sparse matrix does not store the zero elements; it stores only the nonzero values and their indices, which can save a lot of space. It is used and accessed the same as any other matrix. This approach only makes sense if you really need the data in matrix format from the beginning.
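As a minimal sketch of this idea (numRows and numCols below are placeholder guesses, not values from the question):

```matlab
% Placeholder dimensions: a row count you know, plus a generous column guess.
numRows = 1e6;
numCols = 50;

% spalloc also lets you hint the expected number of nonzeros, which
% avoids repeated internal reallocation as the matrix fills in.
mat = spalloc(numRows, numCols, numRows * 10);

% Assign into it like any other matrix:
mat(1, 1:3) = [4 5 6];
disp(nnz(mat))   % 3 stored nonzeros; the remaining zeros take no storage
```

One caveat worth knowing: filling a plain sparse(numRows,numCols) element by element grows its internal storage repeatedly, so spalloc with a nonzero-count estimate is usually faster.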

If not, you can just do everything as a cell. Preallocate a cell array with as many elements as lines in your file.

data = cell(numLines,1);  % a column of cells, so cell2mat stacks the rows vertically
for i = 1:numLines
    data{i} = lineData;  % the row of values read from line i
end
data = cell2mat(data);

This method will put everything into a cell array, which can store "dynamically" and then be converted to a regular matrix.

Addition

If you use the sparse matrix method, your matrix will likely end up larger than necessary; once you are done, you can trim it down easily and then, if needed, cast it to a regular matrix.

[val,~] = max(sum(mat ~= 0,2));  % widest row = number of columns actually used
mat(:,val+1:end) = [];           % drop the columns past the last used one
mat = full(mat); % use this only if you really need the full matrix

This will remove any unnecessary columns and then cast it to a full matrix that includes the 0 elements. I would not recommend casting it to a full matrix, as this requires a ton more space, but if you truly need it, use it.

UPDATE

To get the number of lines in a file easily, use MATLAB's perl interpreter.

create a file called countlines.pl and paste in the two lines below

while (<>) {};
print $.,"\n";

Then you can run this script on your file as follows

numLines = str2double(perl('countlines.pl','data.csv'));

Problem solved.

From MATLAB forum thread here
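If calling out to Perl is inconvenient, a MATLAB-only sketch (assuming the file is small enough to read into memory at once) gives the same count:

```matlab
% Count lines by reading the file as one string and counting newlines.
txt = fileread('data.csv');
numLines = sum(txt == sprintf('\n'));
% A file without a trailing newline has one more line than newlines:
if ~isempty(txt) && txt(end) ~= sprintf('\n')
    numLines = numLines + 1;
end
```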

Remember, it is always best to preallocate everything beforehand, because with Shai's method you are technically reallocating many times, especially if it is a large file.
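For reference, the block-growth idea from the question can be made to work: the vertcat error came from zeros(1000), which creates a square 1000-by-1000 block regardless of the data width. A sketch with both dimensions given and geometric growth (lineWidth and more_lines_available are hypothetical; line_read and file come from the question):

```matlab
lineWidth = 10;                    % known number of values per line
data = zeros(1000, lineWidth);     % initial block, both dimensions specified
count = 0;
while more_lines_available(file)   % hypothetical end-of-file test
    count = count + 1;
    if count > size(data, 1)
        data(2*end, :) = 0;        % grow in place: double the row capacity
    end
    data(count, :) = line_read(file);
end
data = data(1:count, :);           % trim the unused rows at the end
```

Doubling the capacity instead of appending a fixed block keeps the total reallocation cost proportional to the final size.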
