Preallocation of cell array in MATLAB

Question

This is more a question to understand a behavior rather than a specific problem.

MathWorks states that numeric arrays are stored contiguously, which makes preallocation important. This is not the case for cell arrays.

Are they something similar to a vector or an array of pointers in C++?

This would mean that preallocation is not so important, since a pointer is half the size of a double (according to whos; but there is surely some overhead somewhere to store the data type of the mxArray).

Running this code:

clear all
n = 1e6;

% 1) Numeric array, grown one element per iteration
tic
A = [];
for i=1:n
    A(end + 1) = 1;
end
fprintf('Numerical without preallocation %f s\n',toc)

clear A

% 2) Numeric array, preallocated with zeros
tic
A = zeros(1,n);
for i=1:n
    A(i) = 1;
end
fprintf('Numerical with preallocation %f s\n',toc)

clear A

% 3) Cell array, grown one cell per iteration
tic
A = cell(0);
for i=1:n
    A{end + 1} = 1;
end
fprintf('Cell without preallocation %f s\n',toc)

% 4) Cell array, preallocated with cell(1,n)
tic
A = cell(1,n);
for i=1:n
    A{i} = 1;
end
fprintf('Cell with preallocation %f s\n',toc)

returns:
Numerical without preallocation 0.429240 s
Numerical with preallocation 0.025236 s
Cell without preallocation 4.960297 s
Cell with preallocation 0.554257 s

There is no surprise in the numerical values. But the cell results did surprise me, since only the container of pointers, not the data itself, would need reallocation. Since a pointer is smaller than a double, this should lead to a difference of less than 0.2 s. Where does this overhead come from?

A related question: what if I would like to make a data container for heterogeneous data in MATLAB (preallocation is not possible, since the final size is not known in the beginning)? I think handle classes are not a good fit, since they also have a huge overhead.

Looking forward to learning something.

magu _

Edit:

I tried out the linked list proposed by Eitan T, but I think the overhead from MATLAB is still rather big. I tried it with a double array as the data (rand(200000,1)).

I made a little plot to illustrate:

Code for the graph (I used the dlnode class from the MATLAB homepage, as stated in the answer):

D = rand(200000,1);

s = linspace(10,20000,50);
nC = zeros(50,1);
nL = zeros(50,1);

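for i = 1:50
    % time s(i) appends to a cell array
    a = cell(0);

    tic
    for ii = 1:s(i)
        a{end + 1} = D;
    end
    nC(i) = toc;

    % time s(i) insertions into the linked list
    % (list: presumably the dlnode class from the MATLAB docs, renamed)
    a = list([]);

    tic
    for ii = 1:s(i)
        a.insertAfter(list(D));
    end
    nL(i) = toc;
end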

figure
plot(s,nC,'r',s,nL,'g')
xlabel('#iter')
ylabel('time (s)')
legend({'cell' 'list'})

Don't get me wrong: I love the idea of linked lists, since they are rather flexible, but I think the overhead might be too big.

Recommended answer

Are cell arrays something similar to a vector or an array of pointers in C++?

Cell arrays do allow storing data of different types and sizes, but each cell also adds a constant overhead of 112 bytes (see this other answer of mine). This is far more than an 8-byte double and is non-negligible, especially when dealing with large cell arrays as in your example.
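To check the per-cell overhead on your own MATLAB release, a quick sketch is to store the same values both ways and compare what whos reports. The exact cell figure depends on the release and platform (112 bytes is the answer's figure), so no output is shown here.

```matlab
% Store the same 1000 values as a numeric array and as a cell array
% (one 1x1 double per cell), then ask whos how many bytes each uses.
n = 1000;
x = zeros(1, n);   % numeric: one contiguous block of n doubles
c = num2cell(x);   % cell: n separate 1x1 double arrays, one per cell

sx = whos('x');
sc = whos('c');

fprintf('double array: %d bytes (%.1f per element)\n', sx.bytes, sx.bytes / n);
fprintf('cell array:   %d bytes (%.1f per element)\n', sc.bytes, sc.bytes / n);
```

Dividing the cell figure by n gives the cost per cell, including the header that each 1x1 double carries.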

It is reasonable to assume that a cell array is implemented as a contiguous array of pointers, each pointing to the actual content of the cell.
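The pointer layout itself is not visible from MATLAB code, but its consequence is: a cell array's dimensions are set by its number of slots, and replacing one slot's content with data of another type or size leaves the container unchanged. This small illustration is consistent with the assumed model; it is not a view of MATLAB internals.

```matlab
% A cell array's size is fixed by its slots, not by what the slots hold.
c = cell(1, 3);          % three empty slots
c{2} = rand(1000, 1);    % slot 2 now holds a 1000x1 double
c{2} = 'some text';      % same slot, different type and size
c{3} = {1, 'nested'};    % a slot can even hold another cell array

disp(size(c))            % the container itself is still 1x3
disp(class(c{2}))        % the content of slot 2 is now char
```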

This means that you can modify the content of each cell individually without resizing the cell array container itself. However, it also means that adding new cells to the cell array requires dynamic storage allocation, which is why preallocating memory for a cell array improves performance.
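When the final size is unknown, one common way to keep the number of reallocations small is to grow the cell array geometrically instead of by one cell per iteration. This capacity-doubling scheme is a swapped-in technique, similar in spirit to how C++ std::vector implementations grow; the answer does not describe it. In the sketch, nItems is a hypothetical stand-in for a count that is not known in advance, and the initial capacity of 16 is arbitrary.

```matlab
% Sketch: grow a cell array of unknown final size by doubling its capacity,
% so the container is reallocated a logarithmic number of times instead of
% once per item. nItems is a hypothetical stand-in for "not known in advance".
nItems = 1000;
buf = cell(1, 16);   % initial capacity (arbitrary)
count = 0;           % number of slots actually in use

for k = 1:nItems
    if count == numel(buf)
        buf = [buf, cell(1, numel(buf))];  % double the capacity in one step
    end
    count = count + 1;
    if mod(k, 2) == 1
        buf{count} = k;                 % numeric item
    else
        buf{count} = sprintf('%d', k);  % char item: contents can be heterogeneous
    end
end

buf = buf(1:count);  % trim the unused slots at the end
fprintf('%d items stored\n', numel(buf));
```

With these settings the container is enlarged only six times (16 to 1024 slots) for 1000 items, at the cost of up to half the slots sitting empty until the final trim.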

A related question would be, if I would like to make a data container for heterogeneous data in Matlab (preallocation is not possible since the final size is not known in the beginning)

Not knowing the final size may indeed be a problem, but you could always preallocate a cell array with the maximum size you need to support (if there is one) and remove the empty cells at the end. I also suggest that you look into implementing linked lists in MATLAB.
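The first suggestion, preallocating for the largest size you need to support and discarding the empty cells afterward, might look like the sketch below. maxItems, the item generator, and the stop condition are all hypothetical placeholders.

```matlab
% Sketch: preallocate for an assumed upper bound, fill only as many slots
% as turn out to be needed, then remove the unused cells at the end.
maxItems = 500;               % assumed maximum supported size
data = cell(1, maxItems);     % preallocated once
count = 0;

for k = 1:maxItems
    item = rand(k, 1);        % stand-in for producing the next item
    count = count + 1;
    data{count} = item;
    if sum(item) > 50         % hypothetical stop condition known only at run time
        break
    end
end

data(count+1:end) = [];       % remove the empty cells at the end
fprintf('kept %d of %d preallocated cells\n', numel(data), maxItems);
```

Trimming by the running count keeps cells whose legitimate content happens to be empty; data(cellfun(@isempty, data)) = [] also works, but it would drop those cells too.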
