MATLAB迭代添加数组元素:时间行为 [英] MATLAB adding array elements iteratively: time behavior

查看:199
本文介绍了MATLAB迭代添加数组元素:时间行为的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以我知道这不是推荐的技术(预分配更好),但是我对这种计时行为感到非常好奇.我很好奇引擎盖下会发生什么.

So I know this isn't the recommended technique (preallocating is better), but I got really curious about this timing behavior; I'm curious what might be happening under the hood.

在我看来,根据实现的不同,将元素添加到数组可能会在内存中引发几种不同的合理行为:(1)摊销后,像在链表中添加元素一样,花费相同的时间维护指向最后一个元素的指针,(2)现在可能要花费大量时间,然后为例如列表中当前元素的两倍(例如Java数组)预分配足够的内存,(3)比我想像的还要聪明.

In my head, adding an element to an array could induce a couple different reasonable behaviors in memory depending on implementation: (1) amortized, it would take the same amount of time to add an element like in a linked list where you maintain a pointer to the last element, (2) it could take a big chunk of time now and then to preallocate enough memory for, say, twice as many elements as currently in the list (like a Java array), (3) something more clever than I can think of.

MATLAB似乎在做一些我不太了解的事情.成本似乎呈线性增加,偶有峰值.对它可能正在做什么的任何猜测(或明智的解释)?我对模拟进行了平均(我提交的模拟可能会隐藏一些有趣的模式).

MATLAB seems to do something wonky that I don't quite grock. There seems to be a linear increase in cost with occasional spikes. Any guesses (or intelligent explanations) of what it might be doing? I averaged over simulations (which, I submit, might hide some interesting patterns).

当您将一个元素迭代地添加到最初为空的列表的末尾时,会发生这种情况.为什么线性增加?这些看似周期性的峰值有没有很酷的原因?

This is what happens when you add one element to the end of an initially empty list iteratively. Why the linear increase? Is there a cool reason for those seemingly periodic spikes?

我用来生成它的代码:

% for averaging over
num_averages = 100000;
% number of simulations
num_sims = 10000;
% the time it takes to add one more item, array
time_store = nan(num_sims, num_averages);

% averaging count
for i = 1:num_averages

    % an array that grows with every loop
    building_array = [];

    for j = 1:num_sims
        tic;
        building_array = [building_array 1];
        time_store(j, i) = toc;
    end
end

plot(mean(time_store, 2)); hold all;
xlabel('Element num'); ylabel('Time');

推荐答案

这真的很有趣!在非常规则的时间间隔处出现峰,并且曲线在两者之间或多或少是平坦的.在每个峰值之后,线上升一点.整洁的!我认为这与缓存行有关.您正在测量复制阵列的成本,这大概与读写缓存行的成本有关.

This is really interesting! There are peaks at very regular intervals, and the curve is more or less flat in between. After each peak the line rises a bit. Neat! I think this is related to cache lines. You are measuring the cost of copying the array, which is presumably related to the cost of reading and writing cache lines.

如果替换行

building_array = [building_array 1];

使用

building_array(end+1) = 1;

那么您将不会在每个迭代循环中都复制数据,而是实际上将一个元素附加到数组中.进行此更改后,我得到的线条几乎是平坦的,峰的距离对数增加(并且高度也对数增加),这与在需要时将数组大小加倍的常识性实现是一致的:

then you won't be copying the data at every iteration loop, but actually appending an element to the array. With this change I get a mostly flat line, with peaks at logarithmically increasing distances (and of logarithmically increasing height too), which is consistent with the common-sense implementation of doubling array size when needed:

仅需更多说明:building_array = [building_array 1]创建一个新数组,该数组比building_array大一个元素,并将building_array1复制到其中.然后将其分配给building_array.看来MATLAB的JIT尚未针对此代码模式进行优化!

Just some more explanation: building_array = [building_array 1] creates a new array, one element larger than building_array, and copies building_array and 1 into it. This is then assigned to building_array. It seems that MATLAB's JIT has not yet been optimized for this code pattern!

这篇关于MATLAB迭代添加数组元素:时间行为的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆