MATLAB 迭代添加数组元素:时间行为 [英] MATLAB adding array elements iteratively: time behavior

查看:44
本文介绍了MATLAB 迭代添加数组元素:时间行为的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以我知道这不是推荐的技术(预分配更好),但我对这种计时行为非常好奇;我很好奇幕后会发生什么.

So I know this isn't the recommended technique (preallocating is better), but I got really curious about this timing behavior; I'm curious what might be happening under the hood.

在我的脑海中,根据实现的不同,向数组添加元素可能会在内存中引起几种不同的合理行为:(1) 摊销,添加元素需要相同的时间,例如在链表中添加元素维护一个指向最后一个元素的指针,(2) 可能会时不时地花费大量时间来预分配足够的内存,例如,列表中当前元素数量的两倍(如 Java 数组),(3) 某些东西比我想象的还要聪明.

In my head, adding an element to an array could induce a couple different reasonable behaviors in memory depending on implementation: (1) amortized, it would take the same amount of time to add an element like in a linked list where you maintain a pointer to the last element, (2) it could take a big chunk of time now and then to preallocate enough memory for, say, twice as many elements as currently in the list (like a Java array), (3) something more clever than I can think of.

MATLAB 似乎做了一些我不太熟悉的奇怪的事情.成本似乎呈线性增长,偶尔会出现峰值.对它可能在做什么的任何猜测(或明智的解释)?我对模拟进行了平均(我提交,这可能会隐藏一些有趣的模式).

MATLAB seems to do something wonky that I don't quite grock. There seems to be a linear increase in cost with occasional spikes. Any guesses (or intelligent explanations) of what it might be doing? I averaged over simulations (which, I submit, might hide some interesting patterns).

当您迭代地将一个元素添加到最初为空的列表的末尾时,就会发生这种情况.为什么线性增加?这些看似周期性的峰值是否有很酷的原因?

This is what happens when you add one element to the end of an initially empty list iteratively. Why the linear increase? Is there a cool reason for those seemingly periodic spikes?

我用来生成它的代码:

% for averaging over
num_averages = 100000;
% number of simulations
num_sims = 10000;
% the time it takes to add one more item, array
time_store = nan(num_sims, num_averages);

% averaging count
for i = 1:num_averages

    % an array that grows with every loop
    building_array = [];

    for j = 1:num_sims
        tic;
        building_array = [building_array 1];
        time_store(j, i) = toc;
    end
end

plot(mean(time_store, 2)); hold all;
xlabel('Element num'); ylabel('Time');

推荐答案

这真的很有趣!以非常规律的间隔出现峰值,曲线或多或少是平坦的.在每个峰值之后,线会上升一点.整洁的!我认为这与缓存行有关.您正在测量复制数组的成本,这大概与读取和写入缓存行的成本有关.

This is really interesting! There are peaks at very regular intervals, and the curve is more or less flat in between. After each peak the line rises a bit. Neat! I think this is related to cache lines. You are measuring the cost of copying the array, which is presumably related to the cost of reading and writing cache lines.

如果换行

building_array = [building_array 1];

building_array(end+1) = 1;

那么你就不会在每个迭代循环中复制数据,而是实际将一个元素附加到数组中.通过这种变化,我得到了一条基本平坦的线,峰值在对数增加的距离处(以及对数增加的高度),这与需要时加倍阵列大小的常识实现一致:

then you won't be copying the data at every iteration loop, but actually appending an element to the array. With this change I get a mostly flat line, with peaks at logarithmically increasing distances (and of logarithmically increasing height too), which is consistent with the common-sense implementation of doubling array size when needed:

再解释一下:building_array = [building_array 1] 创建一个新数组,比building_array 大一个元素,并复制building_array1 进去.然后将其分配给 building_array.看来 MATLAB 的 JIT 还没有针对这种代码模式进行优化!

Just some more explanation: building_array = [building_array 1] creates a new array, one element larger than building_array, and copies building_array and 1 into it. This is then assigned to building_array. It seems that MATLAB's JIT has not yet been optimized for this code pattern!

这篇关于MATLAB 迭代添加数组元素:时间行为的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆