Julia 中的 push!() 和 append!() 方法的效率如何? [英] How efficient are push!() and append!() methods in Julia?

查看:85
本文介绍了Julia 中的 push!() 和 append!() 方法的效率如何?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

this 页面上,它说方法 push!()append!() 非常高效.

On this page it says that methods push!() and append!() are very efficient.

我的问题是它们的效率究竟如何?

My question is how exactly efficient they are?

即,

如果知道最终数组的大小,那么预分配数组还是使用 append!()/push!() 会一样高效吗?

If one knows the size of the final array, is it still faster to preallocate the array or growing it incrementally using append!() / push!() would be just as efficient?

现在考虑一个不知道最终数组大小的情况.例如,将多个数组合并为 1 个大数组(称为 A).

Now consider the case when one does not know the size of the final array. For example, merging multiple arrays into 1 big array (call it A).

实现这一点的两种方法:

Two ways to achieve this:

  1. append!()- 将每个数组添加到 A,其大小尚未预先分配.
  2. 首先对每个数组的维度求和,以找到合并数组A的最终大小.然后预分配 A 并复制每个数组的内容.
  1. append!()-ing each array to A, whose size has not been preallocated.
  2. First sum dimensions of each array to find the final size of the merged array A. Then preallocate A and copy over contents of each array.

在这种情况下,哪个更有效?

Which one would be more efficient in this case?

推荐答案

此类问题的答案通常是:视情况而定".例如,您要制作什么大小的数组?数组的元素类型是什么?

The answer to a question like this is usually: "it depends". For example, what size array are you trying to make? What is the element-type of the array?

但是,如果您只是进行启发式测试,为什么不运行一个简单的速度测试呢?例如,以下代码段:

But if you're just after a heuristic, why not run a simple speed test? For example, the following snippet:

function f1(N::Int)
    x = Array(Int, N)
    for n = 1:N
        x[n] = n
    end
    return(x)
end

function f2(N::Int)
    x = Array(Int, 0)
    for n = 1:N
        push!(x, n)
    end
    return(x)
end

f1(2)
f2(2)

N = 5000000000
@time f1(N)
@time f2(N)

建议使用 push! 比预分配慢大​​约 6 倍.如果您使用 append! 以更少的步骤添加更大的块,那么乘数几乎肯定会更少.

suggests that using push! is about 6 times slower than pre-allocating. If you were using append! to add larger blocks with less steps, the multiplier would almost certainly be less.

在解释这些数字时,请抵制什么!?慢 6 倍!?"的下意识反应.这个数字需要放在数组构建对整个程序/函数/子例程的重要性的上下文中.例如,如果数组构建仅占例程运行时间的 1%(对于大多数典型例程,数组构建将包含小于 1%),那么如果您的例程运行 100 秒, 1 秒用于构建数组.将其乘以 6 得到 6 秒.99 秒 + 6 秒 = 105 秒.因此,使用 push! 代替预分配将整个程序的运行时间增加 5%.除非您从事高频交易,否则您可能不会关心这一点.

When interpreting these numbers, resist the knee-jerk reaction of "What!? 6-times slower!?". This number needs to be placed in the context of how important array building is to your entire program/function/subroutine. For example, if array building comprises only 1% of the run-time of your routine (for most typical routines, array building would comprise much less than 1%), then if your routine runs for 100 seconds, 1 second is spent building arrays. Multiply that by 6 to get 6 seconds. 99 seconds + 6 seconds = 105 seconds. So, using push! instead of pre-allocating increases the runtime of your whole program by 5%. Unless you work in high-frequency trading, you're probably not going to care about that.

对于我自己,我通常的规则是:如果我可以轻松预分配,那么我就预分配.但是如果 push! 使例程更容易编码,引入错误的可能性更低,并且尝试预先确定适当的数组大小的麻烦更少,那么我使用 push! 不假思索.

For myself, my usual rule is this: if I can pre-allocate easily, then I pre-allocate. But if push! makes the routine much easier to code, with lower possibility of introducing bugs, and less messing around trying to pre-determine the appropriate array size, then I use push! without a second thought.

最后一点:如果您想真正了解 push! 的具体工作原理,您需要深入研究 C 例程,因为 julia source 只是包装了一个 ccall.

Final note: if you want to actually look at the specifics of how push! works, you'll need to delve into the C routines, since the julia source just wraps a ccall.

更新: OP 在评论中质疑 push! 与 MATLAB 中的 array(end+1) = n 之类的操作之间的区别.我最近没有在 MATLAB 中编码,但我确实在我的机器上保留了一份副本,因为我所有旧论文的代码都在 MATLAB 中.我当前的版本是 R2014a.我的理解是,在这个版本的 MATLAB 中,添加到数组的末尾会重新分配 整个 数组.相比之下,据我所知,Julia 中的 push!.NET 中的列表非常相似.随着向量大小的增长,分配给向量的内存会以块的形式动态添加.这大大减少了需要执行的重新分配量,尽管我的理解是一些重新分配仍然是必要的(我很高兴在这一点上得到纠正).所以 push! 应该比在 Matlab 中添加到数组快很多.所以我们可以运行下面的 MATLAB 代码:

UPDATE: OP questioned in the comments the difference between push! and an operation like array(end+1) = n in MATLAB. I haven't coded in MATLAB recently, but I do keep a copy on my machine since the code for all my older papers is in MATLAB. My current version is R2014a. My understanding is that in this version of MATLAB, adding to the end of the array will re-allocate the entire array. In contrast, push! in Julia works, to the best of my knowledge, much like lists in .NET. The memory allocated to the vector is dynamically added in blocks as the size of the vector grows. This massively reduces the amount of re-allocation that needs to be performed, although my understanding is that some re-allocation is still necessary (I'm happy to be corrected on this point). So push! should work much faster than adding to an array in Matlab. So we can run the following MATLAB code:

N = 10000000;
tic
x = ones(N, 1);
for n = 1:N
    x(n) = n;
end
toc


N = 10000000;
tic
x = [];
for n = 1:N
    x(end+1) = n;
end
toc

我明白了:

Elapsed time is 0.407288 seconds.
Elapsed time is 1.802845 seconds.

因此,大约减速了 5 倍.鉴于计时方法中应用的极端不严谨,人们可能会说这与 Julia 案例等价.但是等等,如果我们用 N = 10000000 在 Julia 中重新运行这个练习,时间是 0.01 和 0.07 秒.这些数字与 MATLAB 数字的巨大差异让我非常紧张,不敢断言引擎盖下实际发生的事情,以及将 MATLAB 的 5 倍减速与 MATLAB 的 6 倍减速进行比较是否合理.朱莉娅.基本上,我现在已经超出了我的深度.也许更了解 MATLAB 在幕后实际工作的人可以提供更多见解.关于 Julia,我不是一个 C 编码员,所以我怀疑我是否会通过查看源代码(与 MATLAB 不同,它是公开的)获得很多洞察力.

So, about a 5-times slowdown. Given the extreme non-rigour applied in the timing methodology, one might be tempted to say this is equivalent to the Julia case. But wait, if we re-run the exercise in Julia with N = 10000000, the timings are 0.01 and 0.07 seconds. The sheer difference in the magnitude of these numbers to the MATLAB numbers makes me very nervous about making claims about what is actually happening under the hood, and whether it is legitimate to compare the 5-times slowdown in MATLAB to the 6-times slowdown in Julia. Basically, I'm now out of my depth. Maybe someone who knows more about what MATLAB actually does under the hood can offer more insight. Regarding Julia, I'm not much of a C-coder, so I doubt I'll get much insight from looking through the source (which is publicly available, unlike MATLAB).

这篇关于Julia 中的 push!() 和 append!() 方法的效率如何?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆