Avoid memory allocation when indexing an array in Julia
Question
I would like to index into an array without triggering memory allocation, especially when passing the indexed elements into a function. From reading the Julia docs, I suspect the answer revolves around using the sub function, but I can't quite see how.
Working example: I build a large vector of Float64 values (x) and then an index to every observation in x.
N = 10000000
x = randn(N)
inds = [1:N]  # Julia 0.3/0.4: concatenates the range into a Vector{Int}; on Julia 1.x write collect(1:N)
Now I time the mean function over x and over x[inds] (I run mean(randn(2)) first so that compilation time does not distort the timings):
@time mean(x)
@time mean(x[inds])
The two calls perform an identical calculation, but as expected the timings differ:
elapsed time: 0.007029772 seconds (96 bytes allocated)
elapsed time: 0.067880112 seconds (80000208 bytes allocated, 35.38% gc time)
So, is there a way around the memory allocation problem for arbitrary choices of inds (and arbitrary choices of array and function)?
Recommended answer
Read tholy's answer too to get a full picture!
When using an array of indices, the situation is not great right now on Julia 0.4-pre (start of February 2015):
julia> N = 10000000;
julia> x = randn(N);
julia> inds = [1:N];
julia> @time mean(x)
elapsed time: 0.010702729 seconds (96 bytes allocated)
elapsed time: 0.012167155 seconds (96 bytes allocated)
julia> @time mean(x[inds])
elapsed time: 0.088312275 seconds (76 MB allocated, 17.87% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.073672734 seconds (76 MB allocated, 3.27% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.071646757 seconds (76 MB allocated, 1.08% gc time in 1 pauses with 0 full sweep)
julia> xs = sub(x,inds); # Only works on 0.4
julia> @time mean(xs)
elapsed time: 0.057446177 seconds (96 bytes allocated)
elapsed time: 0.096983673 seconds (96 bytes allocated)
elapsed time: 0.096711312 seconds (96 bytes allocated)
julia> using ArrayViews
julia> xv = view(x, 1:N) # Note use of a range, not [1:N]!
julia> @time mean(xv)
elapsed time: 0.012919509 seconds (96 bytes allocated)
elapsed time: 0.013010655 seconds (96 bytes allocated)
elapsed time: 0.01288134 seconds (96 bytes allocated)
julia> xs = sub(x,1:N) # Works on 0.3 and 0.4
julia> @time mean(xs)
elapsed time: 0.014191482 seconds (96 bytes allocated)
elapsed time: 0.014023089 seconds (96 bytes allocated)
elapsed time: 0.01257188 seconds (96 bytes allocated)
- So while we can avoid the memory allocation, we are actually still slower(!).
- The issue is indexing by an array, as opposed to a range. You can't use sub for this on 0.3, but you can on 0.4.
- If we can index by a range, then we can use ArrayViews.jl on 0.3, and the equivalent is built in on 0.4. This case is pretty much as good as the original mean.
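For readers on a current Julia release, here is a minimal sketch of the same idea. It assumes Julia 1.x, where sub no longer exists and view (or the @views macro) plays that role, and where mean lives in the Statistics standard library. The variable names are illustrative only, and no timings are claimed for it.

```julia
using Statistics  # mean is in the Statistics stdlib on Julia 1.x (assumed version)

x = randn(1_000)
inds = collect(1:2:length(x))   # an arbitrary index vector, not a range

# view wraps x in a SubArray instead of copying the selected elements
xv = view(x, inds)

m_copy = mean(x[inds])   # x[inds] allocates a temporary array first
m_view = mean(xv)        # reads the elements of x through the view

# @views rewrites the indexing expressions inside it into view calls
m_macro = @views mean(x[inds])
```

Because the view shares memory with x, later writes to x are visible through xv. That is worth keeping in mind if the indexed data is expected to be a snapshot.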
I noticed that when a smaller number of indices is used (instead of the whole range), the gap is much smaller and the memory allocation is low, so sub might be worth it:
N = 100000000
x = randn(N)
inds = [1:div(N,10)]
@time mean(x)
@time mean(x)
@time mean(x)
@time mean(x[inds])
@time mean(x[inds])
@time mean(x[inds])
xi = sub(x,inds)
@time mean(xi)
@time mean(xi)
@time mean(xi)
which gives:
elapsed time: 0.092831612 seconds (985 kB allocated)
elapsed time: 0.067694917 seconds (96 bytes allocated)
elapsed time: 0.066209038 seconds (96 bytes allocated)
elapsed time: 0.066816927 seconds (76 MB allocated, 20.62% gc time in 1 pauses with 1 full sweep)
elapsed time: 0.057211528 seconds (76 MB allocated, 19.57% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.046782848 seconds (76 MB allocated, 1.81% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.186084807 seconds (4 MB allocated)
elapsed time: 0.057476269 seconds (96 bytes allocated)
elapsed time: 0.05733602 seconds (96 bytes allocated)
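When the function applied to the indexed elements is simple, another option is to loop over the indices directly, which needs neither a copy nor a view. The sketch below is an illustration, not part of the original answer: indexed_mean is a hypothetical helper, the syntax assumes Julia 1.x, and it uses plain sequential summation, so its result can differ from mean in the last few floating-point digits.

```julia
# Hypothetical helper: mean of x at the positions in inds, without building x[inds].
function indexed_mean(x::AbstractVector{<:Real}, inds)
    s = zero(float(eltype(x)))   # accumulator in a floating-point type
    for i in inds
        s += x[i]                # bounds-checked read straight from x
    end
    return s / length(inds)
end

x = randn(1_000)
inds = collect(1:10:length(x))
m_loop = indexed_mean(x, inds)
```

The trade-off is generality: this covers one reduction, whereas a view can be passed to any function that accepts an AbstractArray.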