Avoid memory allocation when indexing an array in Julia

Views: 72 · Posted: 2020/4/25 4:28:44 · Tag: julia


Question

Question: I would like to index into an array without triggering memory allocation, especially when passing the indexed elements into a function. From reading the Julia docs, I suspect the answer revolves around the sub function, but I can't quite see how.

Working example: I build a large vector of Float64 (x), and then a vector of indices (inds) that covers every observation in x.

N = 10000000
x = randn(N)
inds = [1:N]  # Julia 0.3: a Vector{Int} of every index; on Julia 1.x write collect(1:N)

Now I time the mean function over x and over x[inds] (I run mean(randn(2)) first so that compilation does not distort the timings):

@time mean(x)
@time mean(x[inds])

It's an identical calculation, but, as expected, the timings are:

elapsed time: 0.007029772 seconds (96 bytes allocated)
elapsed time: 0.067880112 seconds (80000208 bytes allocated, 35.38% gc time)

So, is there a way around the memory allocation for arbitrary choices of inds (and arbitrary choices of array and function)?

Recommended answer

Read tholy's answer too to get the full picture!

When indexing with an array of indices, the situation is not great right now on Julia 0.4-pre (as of early February 2015):

julia> N = 10000000;
julia> x = randn(N);
julia> inds = [1:N];
julia> @time mean(x)
elapsed time: 0.010702729 seconds (96 bytes allocated)
elapsed time: 0.012167155 seconds (96 bytes allocated)
julia> @time mean(x[inds])
elapsed time: 0.088312275 seconds (76 MB allocated, 17.87% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.073672734 seconds (76 MB allocated, 3.27% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.071646757 seconds (76 MB allocated, 1.08% gc time in 1 pauses with 0 full sweep)
julia> xs = sub(x,inds);  # Only works on 0.4
julia> @time mean(xs)
elapsed time: 0.057446177 seconds (96 bytes allocated)
elapsed time: 0.096983673 seconds (96 bytes allocated)
elapsed time: 0.096711312 seconds (96 bytes allocated)
julia> using ArrayViews
julia> xv = view(x, 1:N)  # Note use of a range, not [1:N]!
julia> @time mean(xv)
elapsed time: 0.012919509 seconds (96 bytes allocated)
elapsed time: 0.013010655 seconds (96 bytes allocated)
elapsed time: 0.01288134 seconds (96 bytes allocated)
julia> xs = sub(x,1:N)  # Works on 0.3 and 0.4
julia> @time mean(xs)
elapsed time: 0.014191482 seconds (96 bytes allocated)
elapsed time: 0.014023089 seconds (96 bytes allocated)
elapsed time: 0.01257188 seconds (96 bytes allocated)

So while we can avoid the memory allocation, the result is actually slower(!) still.
The issue is indexing by an array, as opposed to a range. You can't use sub for this on 0.3, but you can on 0.4.
If we can index by a range, then we can use ArrayViews.jl on 0.3, and it's built in on 0.4. This case is pretty much as good as the original mean.
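
For readers on a current Julia release, the range-versus-array distinction above can be sketched as follows. This is a minimal illustration that assumes Julia 1.x, where sub is no longer available and view in Base takes its place, and where mean must be loaded from the Statistics standard library. The names xr and xi are made up for this example, and no timings are claimed here.

```julia
using Statistics  # on Julia 1.x, mean lives in the Statistics stdlib

N = 1_000_000
x = randn(N)

# Indexing by a range: the view wraps x's memory directly, no data is copied.
xr = view(x, 1:N)

# Indexing by a vector of indices: x's data is still not copied,
# but every element access goes through the index vector.
inds = collect(1:N)   # explicit Vector{Int}; [1:N] no longer concatenates on 1.x
xi = view(x, inds)

m_copy  = mean(x[inds])  # x[inds] allocates a new N-element array first
m_range = mean(xr)       # reads x through the range view
m_index = mean(xi)       # reads x through the index-vector view
```

Both views share memory with x, so writing through xr or xi modifies x itself. Call copy on the view when an independent array is needed.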

I noticed that with a smaller number of indices (instead of the whole range), the gap is much smaller and the memory allocation is low, so sub might be worth it:

N = 100000000
x = randn(N)
inds = [1:div(N,10)]

@time mean(x)
@time mean(x)
@time mean(x)
@time mean(x[inds])
@time mean(x[inds])
@time mean(x[inds])
xi = sub(x,inds)
@time mean(xi)
@time mean(xi)
@time mean(xi)

This gives:

elapsed time: 0.092831612 seconds (985 kB allocated)
elapsed time: 0.067694917 seconds (96 bytes allocated)
elapsed time: 0.066209038 seconds (96 bytes allocated)
elapsed time: 0.066816927 seconds (76 MB allocated, 20.62% gc time in 1 pauses with 1 full sweep)
elapsed time: 0.057211528 seconds (76 MB allocated, 19.57% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.046782848 seconds (76 MB allocated, 1.81% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.186084807 seconds (4 MB allocated)
elapsed time: 0.057476269 seconds (96 bytes allocated)
elapsed time: 0.05733602 seconds (96 bytes allocated)
