Avoid memory allocation when indexing an array in Julia


Question

UPDATE: Note that the relevant function in Julia v1+ is view; the sub function discussed below belongs to the older 0.3/0.4 releases.
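
As a minimal sketch of what that looks like on Julia 1.x (assuming the Statistics standard library for mean; the small array and the variable names are illustrative and not from the original post), view wraps the parent array and the indices without copying the selected elements:

```julia
using Statistics  # mean lives in the Statistics stdlib on Julia 1.x

x = collect(1.0:10.0)     # [1.0, 2.0, ..., 10.0]
inds = [2, 4, 6]          # an arbitrary index vector (illustrative)

xc = x[inds]              # bracket indexing copies the selected elements
xv = view(x, inds)        # a SubArray that refers back to x, no element copy
xm = @view x[inds]        # macro form of the same call

println(mean(xc))         # prints 4.0
println(mean(xv))         # prints 4.0

# Writing through the view changes the parent; the earlier copy is unaffected.
xv[1] = 42.0
println(x[2])             # prints 42.0
println(xc[1])            # prints 2.0
```

Because the view aliases x, it is only a drop-in replacement for x[inds] when the callee does not need its own private copy.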

Question: I would like to index into an array without triggering memory allocation, especially when passing the indexed elements into a function. From reading the Julia docs, I suspect the answer revolves around using the sub function, but I can't quite see how.

Working example: I build a large vector of Float64 (x) and then an index to every observation in x.

N = 10000000
x = randn(N)
inds = [1:N]  # Julia 0.3 syntax for a Vector of 1..N; on Julia 1.x write collect(1:N)

Now I time the mean function over x and x[inds] (I run mean(randn(2)) first so that compilation does not distort the timings):

@time mean(x)
@time mean(x[inds])

It's an identical calculation, but, as expected, the indexed version is slower and allocates far more memory:

elapsed time: 0.007029772 seconds (96 bytes allocated)
elapsed time: 0.067880112 seconds (80000208 bytes allocated, 35.38% gc time)
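
The second allocation figure is roughly what a full copy of the indexed elements would cost. A quick back-of-the-envelope check, assuming 8 bytes per Float64 (the variable names are illustrative):

```julia
N = 10_000_000
bytes_for_copy = N * sizeof(Float64)     # 8 bytes per Float64 element
mib_for_copy = bytes_for_copy / 2^20     # convert bytes to MiB

println(bytes_for_copy)                  # prints 80000000
println(round(mib_for_copy, digits = 1)) # prints 76.3
```

So nearly all of the 80000208 bytes reported for mean(x[inds]) is the temporary array created by x[inds]; the remaining couple of hundred bytes are small bookkeeping overhead.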

So, is there a way around the memory allocation problem for arbitrary choices of inds (and arbitrary choices of array and function)?

Recommended answer

Read tholy's answer too to get the full picture!

When using an array of indices, the situation is not great right now on Julia 0.4-pre (start of Feb 2015):

julia> N = 10000000;
julia> x = randn(N);
julia> inds = [1:N];
julia> @time mean(x)
elapsed time: 0.010702729 seconds (96 bytes allocated)
elapsed time: 0.012167155 seconds (96 bytes allocated)
julia> @time mean(x[inds])
elapsed time: 0.088312275 seconds (76 MB allocated, 17.87% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.073672734 seconds (76 MB allocated, 3.27% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.071646757 seconds (76 MB allocated, 1.08% gc time in 1 pauses with 0 full sweep)
julia> xs = sub(x,inds);  # Only works on 0.4
julia> @time mean(xs)
elapsed time: 0.057446177 seconds (96 bytes allocated)
elapsed time: 0.096983673 seconds (96 bytes allocated)
elapsed time: 0.096711312 seconds (96 bytes allocated)
julia> using ArrayViews
julia> xv = view(x, 1:N)  # Note use of a range, not [1:N]!
julia> @time mean(xv)
elapsed time: 0.012919509 seconds (96 bytes allocated)
elapsed time: 0.013010655 seconds (96 bytes allocated)
elapsed time: 0.01288134 seconds (96 bytes allocated)
julia> xs = sub(x,1:N)  # Works on 0.3 and 0.4
julia> @time mean(xs)
elapsed time: 0.014191482 seconds (96 bytes allocated)
elapsed time: 0.014023089 seconds (96 bytes allocated)
elapsed time: 0.01257188 seconds (96 bytes allocated)

  • So while we can avoid the memory allocation, we are actually still slower(!).
  • The issue is indexing by an array, as opposed to a range. You can't use sub for this on 0.3, but you can on 0.4.
  • If we can index by a range, then we can use ArrayViews.jl on 0.3, and the same functionality is built in on 0.4. This case is pretty much as good as the original mean.

I noticed that with a smaller number of indices (instead of the whole range), the gap is much smaller and the memory allocation is low, so sub might be worth it:

      N = 100000000
      x = randn(N)
      inds = [1:div(N,10)]
      
      @time mean(x)
      @time mean(x)
      @time mean(x)
      @time mean(x[inds])
      @time mean(x[inds])
      @time mean(x[inds])
      xi = sub(x,inds)
      @time mean(xi)
      @time mean(xi)
      @time mean(xi)
      

      gives

      elapsed time: 0.092831612 seconds (985 kB allocated)
      elapsed time: 0.067694917 seconds (96 bytes allocated)
      elapsed time: 0.066209038 seconds (96 bytes allocated)
      elapsed time: 0.066816927 seconds (76 MB allocated, 20.62% gc time in 1 pauses with 1 full sweep)
      elapsed time: 0.057211528 seconds (76 MB allocated, 19.57% gc time in 1 pauses with 0 full sweep)
      elapsed time: 0.046782848 seconds (76 MB allocated, 1.81% gc time in 1 pauses with 0 full sweep)
      elapsed time: 0.186084807 seconds (4 MB allocated)
      elapsed time: 0.057476269 seconds (96 bytes allocated)
      elapsed time: 0.05733602 seconds (96 bytes allocated)
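
For readers on Julia 1.x, a rough sketch of the same comparison is shown below. It assumes view in place of sub and collect(1:n) in place of [1:n]; mean_copy and mean_view are hypothetical helper names introduced only for this illustration, and the exact byte counts will vary by Julia version.

```julia
using Statistics  # provides mean on Julia 1.x

# Hypothetical helpers, named here only for illustration.
mean_copy(x, inds) = mean(x[inds])         # materializes x[inds] first
mean_view(x, inds) = mean(view(x, inds))   # reads through a SubArray instead

N = 1_000_000
x = randn(N)
inds = collect(1:div(N, 10))   # a Vector of indices, not a range

# Call each helper once so that compilation is not counted below.
mean_copy(x, inds)
mean_view(x, inds)

bytes_copy = @allocated mean_copy(x, inds)
bytes_view = @allocated mean_view(x, inds)

println("copy: ", bytes_copy, " bytes, view: ", bytes_view, " bytes")
```

The copying version should report at least 8 bytes per selected element, while the view version should report far less. Whether the view is also faster depends on the index pattern, as the timings above suggest, so it is worth measuring on your own data.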
      
