Avoid memory allocation when indexing an array in Julia

Views: 19 | Published: 2022/1/23 19:10:05 | Tag: julia

Problem description

UPDATE: Note that the relevant function in Julia v1+ is view.
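
For readers on Julia v1+, a minimal sketch of what using view can look like is given here. It assumes the Statistics standard library for mean; the variable names are only illustrative and the array is kept small on purpose.

```julia
using Statistics  # mean lives in the Statistics stdlib on Julia v1+

x = randn(1_000)
inds = 1:500                 # a range; a Vector{Int} of indices also works

copied = x[inds]             # ordinary indexing allocates a new array
viewed = view(x, inds)       # a SubArray that refers to x's memory instead of copying it

# @views turns the indexing expressions inside it into views
m = @views mean(x[inds])
```

Because viewed shares memory with x, writing to viewed also changes x, which is worth keeping in mind when passing a view into a function that mutates its argument.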

Question: I would like to index into an array without triggering memory allocation, especially when passing the indexed elements into a function. From reading the Julia docs, I suspect the answer revolves around the sub function, but I can't quite see how...

Working example: I build a large vector of Float64 values (x) and then an index to every observation in x:
N = 10000000
x = randn(N)
inds = [1:N]  # Julia 0.3 syntax for a Vector of indices; on Julia v1+ write collect(1:N)

Now I time the mean function over x and x[inds] (I run mean(randn(2)) first so that compilation does not distort the timings):
@time mean(x)
@time mean(x[inds])

The calculation is identical, but as expected the timings are:
elapsed time: 0.007029772 seconds (96 bytes allocated)
elapsed time: 0.067880112 seconds (80000208 bytes allocated, 35.38% gc time)
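
The size of the second allocation is consistent with x[inds] making a full copy of the data: N Float64 values at 8 bytes each account for almost all of the reported bytes. A quick check of that arithmetic (the small remainder is presumably bookkeeping overhead):

```julia
N = 10_000_000
bytes_for_copy = N * sizeof(Float64)   # 8 bytes per Float64
println(bytes_for_copy)                # 80000000
```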

So, is there a way around the memory allocation problem for arbitrary choices of inds (and arbitrary choices of array and function)?

Recommended answer

Read tholy's answer too to get the full picture!

When using an array of indices, the situation is not great right now on Julia 0.4-pre (start of Feb 2015):
julia> N = 10000000;
julia> x = randn(N);
julia> inds = [1:N];
julia> @time mean(x)
elapsed time: 0.010702729 seconds (96 bytes allocated)
elapsed time: 0.012167155 seconds (96 bytes allocated)
julia> @time mean(x[inds])
elapsed time: 0.088312275 seconds (76 MB allocated, 17.87% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.073672734 seconds (76 MB allocated, 3.27% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.071646757 seconds (76 MB allocated, 1.08% gc time in 1 pauses with 0 full sweep)
julia> xs = sub(x,inds);  # Only works on 0.4
julia> @time mean(xs)
elapsed time: 0.057446177 seconds (96 bytes allocated)
elapsed time: 0.096983673 seconds (96 bytes allocated)
elapsed time: 0.096711312 seconds (96 bytes allocated)
julia> using ArrayViews
julia> xv = view(x, 1:N)  # Note use of a range, not [1:N]!
julia> @time mean(xv)
elapsed time: 0.012919509 seconds (96 bytes allocated)
elapsed time: 0.013010655 seconds (96 bytes allocated)
elapsed time: 0.01288134 seconds (96 bytes allocated)
julia> xs = sub(x,1:N)  # Works on 0.3 and 0.4
julia> @time mean(xs)
elapsed time: 0.014191482 seconds (96 bytes allocated)
elapsed time: 0.014023089 seconds (96 bytes allocated)
elapsed time: 0.01257188 seconds (96 bytes allocated)

So while we can avoid the memory allocation, we are actually still slower(!).
The issue is indexing by an array, as opposed to a range. You can't use sub for this on 0.3, but you can on 0.4.
If we can index by a range, then we can use ArrayViews.jl on 0.3, and it's built in on 0.4. This case is pretty much as good as the original mean.
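
On Julia v1+, where view is built in, the range-versus-array distinction can be explored with a sketch like the one below. It is not the benchmark from this answer: N is smaller, @allocated is used instead of @time, and the exact byte counts will vary by Julia version, so none are quoted here.

```julia
using Statistics

N = 1_000_000
x = randn(N)

range_view = view(x, 1:N)            # indexed by a range
array_view = view(x, collect(1:N))   # indexed by a Vector{Int}

# Warm up so that compilation is not included in the measurements
mean(range_view); mean(array_view); mean(x[1:N])

alloc_range = @allocated mean(range_view)   # no copy of the data
alloc_array = @allocated mean(array_view)   # no copy of the data
alloc_copy  = @allocated mean(x[1:N])       # includes a full copy of x

println((alloc_range, alloc_array, alloc_copy))
```

Swapping @allocated for @time gives a rough timing comparison between the two kinds of view.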

I noticed that when a smaller number of indices is used (instead of the whole range), the gap is much smaller and the memory allocation is low, so sub might be worth it:
N = 100000000
x = randn(N)
inds = [1:div(N,10)]

@time mean(x)
@time mean(x)
@time mean(x)
@time mean(x[inds])
@time mean(x[inds])
@time mean(x[inds])
xi = sub(x,inds)
@time mean(xi)
@time mean(xi)
@time mean(xi)

which gives:
elapsed time: 0.092831612 seconds (985 kB allocated)
elapsed time: 0.067694917 seconds (96 bytes allocated)
elapsed time: 0.066209038 seconds (96 bytes allocated)
elapsed time: 0.066816927 seconds (76 MB allocated, 20.62% gc time in 1 pauses with 1 full sweep)
elapsed time: 0.057211528 seconds (76 MB allocated, 19.57% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.046782848 seconds (76 MB allocated, 1.81% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.186084807 seconds (4 MB allocated)
elapsed time: 0.057476269 seconds (96 bytes allocated)
elapsed time: 0.05733602 seconds (96 bytes allocated)
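
A rough Julia v1+ counterpart of this experiment is sketched below, assuming view in place of sub and collect(1:n) in place of [1:n]. The helper mean_at is hypothetical and not part of the original answer; it is a plain loop, included only to show a third option that reads x directly without creating either a copy or a view.

```julia
using Statistics

# Hypothetical helper (not from the original answer): average x over inds
# with an explicit loop, so no temporary array is created.
function mean_at(x::AbstractVector{<:Real}, inds)
    total = 0.0
    for i in inds
        total += x[i]
    end
    return total / length(inds)
end

N = 1_000_000
x = randn(N)
inds = collect(1:div(N, 10))     # Julia v1+ spelling of [1:div(N,10)]

xi = view(x, inds)               # view takes the place of sub on Julia v1+

m_copy = mean(x[inds])           # allocates a temporary copy
m_view = mean(xi)                # reads through the view
m_loop = mean_at(x, inds)        # reads x directly
```

All three compute the same quantity up to floating-point rounding; which one is fastest would have to be measured on the Julia version in use.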

