对数组求和比在 Julia 中对单个变量求和慢 [英] summation over array slower than summing individual variables in Julia

查看:37
本文介绍了对数组求和比在 Julia 中对单个变量求和慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

好的,我最近正在做一系列测试.我有一个 MC 模拟,其中有几个变量 (20),将它们全部放在一个一维数组中是有意义的,因为它使一些东西更容易阅读.

Ok, I'm doing a serie of tests lately. I have a MC simulation where I have several variables (20) which make sense to put them all inside a one-dimensional array because it makes several things easier to read.

但是我遇到了一个问题,我需要对每次迭代中的变量求和,并且模拟需要大量迭代,所以我遇到了这个问题(减少到 7 个变量):

But I'm having one problem, I need to sum the variables in every iteration and the simulation takes A LOT of iterations, so I run into this problem (reduced to 7 variables):

function sumtest1(N)
    s=0.0
    a=1.0
    b=2.0
    c=3.0
    d=4.0
    e=5.0
    f=6.0
    g=7.0
    for i = 1:N
        s += (a+b+c+d+e+f+g)
    end
    return s
end

function sumtest2(N)
    s=0.0
    A=[1.0,2.0,3.0,4.0,5.0,6.0,7.0]
    for i = 1:N
        s += sum(A)
    end
    return s
end

@time sumtest1(1_000_000_000)
elapsed time: 0.998272756 seconds (96 bytes allocated)

@time sumtest1(1_000_000_000)
elapsed time: 7.522198967 seconds (208 bytes allocated)

这是预期的吗?还是我做错了什么?我真的很想将我的变量编入索引,因为其他原因现在无法解释,但这种性能损失是我无法接受的.

Is this expected? Or am I doing something wrong? I would really like to have my variables indexed because of other reasons too long to explain now, but this performance penalty is something I can't go with.

推荐答案

让我们看看正在为 sumtest1 执行的 LLVM 代码:

Let's look at the LLVM code that's being executed for sumtest1:

julia> @code_llvm sumtest1(10^9)

define double @julia_sumtest1_21391(i64) {
top:
  %1 = icmp sgt i64 %0, 0
  %2 = select i1 %1, i64 %0, i64 0
  %3 = icmp eq i64 %2, 0
  br i1 %3, label %L3, label %L.preheader

L.preheader:                                      ; preds = %top
  %4 = icmp sgt i64 %0, 0
  %smax = select i1 %4, i64 %0, i64 0
  br label %L

L:                                                ; preds = %L, %L.preheader
  %lsr.iv = phi i64 [ %smax, %L.preheader ], [ %lsr.iv.next, %L ]
  %s.0 = phi double [ %5, %L ], [ 0.000000e+00, %L.preheader ]
  %5 = fadd double %s.0, 2.800000e+01
  %lsr.iv.next = add i64 %lsr.iv, -1
  %6 = icmp eq i64 %lsr.iv.next, 0
  br i1 %6, label %L3, label %L

L3:                                               ; preds = %L, %top
  %s.1 = phi double [ 0.000000e+00, %top ], [ %5, %L ]
  ret double %s.1
}

这很难理解,但是在循环体中突出了一点,L:

That's pretty hard to grok, but one thing stands out in the body of the loop, L:

  %5 = fadd double %s.0, 2.800000e+01

对于每次迭代,预先计算的常量 28.0 被添加到累加器 s.编译器可以告诉您永远不会更改任何局部变量,因此它知道每次都添加相同的总和.循环必须执行的唯一原因是重复的浮点加法并不完全等同于乘法.如果所有局部变量都改成整数,其中重复加法完全等价于乘法,则完全消除循环:

For each iteration, the pre-computed constant 28.0 is being added to the accumulator, s. The compiler can tell that you never change any of the local variables and so it knows that the same sum is being added every time. The only reason that the loop has to execute at all is that repeated floating-point addition is not exactly equivalent to multiplication. If all the local variables are changed to integers, where repeated addition is exactly equivalent to multiplication, the loop is eliminated entirely:

julia> @time sumtest1_int(10^9)
  0.000005 seconds (6 allocations: 192 bytes)
28000000000

julia> @code_llvm sumtest1_int(10^9)

define i64 @julia_sumtest1_int_21472(i64) {
top:
  %1 = icmp slt i64 %0, 1
  br i1 %1, label %L3, label %L.preheader

L.preheader:                                      ; preds = %top
  %2 = icmp sgt i64 %0, 0
  %.op = mul i64 %0, 28
  %3 = select i1 %2, i64 %.op, i64 0
  br label %L3

L3:                                               ; preds = %L.preheader, %top
  %s.1 = phi i64 [ 0, %top ], [ %3, %L.preheader ]
  ret i64 %s.1
}

这大致翻译回 Julia 为:

Which translates roughly back into Julia as:

sumtest1_int(N) = N < 1 ? 0 : ifelse(N > 0, N*28, 0)

这有点多余,因为主体可以简化为 ifelse(N > 1, N*28, 0) (反过来也可以改为 28N 因为我们不关心 N) 的负值,但它仍然比执行循环快得多.

That's a little redundant, since the body could be simplified to ifelse(N > 1, N*28, 0) (which could in turn be changed to just 28N since we don't care about negative values of N), but it's still much faster than executing a loop.

sumtest2 函数几乎无法轻易分析或优化.这需要证明数组 A 永远不能更改,这非常困难.所以编译器别无选择,只能做所有的工作,当然这比不做要慢得多.在您的模拟中,使用局部变量可能仍然比将值存储在数组中更快,但可能不会.您必须测量那些更难完全优化的代码才能确定.

The function sumtest2 cannot be analyzed or optimized nearly so easily. It would require proving that the array A can never be changed, which is quite difficult. So the compiler has no choice but to do all the work, which is, of course, much slower than not doing it. In your simulation, it may still be faster to use local variables than storing values in an array, but it may not. You'll have to measure code that does something harder to optimize away completely to be sure.

这篇关于对数组求和比在 Julia 中对单个变量求和慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆