在数组上求和比在Julia中求和单个变量慢 [英] summation over array slower than summing individual variables in Julia

查看:159
本文介绍了在数组上求和比在Julia中求和单个变量慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

好,最近我正在做一系列的测试.我有一个MC模拟,其中有几个变量(20),可以将它们全部放入一维数组中,这是很有意义的,因为它使几件事更易于阅读.

Ok, I'm doing a serie of tests lately. I have a MC simulation where I have several variables (20) which make sense to put them all inside a one-dimensional array because it makes several things easier to read.

但是我有一个问题,我需要在每次迭代中对变量求和,并且模拟需要进行很多次迭代,所以我遇到了这个问题(减少到7个变量):

But I'm having one problem, I need to sum the variables in every iteration and the simulation takes A LOT of iterations, so I run into this problem (reduced to 7 variables):

function sumtest1(N)
    s=0.0
    a=1.0
    b=2.0
    c=3.0
    d=4.0
    e=5.0
    f=6.0
    g=7.0
    for i = 1:N
        s += (a+b+c+d+e+f+g)
    end
    return s
end

function sumtest2(N)
    s=0.0
    A=[1.0,2.0,3.0,4.0,5.0,6.0,7.0]
    for i = 1:N
        s += sum(A)
    end
    return s
end

@time sumtest1(1_000_000_000)
elapsed time: 0.998272756 seconds (96 bytes allocated)

@time sumtest1(1_000_000_000)
elapsed time: 7.522198967 seconds (208 bytes allocated)

这是预期的吗?还是我做错了什么?我真的很想为我的变量建立索引,因为其他原因现在解释太久了,但是这种性能损失是我无法接受的.

Is this expected? Or am I doing something wrong? I would really like to have my variables indexed because of other reasons too long to explain now, but this performance penalty is something I can't go with.

推荐答案

让我们看看为sumtest1执行的LLVM代码:

Let's look at the LLVM code that's being executed for sumtest1:

julia> @code_llvm sumtest1(10^9)

define double @julia_sumtest1_21391(i64) {
top:
  %1 = icmp sgt i64 %0, 0
  %2 = select i1 %1, i64 %0, i64 0
  %3 = icmp eq i64 %2, 0
  br i1 %3, label %L3, label %L.preheader

L.preheader:                                      ; preds = %top
  %4 = icmp sgt i64 %0, 0
  %smax = select i1 %4, i64 %0, i64 0
  br label %L

L:                                                ; preds = %L, %L.preheader
  %lsr.iv = phi i64 [ %smax, %L.preheader ], [ %lsr.iv.next, %L ]
  %s.0 = phi double [ %5, %L ], [ 0.000000e+00, %L.preheader ]
  %5 = fadd double %s.0, 2.800000e+01
  %lsr.iv.next = add i64 %lsr.iv, -1
  %6 = icmp eq i64 %lsr.iv.next, 0
  br i1 %6, label %L3, label %L

L3:                                               ; preds = %L, %top
  %s.1 = phi double [ 0.000000e+00, %top ], [ %5, %L ]
  ret double %s.1
}

这很难理解,但是循环的主体中有一个突出的地方,L:

That's pretty hard to grok, but one thing stands out in the body of the loop, L:

  %5 = fadd double %s.0, 2.800000e+01

对于每次迭代,都会将预计算的常量28.0添加到累加器s中.编译器可以告诉您永远不要更改任何局部变量,因此它知道每次都会添加相同的总和.循环必须执行的唯一原因是重复的浮点加法并不完全等同于乘法.如果将所有局部变量都更改为整数,其中重复加法完全等于乘法,则将完全消除循环:

For each iteration, the pre-computed constant 28.0 is being added to the accumulator, s. The compiler can tell that you never change any of the local variables and so it knows that the same sum is being added every time. The only reason that the loop has to execute at all is that repeated floating-point addition is not exactly equivalent to multiplication. If all the local variables are changed to integers, where repeated addition is exactly equivalent to multiplication, the loop is eliminated entirely:

julia> @time sumtest1_int(10^9)
  0.000005 seconds (6 allocations: 192 bytes)
28000000000

julia> @code_llvm sumtest1_int(10^9)

define i64 @julia_sumtest1_int_21472(i64) {
top:
  %1 = icmp slt i64 %0, 1
  br i1 %1, label %L3, label %L.preheader

L.preheader:                                      ; preds = %top
  %2 = icmp sgt i64 %0, 0
  %.op = mul i64 %0, 28
  %3 = select i1 %2, i64 %.op, i64 0
  br label %L3

L3:                                               ; preds = %L.preheader, %top
  %s.1 = phi i64 [ 0, %top ], [ %3, %L.preheader ]
  ret i64 %s.1
}

其中大致翻译成Julia的形式是:

Which translates roughly back into Julia as:

sumtest1_int(N) = N < 1 ? 0 : ifelse(N > 0, N*28, 0)

这有点多余,因为可以将主体简化为ifelse(N > 1, N*28, 0)(由于我们不关心N的负值,因此可以将其更改为28N),但是它仍然要快得多而不是执行循环.

That's a little redundant, since the body could be simplified to ifelse(N > 1, N*28, 0) (which could in turn be changed to just 28N since we don't care about negative values of N), but it's still much faster than executing a loop.

功能sumtest2几乎无法如此轻松地进行分析或优化.这将需要证明数组A永远不能更改,这是非常困难的.因此,编译器别无选择,只能完成所有工作,这当然比不执行要慢得多.在您的仿真中,使用局部变量可能比在数组中存储值要快,但可能没有.确保必须测量那些难以完全优化的代码.

The function sumtest2 cannot be analyzed or optimized nearly so easily. It would require proving that the array A can never be changed, which is quite difficult. So the compiler has no choice but to do all the work, which is, of course, much slower than not doing it. In your simulation, it may still be faster to use local variables than storing values in an array, but it may not. You'll have to measure code that does something harder to optimize away completely to be sure.

这篇关于在数组上求和比在Julia中求和单个变量慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆