Julia parallelism: @distributed (+) slower than serial?


Question

After reading a couple of online tutorials on Julia parallelism, I decided to implement a small parallel snippet that computes partial sums of the harmonic series.

The serial version is:

harmonic = function (n::Int64)
    x = 0
    for i in n:-1:1 # summing backwards to avoid rounding errors
        x +=1/i
    end
    x
end

I then wrote two parallel versions, one using the @distributed macro and another using the @everywhere macro (Julia was started with julia -p 2):

@everywhere harmonic_ever = function (n::Int64)
    x = 0
    for i in n:-1:1
        x +=1/i
    end
    x
end

harmonic_distr = function (n::Int64)
    x = @distributed (+) for i in n:-1:1
        x = 1/i
    end
    x
end

However, when I run the code above under @time, I get no speedup. In fact, the @distributed version runs significantly slower!

@time harmonic(10^10)
>>> 53.960678 seconds (29.10 k allocations: 1.553 MiB)
>>> 23.60306659488827
job = @spawn harmonic_ever(10^10)
@time fetch(job)
>>> 46.729251 seconds (309.01 k allocations: 15.737 MiB)
>>> 23.60306659488827
@time harmonic_distr(10^10)
>>> 143.105701 seconds (1.25 M allocations: 63.564 MiB, 0.04% gc time)
>>> 23.603066594889185

What completely baffles me is the "0.04% gc time". I'm clearly missing something, and the examples I saw were not written for version 1.0.1 (one of them, for example, used @parallel).

Answer

Your distributed version should be:

function harmonic_distr2(n::Int64)
    x = @distributed (+) for i in n:-1:1
        1/i # no x assignment here
    end
    x
end

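The @distributed loop takes the value of the last expression in each iteration (here 1/i), accumulates those values with (+) on every worker, and then combines the per-worker results on the master process. Assigning to x inside the loop body is therefore unnecessary.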
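The chunk-then-reduce pattern described above can be sketched serially, without any workers. This is only an illustration of the idea, not how Distributed implements it: the function name chunked_harmonic and the way the range is split are assumptions made for this example.

```julia
# Hypothetical helper: emulates "sum each chunk, then sum the partial results".
# The chunking scheme is an assumption, not the one Distributed uses internally.
function chunked_harmonic(n::Int, nchunks::Int)
    # Chunk boundaries 0 = b_1 <= b_2 <= ... <= b_(nchunks+1) = n
    bounds = round.(Int, range(0, stop = n, length = nchunks + 1))
    total = 0.0
    for k in 1:nchunks
        partial = 0.0                       # what one worker would accumulate
        for i in (bounds[k] + 1):bounds[k + 1]
            partial += 1 / i
        end
        total += partial                    # final reduction on the master
    end
    return total
end

println(chunked_harmonic(1_000_000, 4))
```

Because floating-point addition is not associative, the chunked sum can differ from the serial sum in the last few digits. That is a plausible reason why the question's serial and @distributed results agree only to about twelve decimal places.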

Note that it is also generally better to use the @btime macro from BenchmarkTools instead of @time for benchmarking:

julia> using Distributed, BenchmarkTools; addprocs(4);

julia> @btime harmonic(1_000_000_000); # serial
  1.601 s (1 allocation: 16 bytes)

julia> @btime harmonic_distr2(1_000_000_000); # parallel
  754.058 ms (399 allocations: 36.63 KiB)

julia> @btime harmonic_distr(1_000_000_000); # your old parallel version
  4.289 s (411 allocations: 37.13 KiB)
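One reason a single @time measurement can mislead is that the first call to a function also includes JIT compilation. The sketch below uses only Base's @elapsed to show a warm-up call followed by a second measurement; the name harmonic_sum is hypothetical, and the actual timings depend on the machine.

```julia
# Hypothetical serial helper, equivalent in spirit to the question's harmonic.
function harmonic_sum(n::Int)
    x = 0.0                 # start from a Float64 so the accumulator type is stable
    for i in n:-1:1         # summing backwards, as in the question
        x += 1 / i
    end
    return x
end

first_call  = @elapsed harmonic_sum(10^6)   # includes compilation of harmonic_sum
second_call = @elapsed harmonic_sum(10^6)   # runs already-compiled code
println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

@btime goes further than this by running the expression many times and reporting the minimum, which reduces noise from other activity on the machine.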

The parallel version is, of course, slower if run only on one process:

julia> rmprocs(workers())
Task (done) @0x0000000006fb73d0

julia> nprocs()
1

julia> @btime harmonic_distr2(1_000_000_000); # (not really) parallel
  1.879 s (34 allocations: 2.00 KiB)
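Given the overhead shown above, one option is to fall back to a plain loop when no worker processes are available. This is a minimal sketch under that assumption; harmonic_auto is a hypothetical name and the nprocs() == 1 check is just one possible heuristic.

```julia
using Distributed

# Hypothetical wrapper: serial loop on a single process, @distributed otherwise.
function harmonic_auto(n::Int)
    if nprocs() == 1
        # No workers: avoid the @distributed overhead entirely
        x = 0.0
        for i in n:-1:1
            x += 1 / i
        end
        return x
    else
        # Workers available: reduce the per-iteration values with (+)
        s = @distributed (+) for i in n:-1:1
            1 / i
        end
        return s
    end
end

println(harmonic_auto(1_000_000))
```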
