@distributed seems to work, function return is wonky
Problem description
I'm just learning how to do parallel computing in Julia. I'm using @sync @distributed
at the start of a 3x nested for
loop to parallelize things (see code at bottom). From the line println(errCmp[row, col])
I can watch all the elements of the array errCmp
be printed out. E.g.
From worker 3: 2.351134946074191e9
From worker 4: 2.3500830193505473e9
From worker 5: 2.3502416529551845e9
From worker 2: 2.3509105625656652e9
From worker 3: 2.3508352842971106e9
From worker 4: 2.3497049296121807e9
From worker 5: 2.35048428351797e9
From worker 2: 2.350742582031195e9
From worker 3: 2.350616273660934e9
From worker 4: 2.349709546599313e9
However, when the function returns, errCmp is the array of zeros I pre-allocated at the beginning.
Am I missing some closing term to collect everything?
function optimizeDragCalc(df::DataFrame)
paramGrid = [cd*AoM for cd = range(1e-3, stop = 0.01, length = 50), AoM = range(2e-4, stop = 0.0015, length = 50)]
errCmp = zeros(size(paramGrid))
# totalSize = size(paramGrid, 1) * size(paramGrid, 2) * size(df.time, 1)
@sync @distributed for row = 1:size(paramGrid, 1)
for col = 1:size(paramGrid, 2)
# Run the propagation here
BC = 1/paramGrid[row, col]
slns, _ = propWholeTraj(df, BC)
for time = 1:size(df.time, 1)
errDF = propError(slns[time], df, time)
errCmp[row, col] += sum(errDF.totalErr)
end # time
# println("row: ", row, " of ",size(paramGrid, 1)," col: ", col, " of ", size(paramGrid, 2))
println(errCmp[row, col])
end # col
end # row
# plot(heatmap(z = errCmp))
return errCmp, paramGrid
end
errCmp, paramGrid = @time optimizeDragCalc(df)
Answer
You did not provide a minimal working example, though I guess that might be hard to do. So here is my MWE. Let us assume that we want to use Distributed to calculate the means of an Array's columns:
using Distributed
addprocs(2)
@everywhere using StatsBase
data = rand(1000,2000)
res = zeros(2000)
@sync @distributed for col = 1:size(data, 2)
    res[col] = StatsBase.mean(data[:,col])
    # does not work!
    # ... because each worker writes to its own local copy of res,
    # and those copies are never sent back to the master!
end
In order to correct the above code you need to provide an aggregator function (the example is intentionally kept simple; further optimization is possible).
using Distributed
addprocs(2)
@everywhere using Distributed,StatsBase
data = rand(1000,2000)
@everywhere function t2(d1,d2)
append!(d1,d2)
d1
end
res = @sync @distributed (t2) for col = 1:size(data)[2]
[(myid(),col, StatsBase.mean(data[:,col]))]
end
Now let us look at the output. We can see that some of the values have been calculated on worker 2, while others on worker 3:
julia> res
2000-element Array{Tuple{Int64,Int64,Float64},1}:
(2, 1, 0.49703681326230276)
(2, 2, 0.5035341367791002)
(2, 3, 0.5050607022354537)
⋮
(3, 1998, 0.4975699181976122)
(3, 1999, 0.5009498778934444)
(3, 2000, 0.499671315490524)
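The same reducer pattern maps back onto the question's 2D errCmp: have each iteration return (row, col, value) triples, reduce them into one list, and scatter that list into a pre-allocated matrix on the master afterwards. A minimal sketch (here data[row, col]^2 merely stands in for the per-cell error computation, since propWholeTraj and propError are not shown):

```julia
using Distributed
addprocs(2)

data = rand(100, 50)

# Each iteration returns a vector of (row, col, value) triples;
# append! is the reducer that concatenates the workers' partial results.
triples = @distributed (append!) for row = 1:size(data, 1)
    [(row, col, data[row, col]^2) for col = 1:size(data, 2)]
end

# Fill the result matrix on the master process.
errCmp = zeros(size(data))
for (row, col, v) in triples
    errCmp[row, col] = v
end
```

Because the writes to errCmp happen on the master after the reduction, nothing is lost to worker-local copies.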
Further possible improvements/modifications:
- use @spawnat to generate values on the remote processes (instead of computing them on the master process and sending them out)
- use SharedArray - this allows data to be shared among workers automatically; from my experience it requires very careful programming
- use ParallelDataTransfer.jl to send data among workers - very easy to use, but not efficient for a huge number of messages
- always consider Julia's threading mechanism (in some scenarios it makes life easier - again, it depends on the problem)
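For completeness, here is a SharedArray version of the column-mean example above - a sketch that assumes local (same-host) workers and uses the Statistics standard library in place of StatsBase:

```julia
using Distributed
addprocs(2)
@everywhere using SharedArrays, Statistics

data = rand(1000, 2000)

# A SharedArray lives in shared memory, so writes made by workers
# inside the distributed loop are visible on the master afterwards.
res = SharedArray{Float64}(2000)
@sync @distributed for col = 1:size(data, 2)
    res[col] = mean(data[:, col])
end
```

Unlike the plain Array version, res now holds the computed means after the loop, because all processes write into the same shared memory region.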