How to run a method in parallel using Julia?
Question
I was reading the Parallel Computing docs for Julia and, having never done any parallel coding, was left wanting a gentler intro. So I thought of a (probably) simple problem that I couldn't figure out how to code in Julia's parallel paradigm.
Let's say I have a matrix/dataframe df from some experiment. Its N rows are variables, and its M columns are samples. I have a method pwCorr(..) that calculates the pairwise correlation of rows. If I wanted an NxN matrix of all the pairwise correlations, I'd probably run a for-loop that iterates about N*N/2 times (the upper or lower triangle of the matrix) and fills in the values; however, this seems like a perfect thing to parallelize, since each pwCorr() call is independent of the others. (Am I correct in thinking this way about what can be parallelized and what cannot?)
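For reference, the serial loop described above might look like the sketch below. It assumes df is a plain numeric matrix and uses cor from the Statistics standard library as a stand-in for pwCorr, whose definition is not shown; the data and sizes are made up for illustration.

```julia
using Statistics   # provides `cor`, used here as a stand-in for pwCorr (an assumption)

# Hypothetical data: N = 5 variables (rows), M = 40 samples (columns).
df = rand(5, 40)
N = size(df, 1)

corrs = zeros(N, N)
for i in 1:N, j in 1:i               # lower triangle, diagonal included
    c = cor(df[i, :], df[j, :])      # each call depends only on rows i and j
    corrs[i, j] = c
    corrs[j, i] = c                  # mirror the value into the upper triangle
end
```

No iteration reads a value written by another iteration, which is what makes the loop a candidate for parallel execution.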
To do this, I feel like I'd have to create a DArray that gets filled by a @parallel for loop. If so, I'm not sure how this can be achieved in Julia. If that's not the right approach, I guess I don't even know where to begin.
Recommended answer
This should work. First, you need to propagate the top-level variable (data) to all the workers:
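for pid in workers()
    # Define a global `data` on each worker and assign it the master's value.
    # This is the Julia 0.3/0.4 argument order; from Julia 0.5 on the function comes first:
    #   remotecall(x->(global data; data=x; nothing), pid, data)
    remotecall(pid, x->(global data; data=x; nothing), data)
end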
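On Julia 1.x, the same propagation is often written with the @everywhere macro from the Distributed standard library. The following is a minimal sketch under that assumption; the 20x50 matrix is made up, and addprocs is left commented out so the snippet also runs with only the master process.

```julia
using Distributed   # standard library in Julia 1.x

# addprocs(2)       # uncomment to start two local workers

data = rand(20, 50) # hypothetical data: 20 variables x 50 samples

# `$data` interpolates the master's value, so every process gets its own global `data`.
@everywhere data = $data

# Check: ask each process for the size of the `data` it sees.
sizes = [remotecall_fetch(() -> size(data), pid) for pid in procs()]
```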
Then perform the computation in chunks, using the DArray constructor with some fancy indexing:
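corrs = DArray((20,20)) do I
    # I is a tuple of global index ranges describing this worker's chunk
    out = zeros(length(I[1]), length(I[2]))
    for i=I[1], j=I[2]
        if i<j
            # upper triangle: left at zero
            out[i-minimum(I[1])+1, j-minimum(I[2])+1] = 0.0
        else
            # lower triangle and diagonal: correlation of rows i and j
            out[i-minimum(I[1])+1, j-minimum(I[2])+1] = cor(vec(data[i,:]), vec(data[j,:]))
        end
    end
    out
end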
In more detail, the DArray constructor takes a function that takes a tuple of index ranges and returns the chunk of the resulting matrix corresponding to those index ranges. In the code above, I is the tuple of ranges, with I[1] being the first range. You can see this more clearly with:
julia> DArray((10,10)) do I
           println(I)
           return zeros(length(I[1]),length(I[2]))
       end
        From worker 2:  (1:10,1:5)
        From worker 3:  (1:10,6:10)
Here you can see that it split the array into two chunks along the second axis.
The trickiest part of the example is converting from these 'global' index ranges to local index ranges by subtracting the minimum element and then adding 1 back for Julia's 1-based indexing. Hope that helps!
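The global-to-local conversion can be tried on its own, without any workers. The sketch below hard-codes the chunk ranges that worker 3 printed above and fills each cell with a made-up value, 10i + j, so the mapping is easy to check by eye; first(r) is used in place of minimum(r), which gives the same result for an increasing range.

```julia
# Chunk assumed for illustration: rows 1:10, columns 6:10 (as printed by worker 3).
I = (1:10, 6:10)

out = zeros(length(I[1]), length(I[2]))   # local block, 10 x 5
for i in I[1], j in I[2]
    li = i - first(I[1]) + 1   # global row i    -> local row
    lj = j - first(I[2]) + 1   # global column j -> local column
    out[li, lj] = 10i + j      # made-up value encoding the global position
end
```

Global column 6 lands in local column 1, so out[1, 1] holds the value computed for global position (1, 6).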