How to run a method in parallel using Julia?
Question
I was reading the Parallel Computing docs for Julia, and having never done any parallel coding, I was left wanting a gentler intro. So, I thought of a (probably) simple problem that I couldn't figure out how to code in Julia's parallel paradigm.
Let's say I have a matrix/dataframe df from some experiment. Its N rows are variables, and its M columns are samples. I have a method pwCorr(..) that calculates the pairwise correlation of rows. If I wanted an NxN matrix of all the pairwise correlations, I'd probably run a for-loop that iterates over roughly N*N/2 pairs (the upper or lower triangle of the matrix) and fills in the values; however, this seems like a perfect thing to parallelize, since each pwCorr() call is independent of the others. (Am I correct in thinking this way about what can be parallelized and what cannot?)
To do this, I feel like I'd have to create a DArray that gets filled by a @parallel for loop. And if so, I'm not sure how this can be achieved in Julia. If that's not the right approach, I guess I don't even know where to begin.
Recommended answer
This should work. First, you need to propagate the top-level variable (data) to all the workers:
for pid in workers()
    # define a global `data` on worker `pid` that holds a copy of the local `data`
    remotecall(pid, x->(global data; data=x; nothing), data)
end
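The loop above passes the worker id as the first argument to remotecall. In current Julia releases the Distributed standard library expects the function first, so the call would need its arguments reordered. A minimal sketch of the same step for a recent Julia version, assuming two local workers and a made-up 20x10 matrix standing in for the real data:

```julia
using Distributed

# Start two local worker processes if none are running yet (assumption: local machine).
nprocs() == 1 && addprocs(2)

# Stand-in for the experiment matrix: 20 variables (rows), 10 samples (columns).
data = rand(20, 10)

# Define a global `data` on every process; `$data` interpolates the local
# value into the expression that is sent to each process.
@everywhere data = $data

# Ask each worker for the size of its copy to confirm the transfer.
sizes = [remotecall_fetch(() -> size(data), pid) for pid in workers()]
```

After this, code that runs on a worker can refer to data directly, which is what the chunk function passed to the DArray constructor relies on.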
Then perform the computation in chunks using the DArray constructor with some fancy indexing:
corrs = DArray((20,20)) do I
    out=zeros(length(I[1]),length(I[2]))
    for i=I[1], j=I[2]
        if i<j
            out[i-minimum(I[1])+1,j-minimum(I[2])+1]= 0.0
        else
            out[i-minimum(I[1])+1,j-minimum(I[2])+1] = cor(vec(data[i,:]), vec(data[j,:]))
        end
    end
    out
end
In more detail, the DArray constructor takes a function which takes a tuple of index ranges and returns a chunk of the resulting matrix which corresponds to those index ranges. In the code above, I is the tuple of ranges with I[1] being the first range. You can see this more clearly with:
julia> DArray((10,10)) do I
           println(I)
           return zeros(length(I[1]),length(I[2]))
       end
        From worker 2:  (1:10,1:5)
        From worker 3:  (1:10,6:10)
Here you can see that it split the array into two chunks along the second axis.
The trickiest part of the example was converting from these 'global' index ranges to local index ranges by subtracting off the minimum element of each range and then adding back 1 for Julia's 1-based indexing. Hope that helps!
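The global-to-local index conversion can be checked on a single process, without any workers. The sketch below is an illustration only: corr_chunk is a hypothetical helper (not part of any library) that mirrors the body of the do-block, and the two index-range tuples are the ones printed in the example above. The data matrix is made up.

```julia
using Statistics  # provides cor

# Hypothetical helper: compute the block of the lower-triangular correlation
# matrix covered by the global index ranges I = (rows, cols).
function corr_chunk(data, I)
    rows, cols = I
    out = zeros(length(rows), length(cols))
    for i in rows, j in cols
        if i >= j
            # global (i, j) maps to local (i - first(rows) + 1, j - first(cols) + 1)
            out[i - first(rows) + 1, j - first(cols) + 1] = cor(data[i, :], data[j, :])
        end
    end
    return out
end

data = rand(10, 6)  # made-up data: 10 variables (rows), 6 samples (columns)

# Same split as in the printed example: two chunks along the second axis.
left  = corr_chunk(data, (1:10, 1:5))
right = corr_chunk(data, (1:10, 6:10))

# Gluing the chunks back together gives the full 10x10 lower-triangular matrix.
full = hcat(left, right)
```

Here first(rows) plays the role of minimum(I[1]) in the answer's code; for a unit range the two return the same value.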