How to run a method in parallel using Julia?


Question

I was reading the Parallel Computing section of the Julia docs and, having never done any parallel coding, I was left wanting a gentler introduction. So I thought of a (probably) simple problem that I couldn't figure out how to code in Julia's parallel paradigm.

Let's say I have a matrix/dataframe df from some experiment. Its N rows are variables and its M columns are samples. I have a method pwCorr(..) that calculates the pairwise correlation of rows. If I wanted an NxN matrix of all the pairwise correlations, I'd probably run a for-loop over the roughly N*N/2 entries of the upper or lower triangle of the matrix and fill in the values. However, this seems like a perfect thing to parallelize, since each pwCorr() call is independent of the others. (Am I correct in thinking this way about what can and cannot be parallelized?)
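
For reference, the serial loop described above might look roughly like the sketch below. The question does not show pwCorr, so a hypothetical stand-in (pwcorr, built on cor from the Statistics standard library) is used here, and the random matrix is placeholder data only.

```julia
using Statistics  # provides cor

# Hypothetical stand-in for the question's pwCorr: correlation of rows i and j.
pwcorr(df, i, j) = cor(df[i, :], df[j, :])

# Fill only the upper triangle (diagonal included); the lower triangle stays zero.
function pairwise_serial(df)
    n = size(df, 1)
    out = zeros(n, n)
    for i in 1:n, j in i:n
        out[i, j] = pwcorr(df, i, j)
    end
    return out
end

df = rand(5, 8)          # placeholder data: 5 variables (rows), 8 samples (columns)
C = pairwise_serial(df)
```

Each out[i, j] depends only on rows i and j of df, which is what makes the iterations independent of one another.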

To do this, I feel like I'd have to create a DArray that gets filled by a @parallel for loop, but I'm not sure how to achieve that in Julia. If that's not the right approach, I guess I don't even know where to begin.

Recommended answer

This should work. First, you need to propagate the top-level variable (data) to all the workers:

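    using Distributed   # needed on Julia 0.7 and later for workers() and remotecall_fetch
    for pid in workers()
        # define the global `data` on each worker and wait for it to finish;
        # current Julia takes the function first, then the worker id
        remotecall_fetch(x->(global data; data=x; nothing), pid, data)
    end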
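
On recent Julia versions, the same propagation step can probably be written more compactly with the @everywhere macro from the Distributed standard library, which accepts `$` interpolation of local values. The sketch below assumes two local workers started with addprocs and uses a random matrix as placeholder data; neither comes from the original answer.

```julia
using Distributed

addprocs(2)              # assumption: two local worker processes, for illustration

data = rand(20, 50)      # placeholder data: 20 variables (rows), 50 samples (columns)

# Define a global `data` in Main on every worker; `$data` interpolates the local value.
@everywhere workers() data = $data

# Ask each worker to evaluate `data` in its own Main module and compare with the local copy.
all_match = all(remotecall_fetch(Core.eval, w, Main, :data) == data for w in workers())
```

This copies the whole matrix to every worker, which is fine for small inputs but may be costly for large ones.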

Then perform the computation in chunks, using the DArray constructor with some fancy indexing:

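    # Julia 0.7 and later: DArray comes from the DistributedArrays package, cor from Statistics
    @everywhere using DistributedArrays, Statistics

    # assumes data has 20 rows, so the result is a 20x20 matrix
    corrs = DArray((20,20)) do I
        out = zeros(length(I[1]), length(I[2]))
        for i=I[1], j=I[2]
            # local index = global index - start of this chunk's range + 1
            li, lj = i-minimum(I[1])+1, j-minimum(I[2])+1
            if i<j
                out[li,lj] = 0.0   # upper triangle is left at zero
            else
                out[li,lj] = cor(vec(data[i,:]), vec(data[j,:]))
            end
        end
        out
    end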

In more detail, the DArray constructor takes a function that accepts a tuple of index ranges and returns the chunk of the resulting matrix corresponding to those ranges. In the code above, I is the tuple of ranges, with I[1] being the first range. You can see this more clearly with:

julia> DArray((10,10)) do I
       println(I)
       return zeros(length(I[1]),length(I[2]))
       end
        From worker 2:  (1:10,1:5)
        From worker 3:  (1:10,6:10)

Here you can see that it split the array into two chunks along the second axis.

The trickiest part of the example is converting these 'global' index ranges to local index ranges: subtract the minimum element of the range, then add 1 back because Julia uses 1-based indexing. Hope that helps!
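
The index conversion can be tried out without any worker processes. The following is a minimal serial sketch, assuming a hypothetical helper corr_chunk that plays the role of the function passed to the DArray constructor; the chunk ranges are chosen by hand to mimic a split along the second axis, and the data is random placeholder input.

```julia
using Statistics  # provides cor

# Compute one block of the lower-triangular correlation matrix.
# I is a tuple of global index ranges, e.g. (1:6, 4:6).
function corr_chunk(data, I)
    rows, cols = I
    out = zeros(length(rows), length(cols))
    for i in rows, j in cols
        if i >= j
            # global index -> local index: subtract the start of the range, add 1
            out[i - first(rows) + 1, j - first(cols) + 1] = cor(data[i, :], data[j, :])
        end
    end
    return out
end

data = rand(6, 10)                        # placeholder: 6 variables, 10 samples

# All rows, with the columns split into two halves.
left  = corr_chunk(data, (1:6, 1:3))
right = corr_chunk(data, (1:6, 4:6))
full  = hcat(left, right)                 # reassemble the 6x6 result
```

Here first(rows) gives the same value as minimum(I[1]) in the answer's code, since the ranges are increasing. For the right-hand chunk, global column 4 lands in local column 4 - 4 + 1 = 1.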
