Parallel R with foreach and mclapply at the same time


Problem description



I am implementing a parallel processing system which will eventually be deployed on a cluster, but I'm having trouble working out how the various methods of parallel processing interact.

I need to use a for loop to run a big block of code which contains operations on several large lists of matrices. To speed this up, I want to parallelise the for loop with foreach(), and parallelise the list operations with mclapply.

example pseudocode:

# doParallel provides the %dopar% backend; it also attaches foreach and parallel
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

# l_input1, l_input2 and FUN are placeholders; .packages should list every
# package the loop body needs on the workers (e.g. "parallel" for mclapply)
outputs <- foreach(k = 1:2, .packages = "various packages") %dopar% {
    l_output1 <- mclapply(l_input1, FUN, mc.cores = 2)
    l_output2 <- mclapply(l_input2, FUN, mc.cores = 2)
    # the value of the last expression is returned for this iteration
    mapply(cbind, l_output1, l_output2, SIMPLIFY = FALSE)
}

stopCluster(cl)

This seems to work. My questions are:

1) Is it a reasonable approach? They seem to work together in my small-scale tests, but it feels a bit kludgy.

2) How many cores/processors will it use at any given time? When I upscale it to a cluster, I will need to understand how much I can push this (the foreach only loops 7 times, but the mclapply lists are up to 70 or so big matrices). It appears to create 6 "cores" as written (presumably 2 for the foreach, and 2 for each mclapply).
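One rough way to check this empirically (a sketch with toy inputs, not my real code; it assumes a Unix-alike system where mclapply can actually fork) is to record which process IDs end up running the tasks:

library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

# each cluster worker forks two mclapply children; collect their process ids
pids <- foreach(k = 1:2, .combine = c, .packages = "parallel") %dopar% {
    unlist(mclapply(1:2, function(i) Sys.getpid(), mc.cores = 2))
}
stopCluster(cl)

length(unique(pids))   # distinct processes that actually did the work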

Solution

I think it's a very reasonable approach on a cluster because it allows you to use multiple nodes while still using the more efficient mclapply across the cores of the individual nodes. It also allows you to do some of the post-processing on the workers (calling cbind in this case) which can significantly improve performance.

On a single machine, your example will create a total of 10 additional processes: two by makeCluster, each of which calls mclapply twice (2 + 2(2 + 2)). However, only four of them should use any significant CPU time at a time. You could reduce that to six processes (2 + 2(2)) by restructuring the functions called by mclapply so that you only need to call mclapply once in the foreach loop, which may be more efficient.
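One possible shape for that restructuring (a sketch only, reusing the placeholder names from the question and assuming both input lists are processed by the same function FUN) is to concatenate the two lists so each iteration makes a single mclapply call:

outputs <- foreach(k = 1:2, .packages = "parallel") %dopar% {
    l_input <- c(l_input1, l_input2)               # one combined list
    l_out   <- mclapply(l_input, FUN, mc.cores = 2)

    # split the results back into the two original groups
    n1 <- length(l_input1)
    l_output1 <- l_out[seq_len(n1)]
    l_output2 <- l_out[-seq_len(n1)]
    mapply(cbind, l_output1, l_output2, SIMPLIFY = FALSE)
}

With the default mc.preschedule, that single call forks only mc.cores children per worker, which is where the saving comes from.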

On multiple machines, you will create the same number of processes, but only two processes per node will use much CPU time at a time. Since they are spread out across multiple machines it should scale well.

Be aware that mclapply may not play nicely if you use an MPI cluster. MPI doesn't like you to fork processes, as mclapply does. It may just issue some stern warnings, but I've also seen other problems, so I'd suggest using a PSOCK cluster which uses ssh to launch the workers on the remote nodes rather than using MPI.
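For example, a PSOCK cluster spanning two nodes might be set up as below (the host names are placeholders, and this assumes passwordless ssh access with R installed on each node):

library(doParallel)

# one foreach worker per node; each worker forks its mclapply children locally
hosts <- c("node1", "node2")
cl <- makeCluster(hosts, type = "PSOCK")   # workers are launched over ssh
registerDoParallel(cl)

# ... run the foreach/mclapply code as before ...

stopCluster(cl)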


Update

It looks like there is a problem calling mclapply from cluster workers created by the "parallel" and "snow" packages. For more information, see my answer to a problem report.
