Load balancing in R foreach loops


Question

Is there a way to change how an R foreach loop balances load with the doParallel backend? When parallelizing tasks whose execution times differ widely, all nodes but one can finish their tasks while the last one still has several left to do. Here is a toy example:

library(foreach)
library(doParallel)

registerDoParallel(4)

waittime = c(10,1,1,1,10,1,1,1,10,1,1,1,10,1,1,1)

w = iter(waittime)

foreach(i=w) %dopar% {
    message(paste("waiting",i, "on",Sys.getpid()))
    Sys.sleep(i)
}

Basically, the code registers 4 cores. For each iteration i, the task is to wait for waittime[i] seconds. However, the foreach loop seems, by default, to split the tasks up front into as many sets as there are registered cores. In the example above, the first core receives all the tasks with waittime = 10, while the other 3 receive only tasks with waittime = 1, so those 3 cores finish all their tasks before the first core has finished its first one.
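The imbalance described here can be reproduced without any parallel backend. The sketch below is a plain-R simulation, not foreach's actual scheduler. It assumes the prescheduled case hands task k to core ((k - 1) mod 4) + 1, which matches the behaviour observed in this example, and compares that with sending each task to whichever core frees up first. The helper names static_makespan and dynamic_makespan are made up for illustration.

```r
waittime <- c(10, 1, 1, 1, 10, 1, 1, 1, 10, 1, 1, 1, 10, 1, 1, 1)
cores <- 4

# Prescheduled (assumed round-robin): task k goes to core ((k - 1) %% cores) + 1,
# so each core finishes after the sum of its own tasks.
static_makespan <- function(times, cores) {
  core_of_task <- (seq_along(times) - 1) %% cores + 1
  max(tapply(times, core_of_task, sum))
}

# One task at a time: each task goes to the core that becomes free earliest.
dynamic_makespan <- function(times, cores) {
  free_at <- numeric(cores)
  for (d in times) {
    k <- which.min(free_at)
    free_at[k] <- free_at[k] + d
  }
  max(free_at)
}

static_makespan(waittime, cores)   # 40
dynamic_makespan(waittime, cores)  # 16
```

Under these assumptions the prescheduled split takes 40 seconds in total, because one core gets all four 10-second tasks, while one-at-a-time dispatch finishes in 16 seconds.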

Is there a way to make foreach() distribute tasks one at a time? That is, in the case above, I'd like the first 4 tasks to be distributed among the 4 cores, and each subsequent task to go to the next core that becomes available.

Thanks.

Recommended answer

I haven't tested it myself, but the doParallel backend provides a preschedule option akin to the mc.preschedule argument of mclapply(). (See section 7 of the doParallel vignette.)

You could try:

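mcoptions <- list(preschedule = FALSE)
# w is the iter(waittime) iterator from the question; recreate it if it was already consumed
foreach(i = w, .options.multicore = mcoptions) %dopar% Sys.sleep(i)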
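Since the option is described as akin to mc.preschedule, its effect can be checked directly with mclapply() from the base parallel package, which needs no extra packages. This is only a sketch under assumptions: it relies on forking, so it runs in parallel only on Unix-like systems, and the waits are scaled down to fractions of a second. It does not verify that doParallel behaves identically on a given platform. The helper name run is made up for illustration.

```r
library(parallel)

# Same pattern as the question, scaled down: long tasks 0.5 s, short tasks 0.05 s.
waittime <- rep(c(0.5, 0.05, 0.05, 0.05), times = 4)

# Time one mclapply() call over all tasks with the given scheduling mode.
run <- function(preschedule) {
  system.time(
    mclapply(waittime, function(s) { Sys.sleep(s); s },
             mc.cores = 4, mc.preschedule = preschedule)
  )[["elapsed"]]
}

prescheduled <- run(TRUE)   # tasks split across the 4 workers up front
on_demand    <- run(FALSE)  # one forked job per task, started as workers free up

cat("prescheduled:", prescheduled, "s; on demand:", on_demand, "s\n")
```

With this pattern of waits, the on-demand run should finish noticeably sooner than the prescheduled one, at the cost of forking a new process for every task.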
