clusterExport to a single worker in R parallel


Question

I would like to split a large data.frame into chunks and pass each chunk individually to a different member of the cluster.

Something like:

library(parallel)
cl <- makeCluster(detectCores())
for (i in 1:detectCores()) {
  clusterExport(cl, mydata[indices[[i]]], <extra option to specify a thread/process>)
}

Is this possible?

Recommended answer

Here is an example that uses clusterCall inside a for loop to send a different chunk of the data frame to each of the workers:

library(parallel)
cl <- makeCluster(detectCores())
df <- data.frame(a=1:10, b=1:10)
ix <- splitIndices(nrow(df), length(cl))
for (i in seq_along(cl)) {
  clusterCall(cl[i], function(d) {
    assign('mydata', d, pos=.GlobalEnv)
    NULL  # don't return any data to the master
  }, df[ix[[i]],,drop=FALSE])
}

Note that clusterCall is given cl[i], a one-worker subset of cl, so the function runs on a single worker on each pass through the for loop.
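The same subsetting trick should also work with clusterExport itself, which is what the question literally asks for. The following is a minimal sketch rather than part of the original answer; the two-worker cluster, the temporary environment env, and the variable sizes are assumptions made for illustration:

```r
library(parallel)

cl <- makeCluster(2)  # two workers, chosen only for illustration
df <- data.frame(a = 1:10, b = 1:10)
ix <- splitIndices(nrow(df), length(cl))

for (i in seq_along(cl)) {
  # clusterExport looks variables up by name, so place the chunk
  # under the name 'mydata' in a throwaway environment first
  env <- new.env()
  assign("mydata", df[ix[[i]], , drop = FALSE], envir = env)
  # cl[i] is a one-worker cluster, so only worker i receives this chunk
  clusterExport(cl[i], "mydata", envir = env)
}

# Each worker reports how many rows its own copy of mydata holds
sizes <- unlist(clusterEvalQ(cl, nrow(mydata)))
stopCluster(cl)
```

Because clusterExport assigns into each worker's global environment, the chunk ends up in the same place as with the clusterCall version.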

You can verify that the workers were properly initialized in this example using:

r <- do.call('rbind', clusterEvalQ(cl, mydata))
identical(df, r)

There are easier ways to do this, but this example minimizes both the memory used by the master and the amount of data sent to each worker, because only one chunk is materialized and transmitted at a time. This is important when the data frame is very large.
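One of the easier ways might look like the sketch below, which is an assumption about what "easier" could mean and not code from the original answer. It builds every chunk up front (the hypothetical list chunks) and hands the list to clusterApply, which assigns element i to worker i when the list has as many elements as the cluster has workers. The cost is that the master holds a full second copy of the data while the list exists:

```r
library(parallel)

cl <- makeCluster(2)  # two workers, chosen only for illustration
df <- data.frame(a = 1:10, b = 1:10)
ix <- splitIndices(nrow(df), length(cl))

# Materialize all chunks at once; this roughly doubles memory use on the master
chunks <- lapply(ix, function(rows) df[rows, , drop = FALSE])

# One chunk per worker, stored as 'mydata' in each worker's global environment
invisible(clusterApply(cl, chunks, function(d) {
  assign("mydata", d, pos = .GlobalEnv)
  NULL  # don't return any data to the master
}))

# Reassemble the chunks from the workers to check the round trip
r <- do.call("rbind", clusterEvalQ(cl, mydata))
stopCluster(cl)
```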
