multidplyr: assign functions to cluster


Question

(see working solution below)

I want to use multidplyr to parallelize a function:

calculs.R
f <- function(x){
return(x+1)
}

main.R
library(dplyr)
library(multidplyr)
source("calculs.R")
d <- data.frame(a=1:1000,b=sample(1:2,1000,replace=TRUE))

result <- d %>% 
   partition(b) %>% 
     do(f(.)) %>%
     collect()  

Then I get:

Initialising 3 core cluster.
Error in checkForRemoteErrors(lapply(cl, recvResult)) : 
  2 nodes produced errors; first error: could not find function "f"
In addition: Warning message:
group_indices_.grouped_df ignores extra arguments 

How can I assign sourced functions to each core?

=================

Here is the working script:

The function must extract the value to update and return the result as a data frame:

calculs.R
f <- function(x){
return(data.frame(x$a+1))
}

The cluster must be created and the sourced function copied to it:

main.R
library(dplyr)
library(multidplyr)
source("calculs.R")

cl <- create_cluster(3)
set_default_cluster(cl)
cluster_copy(cl, f)

d <- data.frame(a=1:10,b=c(rep(1,5),rep(2,5)))

result <- d %>%
   partition(b) %>%
     do(f(.)) %>%
     collect()

Recommended answer

It looks like you initialized a cluster (though you don't show this part). You need to export variables and functions from your global environment to each worker. Assuming you made your cluster as

cl <- create_cluster(3)
set_default_cluster(cl)

you can try

cluster_copy(cl, f)    

This will copy and export f to each worker (I think...)
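The same idea can be sketched with base R's parallel package, which is where the checkForRemoteErrors message in the question comes from. This is only an illustration of the mechanism, assuming the workers are fresh R sessions that do not share the master's global environment; it is not multidplyr's own code. clusterExport() stands in here for cluster_copy():

```r
library(parallel)

f <- function(x) {
  x + 1
}

# Start two fresh worker processes; they know nothing about f yet.
cl <- makeCluster(2)

# Check on each worker whether f exists in its global environment.
before <- unlist(clusterEvalQ(cl, exists("f", envir = globalenv(), inherits = FALSE)))

# Copy f from the current environment to every worker.
clusterExport(cl, "f", envir = environment())

after <- unlist(clusterEvalQ(cl, exists("f", envir = globalenv(), inherits = FALSE)))

# The workers can now call f.
res <- unlist(parLapply(cl, 1:4, function(i) f(i)))

stopCluster(cl)
```

Here `before` should be all FALSE and `after` all TRUE, which is the difference between the "could not find function" error and a working call.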

Extra

You'll likely run into another problem: your function accepts x as an argument and adds 1 to it

f <- function(x){
         return(x+1)
}

Since you're passing a data frame to f, you are asking for data.frame + 1, which is probably not what you want. You might want to change your function to something like

f <- function(x){
         return(x$a+1)
}
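To see what f is handed for each group, the per-group call can be imitated locally with base R's split() and lapply(), with no cluster involved. This is a rough sketch: it assumes the imitation mirrors what do() passes to f, and the column name `a_plus_1` is made up for illustration. The function wraps its result in a data frame, as the working script in the question does:

```r
d <- data.frame(a = 1:10, b = c(rep(1, 5), rep(2, 5)))

# f receives one group's rows as a data frame and returns a data frame.
f <- function(x) {
  data.frame(a_plus_1 = x$a + 1)
}

# Split the rows by b, then apply f to each piece in turn.
pieces <- split(d, d$b)
out <- lapply(pieces, f)
```

`out` is a list with one data frame per value of b, which can be inspected before moving the same function onto the cluster.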
